From: Sarah Hoffmann
Date: Mon, 16 Aug 2021 06:48:28 +0000 (+0200)
Subject: Merge pull request #2424 from lonvia/multi-country-import
X-Git-Tag: v4.0.0~42
X-Git-Url: https://git.openstreetmap.org./nominatim.git/commitdiff_plain/31d95457020dd73af7f92dc1b10c928dacadc56a?hp=e449071a35de8855bb06d2e1d8c4325ddc48ddb7

Merge pull request #2424 from lonvia/multi-country-import

Update instructions for importing multiple regions
---

diff --git a/docs/admin/Advanced-Installations.md b/docs/admin/Advanced-Installations.md
index d5e6e889..ee38f3e4 100644
--- a/docs/admin/Advanced-Installations.md
+++ b/docs/admin/Advanced-Installations.md
@@ -5,9 +5,34 @@
 your Nominatim database. It is assumed that you have already successfully
 installed the Nominatim software itself, if not return to the
 [installation page](Installation.md).
 
-## Importing multiple regions
+## Importing multiple regions (without updates)
 
-To import multiple regions in your database, you need to configure and run `utils/import_multiple_regions.sh` file. This script will set up the update directory which has the following structure:
+To import multiple regions in your database you can simply give multiple
+OSM files to the import command:
+
+```
+nominatim import --osm-file file1.pbf --osm-file file2.pbf
+```
+
+If you already have imported a file and want to add another one, you can
+use the add-data function to import the additional data as follows:
+
+```
+nominatim add-data --file <FILE>
+nominatim refresh --postcodes
+nominatim index -j <NUMBER OF THREADS>
+```
+
+Please note that adding additional data is always significantly slower than
+the original import.
+
+## Importing multiple regions (with updates)
+
+If you want to import multiple regions _and_ be able to keep them up-to-date
+with updates, then you can use the scripts provided in the `utils` directory.
+
+These scripts will set up an `update` directory in your project directory,
+which has the following structure:
 
 ```bash
@@ -17,7 +42,6 @@ update
    │   └── monaco
    │   └── sequence.state
    └── tmp
-       ├── combined.osm.pbf
        └── europe
            ├── andorra-latest.osm.pbf
            └── monaco-latest.osm.pbf
@@ -25,85 +49,57 @@ update
 ```
 
-The `sequence.state` files will contain the sequence ID, which will be used by pyosmium to get updates. The tmp folder is used for import dump.
-
-### Configuring multiple regions
-
-The file `import_multiple_regions.sh` needs to be edited as per your requirement:
-
-1. List of countries. eg:
-
-    COUNTRIES="europe/monaco europe/andorra"
-
-2. Path to Build directory. eg:
+The `sequence.state` files contain the sequence ID for each region. They will
+be used by pyosmium to get updates. The `tmp` folder is used for import dump and
+can be deleted once the import is complete.
 
-    NOMINATIMBUILD="/srv/nominatim/build"
-
-3. Path to Update directory. eg:
-
-    UPDATEDIR="/srv/nominatim/update"
-
-4. Replication URL. eg:
-
-    BASEURL="https://download.geofabrik.de"
-    DOWNCOUNTRYPOSTFIX="-latest.osm.pbf"
 
 ### Setting up multiple regions
 
-!!! tip
-    If your database already exists and you want to add more countries,
-    replace the setting up part
-    `${SETUPFILE} --osm-file ${UPDATEDIR}/tmp/combined.osm.pbf --all 2>&1`
-    with `${UPDATEFILE} --import-file ${UPDATEDIR}/tmp/combined.osm.pbf --index --index-instances N 2>&1`
-    where N is the numbers of CPUs in your system.
-
-Run the following command from your Nominatim directory after configuring the file.
-
-    bash ./utils/import_multiple_regions.sh
-
-!!! danger "Important"
-    This file uses osmium-tool. It must be installed before executing the import script.
-    Installation instructions can be found [here](https://osmcode.org/osmium-tool/manual.html#installation).
-
-### Updating multiple regions
-
-To import multiple regions in your database, you need to configure and run ```utils/update_database.sh```.
-This uses the update directory set up while setting up the DB.
+Create a project directory as described for the
+[simple import](Import.md#creating-the-project-directory). If necessary,
+you can also add an `.env` configuration with customized options. In particular,
+you need to make sure that `NOMINATIM_REPLICATION_UPDATE_INTERVAL` and
+`NOMINATIM_REPLICATION_RECHECK_INTERVAL` are set according to the update
+interval of the extract server you use.
 
-### Configuring multiple regions
+Copy the scripts `utils/import_multiple_regions.sh` and `utils/update_database.sh`
+into the project directory.
 
-The file `update_database.sh` needs to be edited as per your requirement:
+Now customize both files as per your requirements
 
-1. List of countries. eg:
+1. List of countries. e.g.
 
     COUNTRIES="europe/monaco europe/andorra"
 
-2. Path to Build directory. eg:
+2. URL to the service providing the extracts and updates. eg:
 
-    NOMINATIMBUILD="/srv/nominatim/build"
-
-3. Path to Update directory. eg:
-
-    UPDATEDIR="/srv/nominatim/update"
-
-4. Replication URL. eg:
-
     BASEURL="https://download.geofabrik.de"
-    DOWNCOUNTRYPOSTFIX="-updates"
+    DOWNCOUNTRYPOSTFIX="-latest.osm.pbf"
 
-5. Followup can be set according to your installation. eg: For Photon,
+5. Followup in the update script can be set according to your installation.
+   E.g. for Photon,
 
     FOLLOWUP="curl http://localhost:2322/nominatim-update"
 
 will handle the indexing.
+
+To start the initial import, change into the project directory and run
+
+```
+   bash import_multiple_regions.sh
+```
+
 ### Updating the database
 
-Run the following command from your Nominatim directory after configuring the file.
+Change into the project directory and run the following command:
 
-    bash ./utils/update_database.sh
+    bash update_database.sh
 
-This will get diffs from the replication server, import diffs and index the database. The default replication server in the script([Geofabrik](https://download.geofabrik.de)) provides daily updates.
+This will get diffs from the replication server, import diffs and index
+the database. The default replication server in the
+script([Geofabrik](https://download.geofabrik.de)) provides daily updates.
 
 ## Importing Nominatim to an external PostgreSQL database
diff --git a/lib-php/tokenizer/legacy_icu_tokenizer.php b/lib-php/tokenizer/legacy_icu_tokenizer.php
index 3751e821..4e297954 100644
--- a/lib-php/tokenizer/legacy_icu_tokenizer.php
+++ b/lib-php/tokenizer/legacy_icu_tokenizer.php
@@ -19,7 +19,7 @@ class Tokenizer
 
     public function checkStatus()
     {
-        $sSQL = 'SELECT word_id FROM word limit 1';
+        $sSQL = 'SELECT word_id FROM word WHERE word_id is not null limit 1';
         $iWordID = $this->oDB->getOne($sSQL);
         if ($iWordID === false) {
             throw new \Exception('Query failed', 703);
diff --git a/nominatim/clicmd/args.py b/nominatim/clicmd/args.py
index 996f48f2..694e6fc5 100644
--- a/nominatim/clicmd/args.py
+++ b/nominatim/clicmd/args.py
@@ -1,7 +1,12 @@
 """
 Provides custom functions over command-line arguments.
""" +import logging +from pathlib import Path +from nominatim.errors import UsageError + +LOG = logging.getLogger() class NominatimArgs: """ Customized namespace class for the nominatim command line tool @@ -25,3 +30,20 @@ class NominatimArgs: main_index=self.config.TABLESPACE_PLACE_INDEX ) ) + + + def get_osm_file_list(self): + """ Return the --osm-file argument as a list of Paths or None + if no argument was given. The function also checks if the files + exist and raises a UsageError if one cannot be found. + """ + if not self.osm_file: + return None + + files = [Path(f) for f in self.osm_file] + for fname in files: + if not fname.is_file(): + LOG.fatal("OSM file '%s' does not exist.", fname) + raise UsageError('Cannot access file.') + + return files diff --git a/nominatim/clicmd/setup.py b/nominatim/clicmd/setup.py index 878c8826..8bab6f38 100644 --- a/nominatim/clicmd/setup.py +++ b/nominatim/clicmd/setup.py @@ -9,7 +9,6 @@ import psutil from nominatim.db.connection import connect from nominatim.db import status, properties from nominatim.version import NOMINATIM_VERSION -from nominatim.errors import UsageError # Do not repeat documentation of subcommand classes. # pylint: disable=C0111 @@ -27,8 +26,9 @@ class SetupAll: def add_args(parser): group_name = parser.add_argument_group('Required arguments') group = group_name.add_mutually_exclusive_group(required=True) - group.add_argument('--osm-file', metavar='FILE', - help='OSM file to be imported.') + group.add_argument('--osm-file', metavar='FILE', action='append', + help='OSM file to be imported' + ' (repeat for importing multiple files.') group.add_argument('--continue', dest='continue_at', choices=['load-data', 'indexing', 'db-postprocess'], help='Continue an import that was interrupted') @@ -51,42 +51,25 @@ class SetupAll: @staticmethod - def run(args): # pylint: disable=too-many-statements + def run(args): from ..tools import database_import, refresh, postcodes, freeze from ..indexer.indexer import Indexer - from ..tokenizer import factory as tokenizer_factory - - if args.osm_file and not Path(args.osm_file).is_file(): - LOG.fatal("OSM file '%s' does not exist.", args.osm_file) - raise UsageError('Cannot access file.') if args.continue_at is None: + files = args.get_osm_file_list() + database_import.setup_database_skeleton(args.config.get_libpq_dsn(), args.data_dir, args.no_partitions, rouser=args.config.DATABASE_WEBUSER) LOG.warning('Importing OSM data file') - database_import.import_osm_data(Path(args.osm_file), + database_import.import_osm_data(files, args.osm2pgsql_options(0, 1), drop=args.no_updates, ignore_errors=args.ignore_errors) - with connect(args.config.get_libpq_dsn()) as conn: - LOG.warning('Create functions (1st pass)') - refresh.create_functions(conn, args.config, False, False) - LOG.warning('Create tables') - database_import.create_tables(conn, args.config, - reverse_only=args.reverse_only) - refresh.load_address_levels_from_file(conn, Path(args.config.ADDRESS_LEVEL_CONFIG)) - LOG.warning('Create functions (2nd pass)') - refresh.create_functions(conn, args.config, False, False) - LOG.warning('Create table triggers') - database_import.create_table_triggers(conn, args.config) - LOG.warning('Create partition tables') - database_import.create_partition_tables(conn, args.config) - LOG.warning('Create functions (3rd pass)') - refresh.create_functions(conn, args.config, False, False) + SetupAll._setup_tables(args.config, args.reverse_only) LOG.warning('Importing wikipedia importance data') data_path = 
@@ -105,12 +88,7 @@ class SetupAll:
                                        args.threads or psutil.cpu_count() or 1)
 
         LOG.warning("Setting up tokenizer")
-        if args.continue_at is None or args.continue_at == 'load-data':
-            # (re)initialise the tokenizer data
-            tokenizer = tokenizer_factory.create_tokenizer(args.config)
-        else:
-            # just load the tokenizer
-            tokenizer = tokenizer_factory.get_tokenizer_for_db(args.config)
+        tokenizer = SetupAll._get_tokenizer(args.continue_at, args.config)
 
         if args.continue_at is None or args.continue_at == 'load-data':
             LOG.warning('Calculate postcodes')
@@ -145,19 +123,48 @@ class SetupAll:
                 refresh.setup_website(webdir, args.config, conn)
 
         with connect(args.config.get_libpq_dsn()) as conn:
-            try:
-                dbdate = status.compute_database_date(conn)
-                status.set_status(conn, dbdate)
-                LOG.info('Database is at %s.', dbdate)
-            except Exception as exc: # pylint: disable=broad-except
-                LOG.error('Cannot determine date of database: %s', exc)
-
+            SetupAll._set_database_date(conn)
             properties.set_property(conn, 'database_version',
                                     '{0[0]}.{0[1]}.{0[2]}-{0[3]}'.format(NOMINATIM_VERSION))
 
         return 0
 
 
+    @staticmethod
+    def _setup_tables(config, reverse_only):
+        """ Set up the basic database layout: tables, indexes and functions.
+        """
+        from ..tools import database_import, refresh
+
+        with connect(config.get_libpq_dsn()) as conn:
+            LOG.warning('Create functions (1st pass)')
+            refresh.create_functions(conn, config, False, False)
+            LOG.warning('Create tables')
+            database_import.create_tables(conn, config, reverse_only=reverse_only)
+            refresh.load_address_levels_from_file(conn, Path(config.ADDRESS_LEVEL_CONFIG))
+            LOG.warning('Create functions (2nd pass)')
+            refresh.create_functions(conn, config, False, False)
+            LOG.warning('Create table triggers')
+            database_import.create_table_triggers(conn, config)
+            LOG.warning('Create partition tables')
+            database_import.create_partition_tables(conn, config)
+            LOG.warning('Create functions (3rd pass)')
+            refresh.create_functions(conn, config, False, False)
+
+
+    @staticmethod
+    def _get_tokenizer(continue_at, config):
+        """ Set up a new tokenizer or load an already initialised one.
+        """
+        from ..tokenizer import factory as tokenizer_factory
+
+        if continue_at is None or continue_at == 'load-data':
+            # (re)initialise the tokenizer data
+            return tokenizer_factory.create_tokenizer(config)
+
+        # just load the tokenizer
+        return tokenizer_factory.get_tokenizer_for_db(config)
+
     @staticmethod
     def _create_pending_index(conn, tablespace):
         """ Add a supporting index for finding places still to be indexed.
@@ -178,3 +185,15 @@ class SetupAll:
                          {} WHERE indexed_status > 0
                      """.format(tablespace))
             conn.commit()
+
+
+    @staticmethod
+    def _set_database_date(conn):
+        """ Determine the database date and set the status accordingly.
+        """
+        try:
+            dbdate = status.compute_database_date(conn)
+            status.set_status(conn, dbdate)
+            LOG.info('Database is at %s.', dbdate)
+        except Exception as exc: # pylint: disable=broad-except
+            LOG.error('Cannot determine date of database: %s', exc)
diff --git a/nominatim/tools/database_import.py b/nominatim/tools/database_import.py
index a4d7220f..0dd93490 100644
--- a/nominatim/tools/database_import.py
+++ b/nominatim/tools/database_import.py
@@ -103,11 +103,11 @@ def import_base_data(dsn, sql_dir, ignore_partitions=False):
         conn.commit()
 
 
-def import_osm_data(osm_file, options, drop=False, ignore_errors=False):
-    """ Import the given OSM file. 'options' contains the list of
+def import_osm_data(osm_files, options, drop=False, ignore_errors=False):
+    """ Import the given OSM files. 'options' contains the list of
         default settings for osm2pgsql.
     """
-    options['import_file'] = osm_file
+    options['import_file'] = osm_files
     options['append'] = False
     options['threads'] = 1
 
@@ -115,7 +115,12 @@ def import_osm_data(osm_file, options, drop=False, ignore_errors=False):
         # Make some educated guesses about cache size based on the size
         # of the import file and the available memory.
         mem = psutil.virtual_memory()
-        fsize = os.stat(str(osm_file)).st_size
+        fsize = 0
+        if isinstance(osm_files, list):
+            for fname in osm_files:
+                fsize += os.stat(str(fname)).st_size
+        else:
+            fsize = os.stat(str(osm_files)).st_size
         options['osm2pgsql_cache'] = int(min((mem.available + mem.cached) * 0.75,
                                              fsize * 2) / 1024 / 1024) + 1
 
diff --git a/nominatim/tools/exec_utils.py b/nominatim/tools/exec_utils.py
index 6177b15f..cb39ad48 100644
--- a/nominatim/tools/exec_utils.py
+++ b/nominatim/tools/exec_utils.py
@@ -130,6 +130,9 @@ def run_osm2pgsql(options):
 
     if 'import_data' in options:
         cmd.extend(('-r', 'xml', '-'))
+    elif isinstance(options['import_file'], list):
+        for fname in options['import_file']:
+            cmd.append(str(fname))
     else:
         cmd.append(str(options['import_file']))
 
diff --git a/osm2pgsql b/osm2pgsql
index 7869a4e1..bd7b4440 160000
--- a/osm2pgsql
+++ b/osm2pgsql
@@ -1 +1 @@
-Subproject commit 7869a4e1255a7657bc8582a58adcf5aa2a3f3c70
+Subproject commit bd7b4440000a9c1df639c5fac020bc00bd590368
diff --git a/test/python/test_tools_database_import.py b/test/python/test_tools_database_import.py
index 2291c166..aa90f8db 100644
--- a/test/python/test_tools_database_import.py
+++ b/test/python/test_tools_database_import.py
@@ -98,14 +98,25 @@ def test_import_base_data_ignore_partitions(dsn, src_dir, temp_db_with_extension
 def test_import_osm_data_simple(table_factory, osm2pgsql_options):
     table_factory('place', content=((1, ), ))
 
-    database_import.import_osm_data('file.pdf', osm2pgsql_options)
+    database_import.import_osm_data(Path('file.pbf'), osm2pgsql_options)
+
+
+def test_import_osm_data_multifile(table_factory, tmp_path, osm2pgsql_options):
+    table_factory('place', content=((1, ), ))
+    osm2pgsql_options['osm2pgsql_cache'] = 0
+
+    files = [tmp_path / 'file1.osm', tmp_path / 'file2.osm']
+    for f in files:
+        f.write_text('test')
+
+    database_import.import_osm_data(files, osm2pgsql_options)
 
 
 def test_import_osm_data_simple_no_data(table_factory, osm2pgsql_options):
     table_factory('place')
 
     with pytest.raises(UsageError, match='No data.*'):
-        database_import.import_osm_data('file.pdf', osm2pgsql_options)
+        database_import.import_osm_data(Path('file.pbf'), osm2pgsql_options)
 
 
 def test_import_osm_data_drop(table_factory, temp_db_conn, tmp_path, osm2pgsql_options):
@@ -117,7 +128,7 @@ def test_import_osm_data_drop(table_factory, temp_db_conn, tmp_path, osm2pgsql_o
 
     osm2pgsql_options['flatnode_file'] = str(flatfile.resolve())
 
-    database_import.import_osm_data('file.pdf', osm2pgsql_options, drop=True)
+    database_import.import_osm_data(Path('file.pbf'), osm2pgsql_options, drop=True)
 
     assert not flatfile.exists()
     assert not temp_db_conn.table_exists('planet_osm_nodes')
diff --git a/utils/import_multiple_regions.sh b/utils/import_multiple_regions.sh
index 83323c2e..d15b2f55 100644
--- a/utils/import_multiple_regions.sh
+++ b/utils/import_multiple_regions.sh
@@ -8,8 +8,6 @@
 
 # *) Set up sequence.state for updates
 
-# *) Merge the pbf files into a single file.
-
 # *) Setup nominatim db using 'setup.php --osm-file'
 
 # Hint:
@@ -28,16 +26,6 @@ touch2() { mkdir -p "$(dirname "$1")" && touch "$1" ; }
 
 COUNTRIES="europe/monaco europe/andorra"
 
-# SET TO YOUR NOMINATIM build FOLDER PATH:
-
-NOMINATIMBUILD="/srv/nominatim/build"
-SETUPFILE="$NOMINATIMBUILD/utils/setup.php"
-UPDATEFILE="$NOMINATIMBUILD/utils/update.php"
-
-# SET TO YOUR update FOLDER PATH:
-
-UPDATEDIR="/srv/nominatim/update"
-
 # SET TO YOUR replication server URL:
 
 BASEURL="https://download.geofabrik.de"
@@ -46,27 +34,24 @@ DOWNCOUNTRYPOSTFIX="-latest.osm.pbf"
 
 # End of configuration section
 # ******************************************************************************
 
-COMBINEFILES="osmium merge"
+UPDATEDIR=update
+IMPORT_CMD="nominatim import"
 
 mkdir -p ${UPDATEDIR}
-cd ${UPDATEDIR}
+pushd ${UPDATEDIR}
 rm -rf tmp
 mkdir -p tmp
-cd tmp
+popd
 
 for COUNTRY in $COUNTRIES;
 do
-
     echo "===================================================================="
     echo "$COUNTRY"
     echo "===================================================================="
 
     DIR="$UPDATEDIR/$COUNTRY"
-    FILE="$DIR/configuration.txt"
     DOWNURL="$BASEURL/$COUNTRY$DOWNCOUNTRYPOSTFIX"
     IMPORTFILE=$COUNTRY$DOWNCOUNTRYPOSTFIX
     IMPORTFILEPATH=${UPDATEDIR}/tmp/${IMPORTFILE}
-    FILENAME=${COUNTRY//[\/]/_}
-
     touch2 $IMPORTFILEPATH
     wget ${DOWNURL} -O $IMPORTFILEPATH
 
@@ -74,18 +59,12 @@
     touch2 ${DIR}/sequence.state
     pyosmium-get-changes -O $IMPORTFILEPATH -f ${DIR}/sequence.state -v
 
-    COMBINEFILES="${COMBINEFILES} ${IMPORTFILEPATH}"
+    IMPORT_CMD="${IMPORT_CMD} --osm-file ${IMPORTFILEPATH}"
     echo $IMPORTFILE
     echo "===================================================================="
 done
-
-echo "${COMBINEFILES} -o combined.osm.pbf"
-${COMBINEFILES} -o combined.osm.pbf
-
 echo "===================================================================="
 echo "Setting up nominatim db"
-${SETUPFILE} --osm-file ${UPDATEDIR}/tmp/combined.osm.pbf --all 2>&1
-
-# ${UPDATEFILE} --import-file ${UPDATEDIR}/tmp/combined.osm.pbf 2>&1
-echo "===================================================================="
\ No newline at end of file
+${IMPORT_CMD} 2>&1
+echo "===================================================================="
diff --git a/utils/update_database.sh b/utils/update_database.sh
index 75d0de5d..58f7690a 100644
--- a/utils/update_database.sh
+++ b/utils/update_database.sh
@@ -22,25 +22,14 @@
 # REPLACE WITH LIST OF YOUR "COUNTRIES":
 #
-
-
 COUNTRIES="europe/monaco europe/andorra"
 
-# SET TO YOUR NOMINATIM build FOLDER PATH:
-#
-NOMINATIMBUILD="/srv/nominatim/build"
-UPDATEFILE="$NOMINATIMBUILD/utils/update.php"
-
-# SET TO YOUR update data FOLDER PATH:
-#
-UPDATEDIR="/srv/nominatim/update"
-
 UPDATEBASEURL="https://download.geofabrik.de"
 UPDATECOUNTRYPOSTFIX="-updates"
 
 # If you do not use Photon, let Nominatim handle (re-)indexing:
 #
-FOLLOWUP="$UPDATEFILE --index"
+FOLLOWUP="nominatim index"
 #
 # If you use Photon, update Photon and let it handle the index
 # (Photon server must be running and must have been started with "-database",
@@ -49,11 +38,10 @@ FOLLOWUP="$UPDATEFILE --index"
 #FOLLOWUP="curl http://localhost:2322/nominatim-update"
 
 # ******************************************************************************
-
+UPDATEDIR="update"
 
 for COUNTRY in $COUNTRIES;
 do
-
     echo "===================================================================="
     echo "$COUNTRY"
     echo "===================================================================="
 
     DIR="$UPDATEDIR/$COUNTRY"
     FILE="$DIR/sequence.state"
 
     BASEURL="$UPDATEBASEURL/$COUNTRY$UPDATECOUNTRYPOSTFIX"
     FILENAME=${COUNTRY//[\/]/_}
-
-    # mkdir -p ${DIR}
-    cd ${DIR}
 
     echo "Attempting to get changes"
+    rm -f ${DIR}/${FILENAME}.osc.gz
     pyosmium-get-changes -o ${DIR}/${FILENAME}.osc.gz -f ${FILE} --server $BASEURL -v
 
     echo "Attempting to import diffs"
-    ${NOMINATIMBUILD}/utils/update.php --import-diff ${DIR}/${FILENAME}.osc.gz
-    rm ${DIR}/${FILENAME}.osc.gz
-
+    nominatim add-data --diff ${DIR}/${FILENAME}.osc.gz
 done
 
 echo "===================================================================="
 echo "Reindexing"
 ${FOLLOWUP}
-echo "===================================================================="
\ No newline at end of file
+echo "===================================================================="
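
For reference, the import workflow that the documentation change above describes boils down to the following sketch. The extract file names, the project directory and the thread count are illustrative placeholders and not part of the patch; only the `nominatim import`, `add-data`, `refresh` and `index` invocations are taken from the updated documentation.

```bash
#!/bin/bash
# Minimal sketch of the documented multi-region workflow; paths and names are examples.

# One-off import of several extracts in a single run, from the project directory:
cd /srv/nominatim-project
nominatim import --osm-file monaco-latest.osm.pbf --osm-file andorra-latest.osm.pbf

# Adding a further extract to an existing database later
# (noticeably slower than including it in the initial import):
nominatim add-data --file liechtenstein-latest.osm.pbf
nominatim refresh --postcodes
nominatim index -j 4
```

For the updatable multi-region setup, the scripts `utils/import_multiple_regions.sh` and `utils/update_database.sh` changed in this commit wrap the same commands and additionally maintain the per-region `sequence.state` files used by pyosmium.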