installed the Nominatim software itself, if not return to the
[installation page](Installation.md).
-## Importing multiple regions
+## Importing multiple regions (without updates)
-To import multiple regions in your database, you need to configure and run `utils/import_multiple_regions.sh` file. This script will set up the update directory which has the following structure:
+To import multiple regions in your database, you can simply give multiple
+OSM files to the import command:
+
+```
+nominatim import --osm-file file1.pbf --osm-file file2.pbf
+```
+
+If you have already imported one file and want to add another, you can
+use the `add-data` command to import the additional data as follows:
+
+```
+nominatim add-data --file <FILE>
+nominatim refresh --postcodes
+nominatim index -j <NUMBER OF THREADS>
+```
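+
+For example, after an initial import of Monaco, an Andorra extract might be
+added like this (the file name and thread count are only illustrative):
+
+```
+nominatim add-data --file andorra-latest.osm.pbf
+nominatim refresh --postcodes
+nominatim index -j 4
+```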
+
+Please note that adding additional data is always significantly slower than
+the original import.
+
+## Importing multiple regions (with updates)
+
+If you want to import multiple regions _and_ be able to keep them up-to-date
+with updates, then you can use the scripts provided in the `utils` directory.
+
+These scripts will set up an `update` directory in your project directory,
+which has the following structure:
```bash
update
 ├── europe
 │    ├── andorra
 │    │    └── sequence.state
 │    └── monaco
 │         └── sequence.state
 └── tmp
-     ├── combined.osm.pbf
      └── europe
           ├── andorra-latest.osm.pbf
           └── monaco-latest.osm.pbf
```
-The `sequence.state` files will contain the sequence ID, which will be used by pyosmium to get updates. The tmp folder is used for import dump.
-
-### Configuring multiple regions
-
-The file `import_multiple_regions.sh` needs to be edited as per your requirement:
-
-1. List of countries. eg:
-
- COUNTRIES="europe/monaco europe/andorra"
-
-2. Path to Build directory. eg:
+The `sequence.state` files contain the sequence ID for each region. They will
+be used by pyosmium to get updates. The `tmp` folder is used for the import
+dump and can be deleted once the import is complete.
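+
+For illustration, this is roughly how the import script below initialises
+such a state file for the Monaco extract, using the layout shown above:
+
+```
+pyosmium-get-changes -O update/tmp/europe/monaco-latest.osm.pbf \
+    -f update/europe/monaco/sequence.state -v
+```
+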
- NOMINATIMBUILD="/srv/nominatim/build"
-
-3. Path to Update directory. eg:
-
- UPDATEDIR="/srv/nominatim/update"
-
-4. Replication URL. eg:
-
- BASEURL="https://download.geofabrik.de"
- DOWNCOUNTRYPOSTFIX="-latest.osm.pbf"
### Setting up multiple regions
-!!! tip
- If your database already exists and you want to add more countries,
- replace the setting up part
- `${SETUPFILE} --osm-file ${UPDATEDIR}/tmp/combined.osm.pbf --all 2>&1`
- with `${UPDATEFILE} --import-file ${UPDATEDIR}/tmp/combined.osm.pbf --index --index-instances N 2>&1`
- where N is the numbers of CPUs in your system.
-
-Run the following command from your Nominatim directory after configuring the file.
-
- bash ./utils/import_multiple_regions.sh
-
-!!! danger "Important"
- This file uses osmium-tool. It must be installed before executing the import script.
- Installation instructions can be found [here](https://osmcode.org/osmium-tool/manual.html#installation).
-
-### Updating multiple regions
-
-To import multiple regions in your database, you need to configure and run ```utils/update_database.sh```.
-This uses the update directory set up while setting up the DB.
+Create a project directory as described for the
+[simple import](Import.md#creating-the-project-directory). If necessary,
+you can also add an `.env` configuration with customized options. In particular,
+you need to make sure that `NOMINATIM_REPLICATION_UPDATE_INTERVAL` and
+`NOMINATIM_REPLICATION_RECHECK_INTERVAL` are set according to the update
+interval of the extract server you use.
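+
+For example, for the daily updates provided by Geofabrik, settings along the
+following lines may be appropriate (values are in seconds; adjust them to
+your extract server):
+
+```
+NOMINATIM_REPLICATION_UPDATE_INTERVAL=86400
+NOMINATIM_REPLICATION_RECHECK_INTERVAL=900
+```
+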
-### Configuring multiple regions
+Copy the scripts `utils/import_multiple_regions.sh` and `utils/update_database.sh`
+into the project directory.
-The file `update_database.sh` needs to be edited as per your requirement:
+Now customize both files according to your requirements:
-1. List of countries. eg:
+1. List of countries, e.g.:
COUNTRIES="europe/monaco europe/andorra"
-2. Path to Build directory. eg:
+2. URL of the service providing the extracts and updates, e.g.:
- NOMINATIMBUILD="/srv/nominatim/build"
-
-3. Path to Update directory. eg:
-
- UPDATEDIR="/srv/nominatim/update"
-
-4. Replication URL. eg:
-
BASEURL="https://download.geofabrik.de"
- DOWNCOUNTRYPOSTFIX="-updates"
+ DOWNCOUNTRYPOSTFIX="-latest.osm.pbf"
-5. Followup can be set according to your installation. eg: For Photon,
+3. Followup in the update script can be set according to your installation.
+   E.g. for Photon,
FOLLOWUP="curl http://localhost:2322/nominatim-update"
will handle the indexing.
+
+To start the initial import, change into the project directory and run
+
+```
+bash import_multiple_regions.sh
+```
+
### Updating the database
-Run the following command from your Nominatim directory after configuring the file.
+Change into the project directory and run the following command:
- bash ./utils/update_database.sh
+ bash update_database.sh
-This will get diffs from the replication server, import diffs and index the database. The default replication server in the script([Geofabrik](https://download.geofabrik.de)) provides daily updates.
+This will download the diffs from the replication server, import them and
+index the database. The default replication server in the script
+([Geofabrik](https://download.geofabrik.de)) provides daily updates.
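+
+If you want the updates to run regularly, the script can be triggered from
+cron, e.g. with an entry like the following (time and project path are
+placeholders; adjust them to your setup):
+
+```
+# update all configured regions every night at 3am
+0 3 * * * cd /srv/nominatim-project && bash update_database.sh >> update.log 2>&1
+```
+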
## Importing Nominatim to an external PostgreSQL database
public function checkStatus()
{
- $sSQL = 'SELECT word_id FROM word limit 1';
+ $sSQL = 'SELECT word_id FROM word WHERE word_id is not null limit 1';
$iWordID = $this->oDB->getOne($sSQL);
if ($iWordID === false) {
throw new \Exception('Query failed', 703);
"""
Provides custom functions over command-line arguments.
"""
+import logging
+from pathlib import Path
+from nominatim.errors import UsageError
+
+LOG = logging.getLogger()
class NominatimArgs:
""" Customized namespace class for the nominatim command line tool
main_index=self.config.TABLESPACE_PLACE_INDEX
)
)
+
+
+ def get_osm_file_list(self):
+ """ Return the --osm-file argument as a list of Paths or None
+ if no argument was given. The function also checks if the files
+ exist and raises a UsageError if one cannot be found.
+ """
+ if not self.osm_file:
+ return None
+
+ files = [Path(f) for f in self.osm_file]
+ for fname in files:
+ if not fname.is_file():
+ LOG.fatal("OSM file '%s' does not exist.", fname)
+ raise UsageError('Cannot access file.')
+
+ return files
from nominatim.db.connection import connect
from nominatim.db import status, properties
from nominatim.version import NOMINATIM_VERSION
-from nominatim.errors import UsageError
# Do not repeat documentation of subcommand classes.
# pylint: disable=C0111
def add_args(parser):
group_name = parser.add_argument_group('Required arguments')
group = group_name.add_mutually_exclusive_group(required=True)
- group.add_argument('--osm-file', metavar='FILE',
- help='OSM file to be imported.')
+ group.add_argument('--osm-file', metavar='FILE', action='append',
+ help='OSM file to be imported'
+                           ' (repeat for importing multiple files).')
group.add_argument('--continue', dest='continue_at',
choices=['load-data', 'indexing', 'db-postprocess'],
help='Continue an import that was interrupted')
@staticmethod
- def run(args): # pylint: disable=too-many-statements
+ def run(args):
from ..tools import database_import, refresh, postcodes, freeze
from ..indexer.indexer import Indexer
- from ..tokenizer import factory as tokenizer_factory
-
- if args.osm_file and not Path(args.osm_file).is_file():
- LOG.fatal("OSM file '%s' does not exist.", args.osm_file)
- raise UsageError('Cannot access file.')
if args.continue_at is None:
+ files = args.get_osm_file_list()
+
database_import.setup_database_skeleton(args.config.get_libpq_dsn(),
args.data_dir,
args.no_partitions,
rouser=args.config.DATABASE_WEBUSER)
LOG.warning('Importing OSM data file')
- database_import.import_osm_data(Path(args.osm_file),
+ database_import.import_osm_data(files,
args.osm2pgsql_options(0, 1),
drop=args.no_updates,
ignore_errors=args.ignore_errors)
- with connect(args.config.get_libpq_dsn()) as conn:
- LOG.warning('Create functions (1st pass)')
- refresh.create_functions(conn, args.config, False, False)
- LOG.warning('Create tables')
- database_import.create_tables(conn, args.config,
- reverse_only=args.reverse_only)
- refresh.load_address_levels_from_file(conn, Path(args.config.ADDRESS_LEVEL_CONFIG))
- LOG.warning('Create functions (2nd pass)')
- refresh.create_functions(conn, args.config, False, False)
- LOG.warning('Create table triggers')
- database_import.create_table_triggers(conn, args.config)
- LOG.warning('Create partition tables')
- database_import.create_partition_tables(conn, args.config)
- LOG.warning('Create functions (3rd pass)')
- refresh.create_functions(conn, args.config, False, False)
+ SetupAll._setup_tables(args.config, args.reverse_only)
LOG.warning('Importing wikipedia importance data')
data_path = Path(args.config.WIKIPEDIA_DATA_PATH or args.project_dir)
args.threads or psutil.cpu_count() or 1)
LOG.warning("Setting up tokenizer")
- if args.continue_at is None or args.continue_at == 'load-data':
- # (re)initialise the tokenizer data
- tokenizer = tokenizer_factory.create_tokenizer(args.config)
- else:
- # just load the tokenizer
- tokenizer = tokenizer_factory.get_tokenizer_for_db(args.config)
+ tokenizer = SetupAll._get_tokenizer(args.continue_at, args.config)
if args.continue_at is None or args.continue_at == 'load-data':
LOG.warning('Calculate postcodes')
refresh.setup_website(webdir, args.config, conn)
with connect(args.config.get_libpq_dsn()) as conn:
- try:
- dbdate = status.compute_database_date(conn)
- status.set_status(conn, dbdate)
- LOG.info('Database is at %s.', dbdate)
- except Exception as exc: # pylint: disable=broad-except
- LOG.error('Cannot determine date of database: %s', exc)
-
+ SetupAll._set_database_date(conn)
properties.set_property(conn, 'database_version',
'{0[0]}.{0[1]}.{0[2]}-{0[3]}'.format(NOMINATIM_VERSION))
return 0
+ @staticmethod
+ def _setup_tables(config, reverse_only):
+ """ Set up the basic database layout: tables, indexes and functions.
+ """
+ from ..tools import database_import, refresh
+
+ with connect(config.get_libpq_dsn()) as conn:
+ LOG.warning('Create functions (1st pass)')
+ refresh.create_functions(conn, config, False, False)
+ LOG.warning('Create tables')
+ database_import.create_tables(conn, config, reverse_only=reverse_only)
+ refresh.load_address_levels_from_file(conn, Path(config.ADDRESS_LEVEL_CONFIG))
+ LOG.warning('Create functions (2nd pass)')
+ refresh.create_functions(conn, config, False, False)
+ LOG.warning('Create table triggers')
+ database_import.create_table_triggers(conn, config)
+ LOG.warning('Create partition tables')
+ database_import.create_partition_tables(conn, config)
+ LOG.warning('Create functions (3rd pass)')
+ refresh.create_functions(conn, config, False, False)
+
+
+ @staticmethod
+ def _get_tokenizer(continue_at, config):
+ """ Set up a new tokenizer or load an already initialised one.
+ """
+ from ..tokenizer import factory as tokenizer_factory
+
+ if continue_at is None or continue_at == 'load-data':
+ # (re)initialise the tokenizer data
+ return tokenizer_factory.create_tokenizer(config)
+
+ # just load the tokenizer
+ return tokenizer_factory.get_tokenizer_for_db(config)
+
@staticmethod
def _create_pending_index(conn, tablespace):
""" Add a supporting index for finding places still to be indexed.
{} WHERE indexed_status > 0
""".format(tablespace))
conn.commit()
+
+
+ @staticmethod
+ def _set_database_date(conn):
+ """ Determine the database date and set the status accordingly.
+ """
+ try:
+ dbdate = status.compute_database_date(conn)
+ status.set_status(conn, dbdate)
+ LOG.info('Database is at %s.', dbdate)
+ except Exception as exc: # pylint: disable=broad-except
+ LOG.error('Cannot determine date of database: %s', exc)
conn.commit()
-def import_osm_data(osm_file, options, drop=False, ignore_errors=False):
- """ Import the given OSM file. 'options' contains the list of
+def import_osm_data(osm_files, options, drop=False, ignore_errors=False):
+ """ Import the given OSM files. 'options' contains the list of
default settings for osm2pgsql.
"""
- options['import_file'] = osm_file
+ options['import_file'] = osm_files
options['append'] = False
options['threads'] = 1
# Make some educated guesses about cache size based on the size
# of the import file and the available memory.
mem = psutil.virtual_memory()
- fsize = os.stat(str(osm_file)).st_size
+ fsize = 0
+ if isinstance(osm_files, list):
+ for fname in osm_files:
+ fsize += os.stat(str(fname)).st_size
+ else:
+ fsize = os.stat(str(osm_files)).st_size
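+
+    # Worked example (illustrative): a 1 GiB extract on a machine with
+    # 8 GiB of available memory gives min(6 GiB, 2 GiB) = 2 GiB, i.e.
+    # about 2049 MB of osm2pgsql cache.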
options['osm2pgsql_cache'] = int(min((mem.available + mem.cached) * 0.75,
fsize * 2) / 1024 / 1024) + 1
if 'import_data' in options:
cmd.extend(('-r', 'xml', '-'))
+ elif isinstance(options['import_file'], list):
+ for fname in options['import_file']:
+ cmd.append(str(fname))
else:
cmd.append(str(options['import_file']))
-Subproject commit 7869a4e1255a7657bc8582a58adcf5aa2a3f3c70
+Subproject commit bd7b4440000a9c1df639c5fac020bc00bd590368
def test_import_osm_data_simple(table_factory, osm2pgsql_options):
table_factory('place', content=((1, ), ))
- database_import.import_osm_data('file.pdf', osm2pgsql_options)
+ database_import.import_osm_data(Path('file.pbf'), osm2pgsql_options)
+
+
+def test_import_osm_data_multifile(table_factory, tmp_path, osm2pgsql_options):
+ table_factory('place', content=((1, ), ))
+ osm2pgsql_options['osm2pgsql_cache'] = 0
+
+ files = [tmp_path / 'file1.osm', tmp_path / 'file2.osm']
+ for f in files:
+ f.write_text('test')
+
+ database_import.import_osm_data(files, osm2pgsql_options)
def test_import_osm_data_simple_no_data(table_factory, osm2pgsql_options):
table_factory('place')
with pytest.raises(UsageError, match='No data.*'):
- database_import.import_osm_data('file.pdf', osm2pgsql_options)
+ database_import.import_osm_data(Path('file.pbf'), osm2pgsql_options)
def test_import_osm_data_drop(table_factory, temp_db_conn, tmp_path, osm2pgsql_options):
osm2pgsql_options['flatnode_file'] = str(flatfile.resolve())
- database_import.import_osm_data('file.pdf', osm2pgsql_options, drop=True)
+ database_import.import_osm_data(Path('file.pbf'), osm2pgsql_options, drop=True)
assert not flatfile.exists()
assert not temp_db_conn.table_exists('planet_osm_nodes')
# *) Set up sequence.state for updates
-# *) Merge the pbf files into a single file.
-
-# *) Setup nominatim db using 'setup.php --osm-file'
+# *) Set up the nominatim db using 'nominatim import --osm-file'
# Hint:
COUNTRIES="europe/monaco europe/andorra"
-# SET TO YOUR NOMINATIM build FOLDER PATH:
-
-NOMINATIMBUILD="/srv/nominatim/build"
-SETUPFILE="$NOMINATIMBUILD/utils/setup.php"
-UPDATEFILE="$NOMINATIMBUILD/utils/update.php"
-
-# SET TO YOUR update FOLDER PATH:
-
-UPDATEDIR="/srv/nominatim/update"
-
# SET TO YOUR replication server URL:
BASEURL="https://download.geofabrik.de"
# End of configuration section
# ******************************************************************************
-COMBINEFILES="osmium merge"
+UPDATEDIR=update
+IMPORT_CMD="nominatim import"
mkdir -p ${UPDATEDIR}
-cd ${UPDATEDIR}
+pushd ${UPDATEDIR}
rm -rf tmp
mkdir -p tmp
-cd tmp
+popd
for COUNTRY in $COUNTRIES;
do
-
echo "===================================================================="
echo "$COUNTRY"
echo "===================================================================="
DIR="$UPDATEDIR/$COUNTRY"
- FILE="$DIR/configuration.txt"
DOWNURL="$BASEURL/$COUNTRY$DOWNCOUNTRYPOSTFIX"
IMPORTFILE=$COUNTRY$DOWNCOUNTRYPOSTFIX
IMPORTFILEPATH=${UPDATEDIR}/tmp/${IMPORTFILE}
- FILENAME=${COUNTRY//[\/]/_}
-
touch2 $IMPORTFILEPATH
wget ${DOWNURL} -O $IMPORTFILEPATH
touch2 ${DIR}/sequence.state
pyosmium-get-changes -O $IMPORTFILEPATH -f ${DIR}/sequence.state -v
- COMBINEFILES="${COMBINEFILES} ${IMPORTFILEPATH}"
+ IMPORT_CMD="${IMPORT_CMD} --osm-file ${IMPORTFILEPATH}"
echo $IMPORTFILE
echo "===================================================================="
done
-
-echo "${COMBINEFILES} -o combined.osm.pbf"
-${COMBINEFILES} -o combined.osm.pbf
-
echo "===================================================================="
echo "Setting up nominatim db"
-${SETUPFILE} --osm-file ${UPDATEDIR}/tmp/combined.osm.pbf --all 2>&1
-
-# ${UPDATEFILE} --import-file ${UPDATEDIR}/tmp/combined.osm.pbf 2>&1
-echo "===================================================================="
\ No newline at end of file
+${IMPORT_CMD} 2>&1
+echo "===================================================================="
# REPLACE WITH LIST OF YOUR "COUNTRIES":
#
-
-
COUNTRIES="europe/monaco europe/andorra"
-# SET TO YOUR NOMINATIM build FOLDER PATH:
-#
-NOMINATIMBUILD="/srv/nominatim/build"
-UPDATEFILE="$NOMINATIMBUILD/utils/update.php"
-
-# SET TO YOUR update data FOLDER PATH:
-#
-UPDATEDIR="/srv/nominatim/update"
-
UPDATEBASEURL="https://download.geofabrik.de"
UPDATECOUNTRYPOSTFIX="-updates"
# If you do not use Photon, let Nominatim handle (re-)indexing:
#
-FOLLOWUP="$UPDATEFILE --index"
+FOLLOWUP="nominatim index"
#
# If you use Photon, update Photon and let it handle the index
# (Photon server must be running and must have been started with "-database",
#FOLLOWUP="curl http://localhost:2322/nominatim-update"
# ******************************************************************************
-
+UPDATEDIR="update"
for COUNTRY in $COUNTRIES;
do
-
echo "===================================================================="
echo "$COUNTRY"
echo "===================================================================="
FILE="$DIR/sequence.state"
BASEURL="$UPDATEBASEURL/$COUNTRY$UPDATECOUNTRYPOSTFIX"
FILENAME=${COUNTRY//[\/]/_}
-
- # mkdir -p ${DIR}
- cd ${DIR}
echo "Attempting to get changes"
+ rm -f ${DIR}/${FILENAME}.osc.gz
pyosmium-get-changes -o ${DIR}/${FILENAME}.osc.gz -f ${FILE} --server $BASEURL -v
echo "Attempting to import diffs"
- ${NOMINATIMBUILD}/utils/update.php --import-diff ${DIR}/${FILENAME}.osc.gz
- rm ${DIR}/${FILENAME}.osc.gz
-
+ nominatim add-data --diff ${DIR}/${FILENAME}.osc.gz
done
echo "===================================================================="
echo "Reindexing"
${FOLLOWUP}
-echo "===================================================================="
\ No newline at end of file
+echo "===================================================================="