From: Sarah Hoffmann Date: Mon, 25 Oct 2021 15:21:10 +0000 (+0200) Subject: Merge remote-tracking branch 'upstream/master' X-Git-Tag: deploy~147 X-Git-Url: https://git.openstreetmap.org./nominatim.git/commitdiff_plain/452021ef0c9ac746949a2ef7fd12db4c2d5fee35?hp=554d25e2e508c57fe763cee95b574b8e7c66eb0b Merge remote-tracking branch 'upstream/master' --- diff --git a/docs/admin/Migration.md b/docs/admin/Migration.md index a8c1375d..8458e3d9 100644 --- a/docs/admin/Migration.md +++ b/docs/admin/Migration.md @@ -15,6 +15,20 @@ breaking changes. **Please read them before running the migration.** If you are migrating from a version <3.6, then you still have to follow the manual migration steps up to 3.6. +## 3.7.0 -> master + +### NOMINATIM_PHRASE_CONFIG removed + +Custom blacklist configurations for special phrases now need to be handed +with the `--config` parameter to `nominatim special-phrases`. Alternatively +you can put your custom configuration in the project directory in a file +named `phrase-settings.json`. + +Version 3.8 also removes the automatic converter for the php format of +the configuration in older versions. If you are updating from Nominatim < 3.7 +and still work with a custom `phrase-settings.php`, you need to manually +convert it into a json format. + ## 3.6.0 -> 3.7.0 ### New format and name of configuration file diff --git a/docs/admin/Update.md b/docs/admin/Update.md index a2323cfe..9d224b9e 100644 --- a/docs/admin/Update.md +++ b/docs/admin/Update.md @@ -10,18 +10,21 @@ For a list of other methods to add or update data see the output of If you have configured a flatnode file for the import, then you need to keep this flatnode file around for updates. -#### Installing the newest version of Pyosmium +### Installing the newest version of Pyosmium -It is recommended to install Pyosmium via pip. Make sure to use python3. 
+The replication process uses +[Pyosmium](https://docs.osmcode.org/pyosmium/latest/updating_osm_data.html) +to download update data from the server. +It is recommended to install Pyosmium via pip. Run (as the same user who will later run the updates): ```sh pip3 install --user osmium ``` -#### Setting up the update process +### Setting up the update process -Next the update needs to be initialised. By default Nominatim is configured +Next the update process needs to be initialised. By default Nominatim is configured to update using the global minutely diffs. If you want a different update source you will need to add some settings @@ -45,12 +48,119 @@ what you expect. The `replication --init` command needs to be rerun whenever the replication service is changed. -#### Updating Nominatim +### Updating Nominatim -The following command will keep your database constantly up to date: +Nominatim supports different modes how to retrieve the update data from the +server. Which one you want to use depends on your exact setup and how often you +want to retrieve updates. + +These instructions are for using a single source of updates. If you have +imported multiple country extracts and want to keep them +up-to-date, [Advanced installations section](Advanced-Installations.md) +contains instructions to set up and update multiple country extracts. + +#### Continuous updates + +This is the easiest mode. Simply run the replication command without any +parameters: nominatim replication -If you have imported multiple country extracts and want to keep them -up-to-date, [Advanced installations section](Advanced-Installations.md) contains instructions -to set up and update multiple country extracts. +The update application keeps running forever and retrieves and applies +new updates from the server as they are published. + +You can run this command as a simple systemd service. 
Create a service +description like this in `/etc/systemd/system/nominatim-updates.service`: + +``` +[Unit] +Description=Continuous updates of Nominatim + +[Service] +WorkingDirectory=/srv/nominatim +ExecStart=nominatim replication +StandardOutput=append:/var/log/nominatim-updates.log +StandardError=append:/var/log/nominatim-updates.error.log +User=nominatim +Group=nominatim +Type=simple + +[Install] +WantedBy=multi-user.target +``` + +Replace the `WorkingDirectory` with your project directory. Also adapt user +and group names as required. + +Now activate the service and start the updates: + +``` +sudo systemctl daemon-reload +sudo systemctl enable nominatim-updates +sudo systemctl start nominatim-updates +``` + +#### One-time mode + +When the `--once` parameter is given, then Nominatim will download exactly one +batch of updates and then exit. This one-time mode still respects the +`NOMINATIM_REPLICATION_UPDATE_INTERVAL` that you have set. If according to +the update interval no new data has been published yet, it will go to sleep +until the next expected update and only then attempt to download the next batch. + +The one-time mode is particularly useful if you want to run updates continuously +but need to schedule other work in between updates. For example, the main +service at osm.org uses it to regularly recompute postcodes -- a process that +must not be run while updates are in progress. Its update script +looks like this: + +```sh +#!/bin/bash + +# Switch to your project directory. +cd /srv/nominatim + +while true; do + nominatim replication --once + if [ -f "/srv/nominatim/schedule-maintenance" ]; then + rm /srv/nominatim/schedule-maintenance + nominatim refresh --postcodes + fi +done +``` + +A cron job then creates the file `/srv/nominatim/schedule-maintenance` once per night. + + +#### Catch-up mode + +With the `--catch-up` parameter, Nominatim will immediately try to download +all changes from the server until the database is up-to-date.
The catch-up mode +still respects the parameter `NOMINATIM_REPLICATION_MAX_DIFF`. It downloads and +applies the changes in appropriate batches until all is done. + +The catch-up mode is foremost useful to bring the database up to speed after the +initial import. Given that the service usually is not in production at this +point, you can temporarily be a bit more generous with the batch size and +number of threads you use for the updates by running catch-up like this: + +``` +cd /srv/nominatim +NOMINATIM_REPLICATION_MAX_DIFF=5000 nominatim replication --catch-up --threads 15 +``` + +The catch-up mode is also useful when you want to apply updates at a lower +frequency than what the source publishes. You can set up a cron job to run +replication catch-up at whatever interval you desire. + +!!! hint + When running scheduled updates with catch-up, it is a good idea to choose + a replication source with an update frequency that is an order of magnitude + higher. For example, if you want to update once a day, use an hourly updated + source. This makes sure that you don't miss an entire day of updates when + the source is unexpectedly late to publish its update. + + If you want to use the source with the same update frequency (e.g. a daily + updated source with daily updates), use the + continuous update mode. It keeps re-requesting the newest update until it + is published. diff --git a/docs/customize/Settings.md b/docs/customize/Settings.md index 7eebebe8..428ef11c 100644 --- a/docs/customize/Settings.md +++ b/docs/customize/Settings.md @@ -112,6 +112,9 @@ Points to the file with additional configuration for the tokenizer. See the [Tokenizer](../customize/Tokenizers.md) descriptions for details on the file format. +If a relative path is given, then the file is searched first relative to the +project directory and then in the global settings directory. + #### NOMINATIM_MAX_WORD_FREQUENCY | Summary | | @@ -150,7 +153,7 @@ objects when the area becomes too large.
| Summary | | | -------------- | --------------------------------------------------- | | **Description:** | Restrict search languages | -| **Format:** | comma,separated list of language codes | +| **Format:** | string: comma-separated list of language codes | | **Default:** | _empty_ | Normally Nominatim will include all language variants of name:XX @@ -283,7 +286,7 @@ setting to define the password for proxies that require a login. Nominatim uses [osm2pgsql](https://osm2pgsql.org) to load the OSM data initially into the database. Nominatim comes bundled with a version of osm2pgsql that is guaranteed to be compatible. Use this setting to use -a different binary instead. You should do this only, when you know exactly +a different binary instead. You should do this only when you know exactly what you are doing. If the osm2pgsql version is not compatible, then the result is undefined. @@ -300,31 +303,21 @@ Set a custom location for the [wikipedia ranking file](../admin/Import.md#wikipediawikidata-rankings). When unset, Nominatim expects the data to be saved in the project directory. -#### NOMINATIM_PHRASE_CONFIG - -| Summary | | -| -------------- | --------------------------------------------------- | -| **Description:** | Configuration file for special phrase imports | -| **Format:** | path | -| **Default:** | _empty_ (use default settings) | - -The _phrase_config_ file configures black and white lists of tag types, -so that some of them can be ignored, when loading special phrases from -the OSM wiki. The default settings can be found in the configuration -directory as `phrase-settings.json`. - #### NOMINATIM_ADDRESS_LEVEL_CONFIG | Summary | | | -------------- | --------------------------------------------------- | | **Description:** | Configuration file for rank assignments | | **Format:** | path | -| **Default:** | _empty_ (use default settings) | +| **Default:** | address-levels.json | -The _address level config_ configures rank assignments for places. 
See +The _address level configuration_ defines the rank assignments for places. See [Place Ranking](Ranking.md) for a detailed explanation what rank assignments -are and what the configuration file must look like. The default configuration -can be found in the configuration directory as `address-levels.json`. +are and what the configuration file must look like. + +When a relative path is given, then the file is searched first relative to the +project directory and then in the global settings directory. + #### NOMINATIM_IMPORT_STYLE @@ -335,9 +328,13 @@ can be found in the configuration directory as `address-levels.json`. | **Default:** | extratags | The _style configuration_ describes which OSM objects and tags are taken -into consideration for the search database. This setting may either -be a string pointing to one of the internal styles or it may be a path -pointing to a custom style. +into consideration for the search database. Nominatim comes with a set +of pre-configured styles, that may be configured here. + +You can also write your own custom style and point the setting to the file +with the style. When a relative path is given, then the style file is searched +first relative to the project directory and then in the global settings +directory. See [Import Styles](Import-Styles.md) for more information on the available internal styles and the format of the @@ -357,6 +354,9 @@ location for OSM nodes. For larger imports it can significantly speed up the import. When this option is unset, then osm2pgsql uses a PsotgreSQL table to store the locations. +When a relative path is given, then the flatnode file is created/searched +relative to the project directory. + !!! warning The flatnode file is not only used during the initial import but also @@ -634,7 +634,11 @@ Can be used as the same time as NOMINATIM_LOG_FILE. 
| **After Changes:** | run `nominatim refresh --website` | Enable logging of requests into a file with this setting by setting the log -file where to log to. The entries in the log file have the following format: +file where to log to. A relative file name is assumed to be relative to +the project directory. + + +The entries in the log file have the following format: "" diff --git a/lib-php/ReverseGeocode.php b/lib-php/ReverseGeocode.php index 47e931ef..a670c623 100644 --- a/lib-php/ReverseGeocode.php +++ b/lib-php/ReverseGeocode.php @@ -111,6 +111,7 @@ class ReverseGeocode $sSQL .= ' FROM placex'; $sSQL .= ' WHERE osm_type = \'N\''; $sSQL .= ' AND country_code = \''.$sCountryCode.'\''; + $sSQL .= ' AND rank_search < 26 '; // needed to select right index $sSQL .= ' AND rank_search between 5 and ' .min(25, $iMaxRank); $sSQL .= ' AND class = \'place\' AND type != \'postcode\''; $sSQL .= ' AND name IS NOT NULL '; @@ -206,6 +207,7 @@ class ReverseGeocode // for place nodes at rank_address 16 $sSQL .= ' AND rank_search > '.$iRankSearch; $sSQL .= ' AND rank_search <= '.$iMaxRank; + $sSQL .= ' AND rank_search < 26 '; // needed to select right index $sSQL .= ' AND rank_address > 0'; $sSQL .= ' AND class = \'place\''; $sSQL .= ' AND type != \'postcode\''; diff --git a/lib-php/admin/warm.php b/lib-php/admin/warm.php index 39a37506..338ec2da 100644 --- a/lib-php/admin/warm.php +++ b/lib-php/admin/warm.php @@ -86,8 +86,13 @@ if (!$aResult['reverse-only']) { if ($bVerbose) { echo "\n"; } + + $oTokenizer = new \Nominatim\Tokenizer($oDB); + + $aWords = $oTokenizer->mostFrequentWords(1000); + $sSQL = 'SELECT word FROM word WHERE word is not null ORDER BY search_name_count DESC LIMIT 1000'; - foreach ($oDB->getCol($sSQL) as $sWord) { + foreach ($aWords as $sWord) { if ($bVerbose) { echo "$sWord = "; } diff --git a/lib-php/migration/PhraseSettingsToJson.php b/lib-php/migration/PhraseSettingsToJson.php deleted file mode 100644 index ac6e6213..00000000 --- 
a/lib-php/migration/PhraseSettingsToJson.php +++ /dev/null @@ -1,21 +0,0 @@ -oNormalizer->transliterate($sTerm); } + + public function mostFrequentWords($iNum) + { + $sSQL = "SELECT word FROM word WHERE type = 'W' "; + $sSQL .= "ORDER BY info->'count' DESC LIMIT ".$iNum; + return $this->oDB->getCol($sSQL); + } + + private function makeStandardWord($sTerm) { return trim($this->oTransliterator->transliterate(' '.$sTerm.' ')); diff --git a/lib-php/tokenizer/legacy_tokenizer.php b/lib-php/tokenizer/legacy_tokenizer.php index b508d220..d5686f64 100644 --- a/lib-php/tokenizer/legacy_tokenizer.php +++ b/lib-php/tokenizer/legacy_tokenizer.php @@ -48,6 +48,14 @@ class Tokenizer } + public function mostFrequentWords($iNum) + { + $sSQL = 'SELECT word FROM word WHERE word is not null '; + $sSQL .= 'ORDER BY search_name_count DESC LIMIT '.$iNum; + return $this->oDB->getCol($sSQL); + } + + public function tokensForSpecialTerm($sTerm) { $aResults = array(); diff --git a/nominatim/clicmd/args.py b/nominatim/clicmd/args.py index 694e6fc5..4e2c23a7 100644 --- a/nominatim/clicmd/args.py +++ b/nominatim/clicmd/args.py @@ -23,7 +23,7 @@ class NominatimArgs: osm2pgsql_style=self.config.get_import_style_file(), threads=self.threads or default_threads, dsn=self.config.get_libpq_dsn(), - flatnode_file=self.config.FLATNODE_FILE, + flatnode_file=str(self.config.get_path('FLATNODE_FILE')), tablespaces=dict(slim_data=self.config.TABLESPACE_OSM_DATA, slim_index=self.config.TABLESPACE_OSM_INDEX, main_data=self.config.TABLESPACE_PLACE_DATA, diff --git a/nominatim/clicmd/freeze.py b/nominatim/clicmd/freeze.py index 8a6c928e..41dd610e 100644 --- a/nominatim/clicmd/freeze.py +++ b/nominatim/clicmd/freeze.py @@ -31,6 +31,6 @@ class SetupFreeze: with connect(args.config.get_libpq_dsn()) as conn: freeze.drop_update_tables(conn) - freeze.drop_flatnode_file(args.config.FLATNODE_FILE) + freeze.drop_flatnode_file(str(args.config.get_path('FLATNODE_FILE'))) return 0 diff --git a/nominatim/clicmd/refresh.py
b/nominatim/clicmd/refresh.py index e7d7d7ba..82a61f54 100644 --- a/nominatim/clicmd/refresh.py +++ b/nominatim/clicmd/refresh.py @@ -75,10 +75,9 @@ class UpdateRefresh: self._get_tokenizer(args.config).update_statistics() if args.address_levels: - cfg = Path(args.config.ADDRESS_LEVEL_CONFIG) - LOG.warning('Updating address levels from %s', cfg) + LOG.warning('Updating address levels') with connect(args.config.get_libpq_dsn()) as conn: - refresh.load_address_levels_from_file(conn, cfg) + refresh.load_address_levels_from_config(conn, args.config) if args.functions: LOG.warning('Create functions') diff --git a/nominatim/clicmd/replication.py b/nominatim/clicmd/replication.py index a22cef47..44eec5f1 100644 --- a/nominatim/clicmd/replication.py +++ b/nominatim/clicmd/replication.py @@ -42,14 +42,17 @@ class UpdateReplication: help='Initialise the update process') group.add_argument('--no-update-functions', dest='update_functions', action='store_false', - help=("Do not update the trigger function to " - "support differential updates.")) + help="Do not update the trigger function to " + "support differential updates (EXPERT)") group = parser.add_argument_group('Arguments for updates') group.add_argument('--check-for-updates', action='store_true', help='Check if new updates are available and exit') group.add_argument('--once', action='store_true', - help=("Download and apply updates only once. When " - "not set, updates are continuously applied")) + help="Download and apply updates only once. When " + "not set, updates are continuously applied") + group.add_argument('--catch-up', action='store_true', + help="Download and apply updates until no new " + "data is available on the server") group.add_argument('--no-index', action='store_false', dest='do_index', help=("Do not index the new data. 
Only usable " "together with --once")) @@ -92,28 +95,40 @@ class UpdateReplication: round_time(end - start_import), round_time(end - batchdate)) + + @staticmethod + def _compute_update_interval(args): + if args.catch_up: + return 0 + + update_interval = args.config.get_int('REPLICATION_UPDATE_INTERVAL') + # Sanity check to not overwhelm the Geofabrik servers. + if 'download.geofabrik.de' in args.config.REPLICATION_URL\ + and update_interval < 86400: + LOG.fatal("Update interval too low for download.geofabrik.de.\n" + "Please check install documentation " + "(https://nominatim.org/release-docs/latest/admin/Import-and-Update#" + "setting-up-the-update-process).") + raise UsageError("Invalid replication update interval setting.") + + return update_interval + + @staticmethod def _update(args): from ..tools import replication from ..indexer.indexer import Indexer from ..tokenizer import factory as tokenizer_factory + update_interval = UpdateReplication._compute_update_interval(args) + params = args.osm2pgsql_options(default_cache=2000, default_threads=1) params.update(base_url=args.config.REPLICATION_URL, - update_interval=args.config.get_int('REPLICATION_UPDATE_INTERVAL'), + update_interval=update_interval, import_file=args.project_dir / 'osmosischange.osc', max_diff_size=args.config.get_int('REPLICATION_MAX_DIFF'), indexed_only=not args.once) - # Sanity check to not overwhelm the Geofabrik servers. 
- if 'download.geofabrik.de' in params['base_url']\ - and params['update_interval'] < 86400: - LOG.fatal("Update interval too low for download.geofabrik.de.\n" - "Please check install documentation " - "(https://nominatim.org/release-docs/latest/admin/Import-and-Update#" - "setting-up-the-update-process).") - raise UsageError("Invalid replication update interval setting.") - if not args.once: if not args.do_index: LOG.fatal("Indexing cannot be disabled when running updates continuously.") @@ -135,8 +150,7 @@ class UpdateReplication: index_start = dt.datetime.now(dt.timezone.utc) indexer = Indexer(args.config.get_libpq_dsn(), tokenizer, args.threads or 1) - indexer.index_boundaries(0, 30) - indexer.index_by_rank(0, 30) + indexer.index_full(analyse=False) with connect(args.config.get_libpq_dsn()) as conn: status.set_indexed(conn, True) @@ -145,10 +159,15 @@ class UpdateReplication: else: index_start = None + if state is replication.UpdateState.NO_CHANGES and \ + args.catch_up or update_interval > 40*60: + while indexer.has_pending(): + indexer.index_full(analyse=False) + if LOG.isEnabledFor(logging.WARNING): UpdateReplication._report_update(batchdate, start, index_start) - if args.once: + if args.once or (args.catch_up and state is replication.UpdateState.NO_CHANGES): break if state is replication.UpdateState.NO_CHANGES: diff --git a/nominatim/clicmd/setup.py b/nominatim/clicmd/setup.py index 5e43d446..27847920 100644 --- a/nominatim/clicmd/setup.py +++ b/nominatim/clicmd/setup.py @@ -150,7 +150,7 @@ class SetupAll: refresh.create_functions(conn, config, False, False) LOG.warning('Create tables') database_import.create_tables(conn, config, reverse_only=reverse_only) - refresh.load_address_levels_from_file(conn, Path(config.ADDRESS_LEVEL_CONFIG)) + refresh.load_address_levels_from_config(conn, config) LOG.warning('Create functions (2nd pass)') refresh.create_functions(conn, config, False, False) LOG.warning('Create table triggers') diff --git 
a/nominatim/clicmd/special_phrases.py b/nominatim/clicmd/special_phrases.py index 626c0053..a4ef89a4 100644 --- a/nominatim/clicmd/special_phrases.py +++ b/nominatim/clicmd/special_phrases.py @@ -35,6 +35,13 @@ class ImportSpecialPhrases: An example file can be found in the Nominatim sources at 'test/testdb/full_en_phrases_test.csv'. + + The import can be further configured to ignore specific key/value pairs. + This is particularly useful when importing phrases from the wiki. The + default configuration excludes some very common tags like building=yes. + The configuration can be customized by putting a file `phrase-settings.json` + with custom rules into the project directory or by using the `--config` + option to point to another configuration file. """ @staticmethod def add_args(parser): @@ -45,6 +52,9 @@ class ImportSpecialPhrases: help='Import special phrases from a CSV file') group.add_argument('--no-replace', action='store_true', help='Keep the old phrases and only add the new ones') + group.add_argument('--config', action='store', + help='Configuration file for black/white listing ' + '(default: phrase-settings.json)') @staticmethod def run(args): @@ -72,5 +82,5 @@ class ImportSpecialPhrases: should_replace = not args.no_replace with connect(args.config.get_libpq_dsn()) as db_connection: SPImporter( - args.config, args.phplib_dir, db_connection, loader + args.config, db_connection, loader ).import_phrases(tokenizer, should_replace) diff --git a/nominatim/config.py b/nominatim/config.py index f316280b..bc3556f3 100644 --- a/nominatim/config.py +++ b/nominatim/config.py @@ -4,6 +4,7 @@ Nominatim configuration accessor. 
import logging import os from pathlib import Path +import json import yaml from dotenv import dotenv_values @@ -55,12 +56,6 @@ class Configuration: if project_dir is not None and (project_dir / '.env').is_file(): self._config.update(dotenv_values(str((project_dir / '.env').resolve()))) - # Add defaults for variables that are left empty to set the default. - # They may still be overwritten by environment variables. - if not self._config['NOMINATIM_ADDRESS_LEVEL_CONFIG']: - self._config['NOMINATIM_ADDRESS_LEVEL_CONFIG'] = \ - str(config_dir / 'address-levels.json') - class _LibDirs: pass @@ -98,6 +93,23 @@ class Configuration: raise UsageError("Configuration error.") from exp + def get_path(self, name): + """ Return the given configuration parameter as a Path. + If a relative path is configured, then the function converts this + into an absolute path with the project directory as root path. + If the configuration is unset, a falsy value is returned. + """ + value = self.__getattr__(name) + if value: + value = Path(value) + + if not value.is_absolute(): + value = self.project_dir / value + + value = value.resolve() + + return value + def get_libpq_dsn(self): """ Get configured database DSN converted into the key/value format understood by libpq and psycopg. @@ -128,7 +140,7 @@ class Configuration: if style in ('admin', 'street', 'address', 'full', 'extratags'): return self.config_dir / 'import-{}.style'.format(style) - return Path(style) + return self.find_config_file('', 'IMPORT_STYLE') def get_os_env(self): @@ -161,14 +173,19 @@ class Configuration: is loaded using this function and added at the position in the configuration tree. 
""" - assert Path(filename).suffix == '.yaml' + configfile = self.find_config_file(filename, config) + + if configfile.suffix in ('.yaml', '.yml'): + return self._load_from_yaml(configfile) - configfile = self._find_config_file(filename, config) + if configfile.suffix == '.json': + with configfile.open('r') as cfg: + return json.load(cfg) - return self._load_from_yaml(configfile) + raise UsageError(f"Config file '{configfile}' has unknown format.") - def _find_config_file(self, filename, config=None): + def find_config_file(self, filename, config=None): """ Resolve the location of a configuration file given a filename and an optional configuration option with the file name. Raises a UsageError when the file cannot be found or is not @@ -221,7 +238,7 @@ class Configuration: if Path(fname).is_absolute(): configfile = Path(fname) else: - configfile = self._find_config_file(loader.construct_scalar(node)) + configfile = self.find_config_file(loader.construct_scalar(node)) if configfile.suffix != '.yaml': LOG.fatal("Format error while reading '%s': only YAML format supported.", diff --git a/nominatim/indexer/indexer.py b/nominatim/indexer/indexer.py index d0cfb391..50bd232e 100644 --- a/nominatim/indexer/indexer.py +++ b/nominatim/indexer/indexer.py @@ -91,6 +91,17 @@ class Indexer: self.num_threads = num_threads + def has_pending(self): + """ Check if any data still needs indexing. + This function must only be used after the import has finished. + Otherwise it will be very expensive. + """ + with connect(self.dsn) as conn: + with conn.cursor() as cur: + cur.execute("SELECT 'a' FROM placex WHERE indexed_status > 0 LIMIT 1") + return cur.rowcount > 0 + + def index_full(self, analyse=True): """ Index the complete database. This will first index boundaries followed by all other objects. 
When `analyse` is True, then the diff --git a/nominatim/tools/refresh.py b/nominatim/tools/refresh.py index 00ae5dc9..0a72b02b 100644 --- a/nominatim/tools/refresh.py +++ b/nominatim/tools/refresh.py @@ -1,9 +1,9 @@ """ Functions for bringing auxiliary data in the database up-to-date. """ -import json import logging from textwrap import dedent +from pathlib import Path from psycopg2 import sql as pysql @@ -58,12 +58,15 @@ def load_address_levels(conn, table, levels): conn.commit() -def load_address_levels_from_file(conn, config_file): - """ Replace the `address_levels` table with the contents of the config - file. + +def load_address_levels_from_config(conn, config): + """ Replace the `address_levels` table with the content as + defined in the given configuration. Uses the parameter + NOMINATIM_ADDRESS_LEVEL_CONFIG to determine the location of the + configuration file. """ - with config_file.open('r') as fdesc: - load_address_levels(conn, 'address_levels', json.load(fdesc)) + cfg = config.load_sub_configuration('', config='ADDRESS_LEVEL_CONFIG') + load_address_levels(conn, 'address_levels', cfg) def create_functions(conn, config, enable_diff_updates=True, enable_debug=False): @@ -92,7 +95,7 @@ PHP_CONST_DEFS = ( ('Database_DSN', 'DATABASE_DSN', str), ('Default_Language', 'DEFAULT_LANGUAGE', str), ('Log_DB', 'LOG_DB', bool), - ('Log_File', 'LOG_FILE', str), + ('Log_File', 'LOG_FILE', Path), ('NoAccessControl', 'CORS_NOACCESSCONTROL', bool), ('Places_Max_ID_count', 'LOOKUP_MAX_COUNT', int), ('PolygonOutput_MaximumTypes', 'POLYGON_OUTPUT_MAX_TYPES', int), @@ -160,7 +163,12 @@ def _quote_php_variable(var_type, config, conf_name): if not getattr(config, conf_name): return 'false' - quoted = getattr(config, conf_name).replace("'", "\\'") + if var_type == Path: + value = str(config.get_path(conf_name)) + else: + value = getattr(config, conf_name) + + quoted = value.replace("'", "\\'") return f"'{quoted}'" diff --git a/nominatim/tools/special_phrases/sp_importer.py 
b/nominatim/tools/special_phrases/sp_importer.py index 791f4dc3..d9d126fa 100644 --- a/nominatim/tools/special_phrases/sp_importer.py +++ b/nominatim/tools/special_phrases/sp_importer.py @@ -8,15 +8,9 @@ valids anymore are removed. """ import logging -import os -from os.path import isfile -from pathlib import Path import re -import subprocess -import json from psycopg2.sql import Identifier, Literal, SQL -from nominatim.errors import UsageError from nominatim.tools.special_phrases.importer_statistics import SpecialPhrasesImporterStatistics LOG = logging.getLogger() @@ -33,9 +27,8 @@ class SPImporter(): Take a sp loader which load the phrases from an external source. """ - def __init__(self, config, phplib_dir, db_connection, sp_loader) -> None: + def __init__(self, config, db_connection, sp_loader) -> None: self.config = config - self.phplib_dir = phplib_dir self.db_connection = db_connection self.sp_loader = sp_loader self.statistics_handler = SpecialPhrasesImporterStatistics() @@ -101,13 +94,8 @@ class SPImporter(): """ Load white and black lists from phrases-settings.json. """ - settings_path = (self.config.config_dir / 'phrase-settings.json').resolve() + settings = self.config.load_sub_configuration('phrase-settings.json') - if self.config.PHRASE_CONFIG: - settings_path = self._convert_php_settings_if_needed(self.config.PHRASE_CONFIG) - - with settings_path.open("r") as json_settings: - settings = json.load(json_settings) return settings['blackList'], settings['whiteList'] def _check_sanity(self, phrase): @@ -255,29 +243,3 @@ class SPImporter(): for table in self.table_phrases_to_delete: self.statistics_handler.notify_one_table_deleted() db_cursor.drop_table(table) - - - def _convert_php_settings_if_needed(self, file_path): - """ - Convert php settings file of special phrases to json file if it is still in php format. 
- """ - if not isfile(file_path): - raise UsageError(str(file_path) + ' is not a valid file.') - - file, extension = os.path.splitext(file_path) - json_file_path = Path(file + '.json').resolve() - - if extension not in ('.php', '.json'): - raise UsageError('The custom NOMINATIM_PHRASE_CONFIG file has not a valid extension.') - - if extension == '.php' and not isfile(json_file_path): - try: - subprocess.run(['/usr/bin/env', 'php', '-Cq', - (self.phplib_dir / 'migration/PhraseSettingsToJson.php').resolve(), - file_path], check=True) - LOG.warning('special_phrase configuration file has been converted to json.') - except subprocess.CalledProcessError: - LOG.error('Error while converting %s to json.', file_path) - raise - - return json_file_path diff --git a/settings/env.defaults b/settings/env.defaults index 3fb128dc..00f5569a 100644 --- a/settings/env.defaults +++ b/settings/env.defaults @@ -89,14 +89,12 @@ NOMINATIM_TIGER_DATA_PATH= NOMINATIM_WIKIPEDIA_DATA_PATH= # Configuration file for special phrase import. -# When unset, the internal default settings from 'settings/phrase-settings.json' -# are used. +# OBSOLETE: use `nominatim special-phrases --config ` or simply put +# a custom phrase-settings.json into your project directory. NOMINATIM_PHRASE_CONFIG= # Configuration file for rank assignments. -# When unset, the internal default settings from 'settings/address-levels.json' -# are used. -NOMINATIM_ADDRESS_LEVEL_CONFIG= +NOMINATIM_ADDRESS_LEVEL_CONFIG=address-levels.json # Configuration file for OSM data import. # This may either be the name of one of an internal style or point diff --git a/settings/phrase-settings.json b/settings/phrase-settings.json index a097dca4..5d3ef6eb 100644 --- a/settings/phrase-settings.json +++ b/settings/phrase-settings.json @@ -6,7 +6,7 @@ "Also use this list to exclude an entire class from special phrases." 
     ],
     "blackList": {
-        "bounday": [
+        "boundary": [
             "administrative"
         ],
         "place": [
diff --git a/test/python/test_cli.py b/test/python/test_cli.py
index 7e6bf99e..707be23b 100644
--- a/test/python/test_cli.py
+++ b/test/python/test_cli.py
@@ -186,7 +186,7 @@ class TestCliWithDb:
             mock_func_factory(nominatim.tools.database_import, 'create_partition_tables'),
             mock_func_factory(nominatim.tools.database_import, 'create_search_indices'),
             mock_func_factory(nominatim.tools.country_info, 'create_country_names'),
-            mock_func_factory(nominatim.tools.refresh, 'load_address_levels_from_file'),
+            mock_func_factory(nominatim.tools.refresh, 'load_address_levels_from_config'),
             mock_func_factory(nominatim.tools.postcodes, 'update_postcodes'),
             mock_func_factory(nominatim.indexer.indexer.Indexer, 'index_full'),
             mock_func_factory(nominatim.tools.refresh, 'setup_website'),
@@ -321,7 +321,7 @@ class TestCliWithDb:
         assert func.called == 1
 
     @pytest.mark.parametrize("command,func", [
-                             ('address-levels', 'load_address_levels_from_file'),
+                             ('address-levels', 'load_address_levels_from_config'),
                              ('wiki-data', 'import_wikipedia_articles'),
                              ('importance', 'recompute_importance'),
                              ('website', 'setup_website'),
diff --git a/test/python/test_cli_replication.py b/test/python/test_cli_replication.py
index dcaeaf25..2dd35c0e 100644
--- a/test/python/test_cli_replication.py
+++ b/test/python/test_cli_replication.py
@@ -53,8 +53,7 @@ def init_status(temp_db_conn, status_table):
 @pytest.fixture
 def index_mock(monkeypatch, tokenizer_mock, init_status):
     mock = MockParamCapture()
-    monkeypatch.setattr(nominatim.indexer.indexer.Indexer, 'index_boundaries', mock)
-    monkeypatch.setattr(nominatim.indexer.indexer.Indexer, 'index_by_rank', mock)
+    monkeypatch.setattr(nominatim.indexer.indexer.Indexer, 'index_full', mock)
 
     return mock
 
@@ -122,7 +121,7 @@ class TestCliReplication:
         with pytest.raises(IndexError):
             self.call_nominatim()
 
-        assert index_mock.called == 4
+        assert index_mock.called == 2
 
 
     def test_replication_update_continuous_no_change(self, monkeypatch, index_mock):
@@ -137,6 +136,6 @@ class TestCliReplication:
         with pytest.raises(IndexError):
             self.call_nominatim()
 
-        assert index_mock.called == 2
+        assert index_mock.called == 1
         assert sleep_mock.called == 1
         assert sleep_mock.last_args[0] == 60
diff --git a/test/python/test_config.py b/test/python/test_config.py
index 8b5cb11b..a71324f9 100644
--- a/test/python/test_config.py
+++ b/test/python/test_config.py
@@ -1,6 +1,7 @@
 """
 Test for loading dotenv configuration.
 """
+from pathlib import Path
 import pytest
 
 from nominatim.config import Configuration
@@ -166,6 +167,33 @@ def test_get_int_empty(make_config):
         config.get_int('DATABASE_MODULE_PATH')
 
 
+def test_get_path_empty(make_config):
+    config = make_config()
+
+    assert config.DATABASE_MODULE_PATH == ''
+    assert not config.get_path('DATABASE_MODULE_PATH')
+
+
+def test_get_path_absolute(make_config, monkeypatch):
+    config = make_config()
+
+    monkeypatch.setenv('NOMINATIM_FOOBAR', '/dont/care')
+    result = config.get_path('FOOBAR')
+
+    assert isinstance(result, Path)
+    assert str(result) == '/dont/care'
+
+
+def test_get_path_relative(make_config, monkeypatch, tmp_path):
+    config = make_config(tmp_path)
+
+    monkeypatch.setenv('NOMINATIM_FOOBAR', 'an/oyster')
+    result = config.get_path('FOOBAR')
+
+    assert isinstance(result, Path)
+    assert str(result) == str(tmp_path / 'an/oyster')
+
+
 def test_get_import_style_intern(make_config, src_dir, monkeypatch):
     config = make_config()
 
@@ -176,13 +204,24 @@ def test_get_import_style_intern(make_config, src_dir, monkeypatch):
     assert config.get_import_style_file() == expected
 
 
-@pytest.mark.parametrize("value", ['custom', '/foo/bar.stye'])
-def test_get_import_style_extern(make_config, monkeypatch, value):
+def test_get_import_style_extern_relative(make_config_path, monkeypatch):
+    config = make_config_path()
+    (config.project_dir / 'custom.style').write_text('x')
+
+    monkeypatch.setenv('NOMINATIM_IMPORT_STYLE', 'custom.style')
+
+    assert str(config.get_import_style_file()) == str(config.project_dir / 'custom.style')
+
+
+def test_get_import_style_extern_absolute(make_config, tmp_path, monkeypatch):
     config = make_config()
+    cfgfile = tmp_path / 'test.style'
+
+    cfgfile.write_text('x')
 
-    monkeypatch.setenv('NOMINATIM_IMPORT_STYLE', value)
+    monkeypatch.setenv('NOMINATIM_IMPORT_STYLE', str(cfgfile))
 
-    assert str(config.get_import_style_file()) == value
+    assert str(config.get_import_style_file()) == str(cfgfile)
 
 
 def test_load_subconf_from_project_dir(make_config_path):
diff --git a/test/python/test_tools_import_special_phrases.py b/test/python/test_tools_import_special_phrases.py
index f0a34b08..7c3d0646 100644
--- a/test/python/test_tools_import_special_phrases.py
+++ b/test/python/test_tools_import_special_phrases.py
@@ -17,30 +17,12 @@ def testfile_dir(src_dir):
 
 
 @pytest.fixture
-def sp_importer(temp_db_conn, def_config, temp_phplib_dir_with_migration):
+def sp_importer(temp_db_conn, def_config):
     """
         Return an instance of SPImporter.
     """
     loader = SPWikiLoader(def_config, ['en'])
-    return SPImporter(def_config, temp_phplib_dir_with_migration, temp_db_conn, loader)
-
-
-@pytest.fixture
-def temp_phplib_dir_with_migration(src_dir, tmp_path):
-    """
-        Return temporary phpdir with migration subdirectory and
-        PhraseSettingsToJson.php script inside.
-    """
-    migration_file = (src_dir / 'lib-php' / 'migration' / 'PhraseSettingsToJson.php').resolve()
-
-    phpdir = tmp_path / 'tempphp'
-    phpdir.mkdir()
-
-    (phpdir / 'migration').mkdir()
-    migration_dest_path = (phpdir / 'migration' / 'PhraseSettingsToJson.php').resolve()
-    copyfile(str(migration_file), str(migration_dest_path))
-
-    return phpdir
+    return SPImporter(def_config, temp_db_conn, loader)
 
 
 @pytest.fixture
@@ -90,49 +72,6 @@ def test_load_white_and_black_lists(sp_importer):
 
     assert isinstance(black_list, dict) and isinstance(white_list, dict)
 
-def test_convert_php_settings(sp_importer, testfile_dir, tmp_path):
-    """
-        Test that _convert_php_settings_if_needed() convert the given
-        php file to a json file.
-    """
-    php_file = (testfile_dir / 'phrase_settings.php').resolve()
-
-    temp_settings = (tmp_path / 'phrase_settings.php').resolve()
-    copyfile(php_file, temp_settings)
-    sp_importer._convert_php_settings_if_needed(temp_settings)
-
-    assert (tmp_path / 'phrase_settings.json').is_file()
-
-def test_convert_settings_wrong_file(sp_importer):
-    """
-        Test that _convert_php_settings_if_needed() raise an exception
-        if the given file is not a valid file.
-    """
-    with pytest.raises(UsageError, match='random_file is not a valid file.'):
-        sp_importer._convert_php_settings_if_needed('random_file')
-
-def test_convert_settings_json_already_exist(sp_importer, testfile_dir):
-    """
-        Test that if we give to '_convert_php_settings_if_needed' a php file path
-        and that a the corresponding json file already exists, it is returned.
-    """
-    php_file = (testfile_dir / 'phrase_settings.php').resolve()
-    json_file = (testfile_dir / 'phrase_settings.json').resolve()
-
-    returned = sp_importer._convert_php_settings_if_needed(php_file)
-
-    assert returned == json_file
-
-def test_convert_settings_giving_json(sp_importer, testfile_dir):
-    """
-        Test that if we give to '_convert_php_settings_if_needed' a json file path
-        the same path is directly returned
-    """
-    json_file = (testfile_dir / 'phrase_settings.json').resolve()
-
-    returned = sp_importer._convert_php_settings_if_needed(json_file)
-
-    assert returned == json_file
 
 def test_create_place_classtype_indexes(temp_db_with_extensions, temp_db_conn,
                                         table_factory, sp_importer):
diff --git a/test/python/test_tools_refresh_address_levels.py b/test/python/test_tools_refresh_address_levels.py
index 2821222c..2c4ee24d 100644
--- a/test/python/test_tools_refresh_address_levels.py
+++ b/test/python/test_tools_refresh_address_levels.py
@@ -6,28 +6,31 @@ from pathlib import Path
 
 import pytest
 
-from nominatim.tools.refresh import load_address_levels, load_address_levels_from_file
+from nominatim.tools.refresh import load_address_levels, load_address_levels_from_config
 
 def test_load_ranks_def_config(temp_db_conn, temp_db_cursor, def_config):
-    load_address_levels_from_file(temp_db_conn, Path(def_config.ADDRESS_LEVEL_CONFIG))
+    load_address_levels_from_config(temp_db_conn, def_config)
 
     assert temp_db_cursor.table_rows('address_levels') > 0
 
-def test_load_ranks_from_file(temp_db_conn, temp_db_cursor, tmp_path):
-    test_file = tmp_path / 'test_levels.json'
+def test_load_ranks_from_project_dir(def_config, temp_db_conn, temp_db_cursor,
+                                     tmp_path):
+    test_file = tmp_path / 'address-levels.json'
     test_file.write_text('[{"tags":{"place":{"sea":2}}}]')
+    def_config.project_dir = tmp_path
 
-    load_address_levels_from_file(temp_db_conn, test_file)
+    load_address_levels_from_config(temp_db_conn, def_config)
 
-    assert temp_db_cursor.table_rows('address_levels') > 0
+    assert temp_db_cursor.table_rows('address_levels') == 1
 
 
-def test_load_ranks_from_broken_file(temp_db_conn, tmp_path):
-    test_file = tmp_path / 'test_levels.json'
+def test_load_ranks_from_broken_file(def_config, temp_db_conn, tmp_path):
+    test_file = tmp_path / 'address-levels.json'
     test_file.write_text('[{"tags":"place":{"sea":2}}}]')
+    def_config.project_dir = tmp_path
 
     with pytest.raises(json.decoder.JSONDecodeError):
-        load_address_levels_from_file(temp_db_conn, test_file)
+        load_address_levels_from_config(temp_db_conn, def_config)
 
 
 def test_load_ranks_country(temp_db_conn, temp_db_cursor):