From 4825a0bda3e2b5d6a9c153b7cd0b8da2959cbc81 Mon Sep 17 00:00:00 2001
From: Sarah Hoffmann
Date: Sat, 21 Sep 2024 18:27:01 +0200
Subject: [PATCH] remove documentation around legacy tokenizer

---
 CONTRIBUTING.md                      |  3 +-
 docs/admin/Advanced-Installations.md | 94 +++-------------------------
 docs/admin/Installation.md           | 12 ----
 docs/customize/Settings.md           | 53 ----------------
 docs/customize/Tokenizers.md         | 47 --------------
 docs/develop/Testing.md              |  2 -
 settings/env.defaults                | 12 ----
 7 files changed, 11 insertions(+), 212 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 757c52b7..a78bbfb3 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -61,8 +61,7 @@ pylint3 --extension-pkg-whitelist=osmium nominatim
 Before submitting a pull request make sure that the tests pass:

 ```
- cd build
- make test
+ make tests
 ```

 ## Releases
diff --git a/docs/admin/Advanced-Installations.md b/docs/admin/Advanced-Installations.md
index ac8da274..de3c5876 100644
--- a/docs/admin/Advanced-Installations.md
+++ b/docs/admin/Advanced-Installations.md
@@ -131,76 +131,13 @@ script ([Geofabrik](https://download.geofabrik.de)) provides daily updates.

 ## Using an external PostgreSQL database

-You can install Nominatim using a database that runs on a different server when
-you have physical access to the file system on the other server. Nominatim
-uses a custom normalization library that needs to be made accessible to the
-PostgreSQL server. This section explains how to set up the normalization
-library.
-
-!!! note
- The external module is only needed when using the legacy tokenizer.
- If you have chosen the ICU tokenizer, then you can ignore this section
- and follow the standard import documentation.
-
-### Option 1: Compiling the library on the database server
-
-The most sure way to get a working library is to compile it on the database
-server. From the prerequisites you need at least cmake, gcc and the
-PostgreSQL server package.
-
-Clone or unpack the Nominatim source code, enter the source directory and
-create and enter a build directory.
-
-```sh
-cd Nominatim
-mkdir build
-cd build
-```
-
-Now configure cmake to only build the PostgreSQL module and build it:
-
-```
-cmake -DBUILD_IMPORTER=off -DBUILD_API=off -DBUILD_TESTS=off -DBUILD_DOCS=off -DBUILD_OSM2PGSQL=off ..
-make
-```
-
-When done, you find the normalization library in `build/module/nominatim.so`.
-Copy it to a place where it is readable and executable by the PostgreSQL server
-process.
-
-### Option 2: Compiling the library on the import machine
-
-You can also compile the normalization library on the machine from where you
-run the import.
-
-!!! important
- You can only do this when the database server and the import machine have
- the same architecture and run the same version of Linux. Otherwise there is
- no guarantee that the compiled library is compatible with the PostgreSQL
- server running on the database server.
-
-Make sure that the PostgreSQL server package is installed on the machine
-**with the same version as on the database server**. You do not need to install
-the PostgreSQL server itself.
-
-Download and compile Nominatim as per standard instructions. Once done, you find
-the normalization library in `build/module/nominatim.so`. Copy the file to
-the database server at a location where it is readable and executable by the
-PostgreSQL server process.
-
-### Running the import
-
-On the client side you now need to configure the import to point to the
-correct location of the library **on the database server**. Add the following
-line to your your `.env` file:
-
-```
-NOMINATIM_DATABASE_MODULE_PATH=""
-```
-
-Now change the `NOMINATIM_DATABASE_DSN` to point to your remote server and continue
-to follow the [standard instructions for importing](Import.md).
+You can install Nominatim using a database that runs on a different server.
+Simply point the configuration variable `NOMINATIM_DATABASE_DSN` to the
+server and follow the standard import documentation.

+The import will be faster, if the import is run directly from the database
+machine. You can easily switch to a different machine for the query frontend
+after the import.

 ## Moving the database to another machine

@@ -225,20 +162,9 @@ target machine.
 data updates but the resulting database is only about a third of the size
 of a full database.

-Next install Nominatim on the target machine by following the standard installation
-instructions. Again, make sure to use the same version as the source machine.
+Next install nominatim-api on the target machine by following the standard
+installation instructions. Again, make sure to use the same version as the
+source machine.

 Create a project directory on your destination machine and set up the `.env`
-file to match the configuration on the source machine. Finally run
-
- nominatim refresh --website
-
-to make sure that the local installation of Nominatim will be used.
-
-If you are using the legacy tokenizer you might also have to switch to the
-PostgreSQL module that was compiled on your target machine. If you get errors
-that PostgreSQL cannot find or access `nominatim.so` then rerun
-
- nominatim refresh --functions
-
-on the target machine to update the the location of the module.
+file to match the configuration on the source machine. That's all.
diff --git a/docs/admin/Installation.md b/docs/admin/Installation.md
index 38c4d601..78062908 100644
--- a/docs/admin/Installation.md
+++ b/docs/admin/Installation.md
@@ -178,18 +178,6 @@ make
 sudo make install
 ```

-!!! warning
- The default installation no longer compiles the PostgreSQL module that
- is needed for the legacy tokenizer from older Nominatim versions. If you
- are upgrading an older database or want to run the
- [legacy tokenizer](../customize/Tokenizers.md#legacy-tokenizer) for
- some other reason, you need to enable the PostgreSQL module via
- cmake: `cmake -DBUILD_MODULE=on ../Nominatim`. To compile the module
- you need to have the server development headers for PostgreSQL installed.
- On Ubuntu/Debian run: `sudo apt install postgresql-server-dev-`
- The legacy tokenizer is deprecated and will be removed in Nominatim 5.0
-
-
 Nominatim installs itself into `/usr/local` per default. To choose a different
 installation directory add `-DCMAKE_INSTALL_PREFIX=` to the
 cmake command. Make sure that the `bin` directory is available in your path
diff --git a/docs/customize/Settings.md b/docs/customize/Settings.md
index b35fce3d..b00d04cf 100644
--- a/docs/customize/Settings.md
+++ b/docs/customize/Settings.md
@@ -64,26 +64,6 @@ Nominatim grants minimal rights to this user to all tables that are needed
 for running geocoding queries.


-#### NOMINATIM_DATABASE_MODULE_PATH
-
-| Summary | |
-| -------------- | --------------------------------------------------- |
-| **Description:** | Directory where to find the PostgreSQL server module |
-| **Format:** | path |
-| **Default:** | _empty_ (use `/module`) |
-| **After Changes:** | run `nominatim refresh --functions` |
-| **Comment:** | Legacy tokenizer only |
-
-Defines the directory in which the PostgreSQL server module `nominatim.so`
-is stored. The directory and module must be accessible by the PostgreSQL
-server.
-
-For information on how to use this setting when working with external databases,
-see [Advanced Installations](../admin/Advanced-Installations.md).
-
-The option is only used by the Legacy tokenizer and ignored otherwise.
-
-
 #### NOMINATIM_TOKENIZER

 | Summary | |
@@ -114,20 +94,6 @@ on the file format.
 If a relative path is given, then the file is searched first relative to the
 project directory and then in the global settings directory.

-#### NOMINATIM_MAX_WORD_FREQUENCY
-
-| Summary | |
-| -------------- | --------------------------------------------------- |
-| **Description:** | Number of occurrences before a word is considered frequent |
-| **Format:** | int |
-| **Default:** | 50000 |
-| **After Changes:** | cannot be changed after import |
-| **Comment:** | Legacy tokenizer only |
-
-The word frequency count is used by the Legacy tokenizer to automatically
-identify _stop words_. Any partial term that occurs more often then what
-is defined in this setting, is effectively ignored during search.
-

 #### NOMINATIM_LIMIT_REINDEXING

@@ -162,25 +128,6 @@ codes, to restrict import to a subset of languages.
 Currently only affects the initial import of country names and special phrases.


-#### NOMINATIM_TERM_NORMALIZATION
-
-| Summary | |
-| -------------- | --------------------------------------------------- |
-| **Description:** | Rules for normalizing terms for comparisons |
-| **Format:** | string: semicolon-separated list of ICU rules |
-| **Default:** | :: NFD (); [[:Nonspacing Mark:] [:Cf:]] >; :: lower (); [[:Punctuation:][:Space:]]+ > ' '; :: NFC (); |
-| **Comment:** | Legacy tokenizer only |
-
-[Special phrases](Special-Phrases.md) have stricter matching requirements than
-normal search terms. They must appear exactly in the query after this term
-normalization has been applied.
-
-Only has an effect on the Legacy tokenizer. For the ICU tokenizer the rules
-defined in the
-[normalization section](Tokenizers.md#normalization-and-transliteration)
-will be used.
-
-
 #### NOMINATIM_USE_US_TIGER_DATA

 | Summary | |
diff --git a/docs/customize/Tokenizers.md b/docs/customize/Tokenizers.md
index 49e86a50..30be170e 100644
--- a/docs/customize/Tokenizers.md
+++ b/docs/customize/Tokenizers.md
@@ -15,53 +15,6 @@ they can be configured.
 chosen tokenizer is very limited as well. See the comments in each tokenizer
 section.

-## Legacy tokenizer
-
-!!! danger
- The Legacy tokenizer is deprecated and will be removed in Nominatim 5.0.
- If you still use a database with the legacy tokenizer, you must reimport
- it using the ICU tokenizer below.
-
-The legacy tokenizer implements the analysis algorithms of older Nominatim
-versions. It uses a special Postgresql module to normalize names and queries.
-This tokenizer is automatically installed and used when upgrading an older
-database. It should not be used for new installations anymore.
-
-### Compiling the PostgreSQL module
-
-The tokeinzer needs a special C module for PostgreSQL which is not compiled
-by default. If you need the legacy tokenizer, compile Nominatim as follows:
-
-```
-mkdir build
-cd build
-cmake -DBUILD_MODULE=on
-make
-```
-
-### Enabling the tokenizer
-
-To enable the tokenizer add the following line to your project configuration:
-
-```
-NOMINATIM_TOKENIZER=legacy
-```
-
-The Postgresql module for the tokenizer is available in the `module` directory
-and also installed with the remainder of the software under
-`lib/nominatim/module/nominatim.so`. You can specify a custom location for
-the module with
-
-```
-NOMINATIM_DATABASE_MODULE_PATH=
-```
-
-This is in particular useful when the database runs on a different server.
-See [Advanced installations](../admin/Advanced-Installations.md#using-an-external-postgresql-database) for details.
-
-There are no other configuration options for the legacy tokenizer. All
-normalization functions are hard-coded.
-
 ## ICU tokenizer

 The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
diff --git a/docs/develop/Testing.md b/docs/develop/Testing.md
index d57ab319..12673d40 100644
--- a/docs/develop/Testing.md
+++ b/docs/develop/Testing.md
@@ -72,8 +72,6 @@ The tests can be configured with a set of environment variables (`behave -D key=
  * `DB_PORT` - (optional) port of database on host
  * `DB_USER` - (optional) username of database login
  * `DB_PASS` - (optional) password for database login
- * `SERVER_MODULE_PATH` - (optional) path on the Postgres server to Nominatim
- module shared library file (only needed for legacy tokenizer)
  * `REMOVE_TEMPLATE` - if true, the template and API database will not be reused
  during the next run. Reusing the base templates speeds
  up tests considerably but might lead to outdated errors
diff --git a/settings/env.defaults b/settings/env.defaults
index d3952af0..b8c66667 100644
--- a/settings/env.defaults
+++ b/settings/env.defaults
@@ -18,12 +18,6 @@ NOMINATIM_DATABASE_WEBUSER="www-data"
 # Currently available tokenizers: icu, legacy
 NOMINATIM_TOKENIZER="icu"

-# Number of occurrences of a word before it is considered frequent.
-# Similar to the concept of stop words. Frequent partial words get ignored
-# or handled differently during search.
-# Changing this value requires a reimport.
-NOMINATIM_MAX_WORD_FREQUENCY=50000
-
 # If true, admin level changes on places with many contained children are blocked.
 NOMINATIM_LIMIT_REINDEXING=yes

@@ -34,12 +28,6 @@ NOMINATIM_LIMIT_REINDEXING=yes
 # Currently only affects the initial import of country names and special phrases.
 NOMINATIM_LANGUAGES=

-# Rules for normalizing terms for comparisons.
-# The default is to remove accents and punctuation and to lower-case the
-# term. Spaces are kept but collapsed to one standard space.
-# Changing this value requires a reimport.
-NOMINATIM_TERM_NORMALIZATION=":: NFD (); [[:Nonspacing Mark:] [:Cf:]] >; :: lower (); [[:Punctuation:][:Space:]]+ > ' '; :: NFC ();"
-
 # Configuration file for the tokenizer.
 # The content depends on the tokenizer used. If left empty the default settings
 # for the chosen tokenizer will be used. The configuration can only be set
--
2.39.5