remove documentation around legacy tokenizer

author Sarah Hoffmann <lonvia@denofr.de>

Sat, 21 Sep 2024 16:27:01 +0000 (18:27 +0200)

committer Sarah Hoffmann <lonvia@denofr.de>

Sat, 21 Sep 2024 16:27:01 +0000 (18:27 +0200)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 757c52b7c2f8916382bfced2ff2b639d9fdfe4d1..a78bbfb3e8a8f9fc641da3bcd5be65d762e85570 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -61,8 +61,7 @@ pylint3 --extension-pkg-whitelist=osmium nominatim
  Before submitting a pull request make sure that the tests pass:
  
  ```
-  cd build
-  make test
+  make tests
  ```
  
  ## Releases
diff --git a/docs/admin/Advanced-Installations.md b/docs/admin/Advanced-Installations.md
index ac8da274c69b7fd7b0a3a13c7b01b2e233202d2e..de3c5876984ce85e5ea48208b337cf4fa70898c4 100644
--- a/docs/admin/Advanced-Installations.md
+++ b/docs/admin/Advanced-Installations.md
@@ -131,76 +131,13 @@ script ([Geofabrik](https://download.geofabrik.de)) provides daily updates.
  
  ## Using an external PostgreSQL database
  
-You can install Nominatim using a database that runs on a different server when
-you have physical access to the file system on the other server. Nominatim
-uses a custom normalization library that needs to be made accessible to the
-PostgreSQL server. This section explains how to set up the normalization
-library.
-
-!!! note
-    The external module is only needed when using the legacy tokenizer.
-    If you have chosen the ICU tokenizer, then you can ignore this section
-    and follow the standard import documentation.
-
-### Option 1: Compiling the library on the database server
-
-The most sure way to get a working library is to compile it on the database
-server. From the prerequisites you need at least cmake, gcc and the
-PostgreSQL server package.
-
-Clone or unpack the Nominatim source code, enter the source directory and
-create and enter a build directory.
-
-```sh
-cd Nominatim
-mkdir build
-cd build
-```
-
-Now configure cmake to only build the PostgreSQL module and build it:
-
-```
-cmake -DBUILD_IMPORTER=off -DBUILD_API=off -DBUILD_TESTS=off -DBUILD_DOCS=off -DBUILD_OSM2PGSQL=off ..
-make
-```
-
-When done, you find the normalization library in `build/module/nominatim.so`.
-Copy it to a place where it is readable and executable by the PostgreSQL server
-process.
-
-### Option 2: Compiling the library on the import machine
-
-You can also compile the normalization library on the machine from where you
-run the import.
-
-!!! important
-    You can only do this when the database server and the import machine have
-    the same architecture and run the same version of Linux. Otherwise there is
-    no guarantee that the compiled library is compatible with the PostgreSQL
-    server running on the database server.
-
-Make sure that the PostgreSQL server package is installed on the machine
-**with the same version as on the database server**. You do not need to install
-the PostgreSQL server itself.
-
-Download and compile Nominatim as per standard instructions. Once done, you find
-the normalization library in `build/module/nominatim.so`. Copy the file to
-the database server at a location where it is readable and executable by the
-PostgreSQL server process.
-
-### Running the import
-
-On the client side you now need to configure the import to point to the
-correct location of the library **on the database server**. Add the following
-line to your your `.env` file:
-
-```
-NOMINATIM_DATABASE_MODULE_PATH="<directory on the database server where nominatim.so resides>"
-```
-
-Now change the `NOMINATIM_DATABASE_DSN` to point to your remote server and continue
-to follow the [standard instructions for importing](Import.md).
+You can install Nominatim using a database that runs on a different server.
+Simply point the configuration variable `NOMINATIM_DATABASE_DSN` to the
+server and follow the standard import documentation.
  
+The import will be faster, if the import is run directly from the database
+machine. You can easily switch to a different machine for the query frontend
+after the import.
  
  ## Moving the database to another machine
  
@@ -225,20 +162,9 @@ target machine.
      data updates but the resulting database is only about a third of the size
      of a full database.
  
-Next install Nominatim on the target machine by following the standard installation
-instructions. Again, make sure to use the same version as the source machine.
+Next install nominatim-api on the target machine by following the standard
+installation instructions. Again, make sure to use the same version as the
+source machine.
  
  Create a project directory on your destination machine and set up the `.env`
-file to match the configuration on the source machine. Finally run
-
-    nominatim refresh --website
-
-to make sure that the local installation of Nominatim will be used.
-
-If you are using the legacy tokenizer you might also have to switch to the
-PostgreSQL module that was compiled on your target machine. If you get errors
-that PostgreSQL cannot find or access `nominatim.so` then rerun
-
-    nominatim refresh --functions
-
-on the target machine to update the the location of the module.
+file to match the configuration on the source machine. That's all.
diff --git a/docs/admin/Installation.md b/docs/admin/Installation.md
index 38c4d6017d73ca543d95dd361c2000d1700ca860..78062908c9ca230179023a960008a6f04fd9deb1 100644
--- a/docs/admin/Installation.md
+++ b/docs/admin/Installation.md
@@ -178,18 +178,6 @@ make
  sudo make install
  ```
  
-!!! warning
-    The default installation no longer compiles the PostgreSQL module that
-    is needed for the legacy tokenizer from older Nominatim versions. If you
-    are upgrading an older database or want to run the
-    [legacy tokenizer](../customize/Tokenizers.md#legacy-tokenizer) for
-    some other reason, you need to enable the PostgreSQL module via
-    cmake: `cmake -DBUILD_MODULE=on ../Nominatim`. To compile the module
-    you need to have the server development headers for PostgreSQL installed.
-    On Ubuntu/Debian run: `sudo apt install postgresql-server-dev-<postgresql version>`
-    The legacy tokenizer is deprecated and will be removed in Nominatim 5.0
-
-
  Nominatim installs itself into `/usr/local` per default. To choose a different
  installation directory add `-DCMAKE_INSTALL_PREFIX=<install root>` to the
  cmake command. Make sure that the `bin` directory is available in your path
diff --git a/docs/customize/Settings.md b/docs/customize/Settings.md
index b35fce3dfc5238e3f8800f0f02e92cd5e64f508d..b00d04cf6386bb3aa41b2d6f0409e0e043128239 100644
--- a/docs/customize/Settings.md
+++ b/docs/customize/Settings.md
@@ -64,26 +64,6 @@ Nominatim grants minimal rights to this user to all tables that are needed
  for running geocoding queries.
  
  
-#### NOMINATIM_DATABASE_MODULE_PATH
-
-| Summary            |                                                     |
-| --------------     | --------------------------------------------------- |
-| **Description:**   | Directory where to find the PostgreSQL server module |
-| **Format:**        | path |
-| **Default:**       | _empty_ (use `<project_directory>/module`) |
-| **After Changes:** | run `nominatim refresh --functions` |
-| **Comment:**       | Legacy tokenizer only |
-
-Defines the directory in which the PostgreSQL server module `nominatim.so`
-is stored. The directory and module must be accessible by the PostgreSQL
-server.
-
-For information on how to use this setting when working with external databases,
-see [Advanced Installations](../admin/Advanced-Installations.md).
-
-The option is only used by the Legacy tokenizer and ignored otherwise.
-
-
  #### NOMINATIM_TOKENIZER
  
  | Summary            |                                                     |
@@ -114,20 +94,6 @@ on the file format.
  If a relative path is given, then the file is searched first relative to the
  project directory and then in the global settings directory.
  
-#### NOMINATIM_MAX_WORD_FREQUENCY
-
-| Summary            |                                                     |
-| --------------     | --------------------------------------------------- |
-| **Description:**   | Number of occurrences before a word is considered frequent |
-| **Format:**        | int |
-| **Default:**       | 50000 |
-| **After Changes:** | cannot be changed after import |
-| **Comment:**       | Legacy tokenizer only |
-
-The word frequency count is used by the Legacy tokenizer to automatically
-identify _stop words_. Any partial term that occurs more often then what
-is defined in this setting, is effectively ignored during search.
-
  
  #### NOMINATIM_LIMIT_REINDEXING
  
@@ -162,25 +128,6 @@ codes, to restrict import to a subset of languages.
  Currently only affects the initial import of country names and special phrases.
  
  
-#### NOMINATIM_TERM_NORMALIZATION
-
-| Summary            |                                                     |
-| --------------     | --------------------------------------------------- |
-| **Description:**   | Rules for normalizing terms for comparisons |
-| **Format:**        | string: semicolon-separated list of ICU rules |
-| **Default:**       | :: NFD (); [[:Nonspacing Mark:] [:Cf:]] >;  :: lower (); [[:Punctuation:][:Space:]]+ > ' '; :: NFC (); |
-| **Comment:**       | Legacy tokenizer only |
-
-[Special phrases](Special-Phrases.md) have stricter matching requirements than
-normal search terms. They must appear exactly in the query after this term
-normalization has been applied.
-
-Only has an effect on the Legacy tokenizer. For the ICU tokenizer the rules
-defined in the
-[normalization section](Tokenizers.md#normalization-and-transliteration)
-will be used.
-
-
  #### NOMINATIM_USE_US_TIGER_DATA
  
  | Summary            |                                                     |
diff --git a/docs/customize/Tokenizers.md b/docs/customize/Tokenizers.md
index 49e86a5009289cea7f12aea36202abbda1548737..30be170edea91babfa271d47364d123d84f9fdf0 100644
--- a/docs/customize/Tokenizers.md
+++ b/docs/customize/Tokenizers.md
@@ -15,53 +15,6 @@ they can be configured.
      chosen tokenizer is very limited as well. See the comments in each tokenizer
      section.
  
-## Legacy tokenizer
-
-!!! danger
-    The Legacy tokenizer is deprecated and will be removed in Nominatim 5.0.
-    If you still use a database with the legacy tokenizer, you must reimport
-    it using the ICU tokenizer below.
-
-The legacy tokenizer implements the analysis algorithms of older Nominatim
-versions. It uses a special Postgresql module to normalize names and queries.
-This tokenizer is automatically installed and used when upgrading an older
-database. It should not be used for new installations anymore.
-
-### Compiling the PostgreSQL module
-
-The tokeinzer needs a special C module for PostgreSQL which is not compiled
-by default. If you need the legacy tokenizer, compile Nominatim as follows:
-
-```
-mkdir build
-cd build
-cmake -DBUILD_MODULE=on
-make
-```
-
-### Enabling the tokenizer
-
-To enable the tokenizer add the following line to your project configuration:
-
-```
-NOMINATIM_TOKENIZER=legacy
-```
-
-The Postgresql module for the tokenizer is available in the `module` directory
-and also installed with the remainder of the software under
-`lib/nominatim/module/nominatim.so`. You can specify a custom location for
-the module with
-
-```
-NOMINATIM_DATABASE_MODULE_PATH=<path to directory where nominatim.so resides>
-```
-
-This is in particular useful when the database runs on a different server.
-See [Advanced installations](../admin/Advanced-Installations.md#using-an-external-postgresql-database) for details.
-
-There are no other configuration options for the legacy tokenizer. All
-normalization functions are hard-coded.
-
  ## ICU tokenizer
  
  The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
diff --git a/docs/develop/Testing.md b/docs/develop/Testing.md
index d57ab3199acc003d29a5be44a40ea4df8eb32ba6..12673d403aa5f48297ce6999741221dca1bd6d3e 100644
--- a/docs/develop/Testing.md
+++ b/docs/develop/Testing.md
@@ -72,8 +72,6 @@ The tests can be configured with a set of environment variables (`behave -D key=
   * `DB_PORT` - (optional) port of database on host
   * `DB_USER` - (optional) username of database login
   * `DB_PASS` - (optional) password for database login
- * `SERVER_MODULE_PATH` - (optional) path on the Postgres server to Nominatim
-                          module shared library file (only needed for legacy tokenizer)
   * `REMOVE_TEMPLATE` - if true, the template and API database will not be reused
                         during the next run. Reusing the base templates speeds
                         up tests considerably but might lead to outdated errors
diff --git a/settings/env.defaults b/settings/env.defaults
index d3952af0239b0aee21a0d0594be186f4da9f6752..b8c666677ff04616252e0ecf38ba3505ce3a668b 100644
--- a/settings/env.defaults
+++ b/settings/env.defaults
@@ -18,12 +18,6 @@ NOMINATIM_DATABASE_WEBUSER="www-data"
  # Currently available tokenizers: icu, legacy
  NOMINATIM_TOKENIZER="icu"
  
-# Number of occurrences of a word before it is considered frequent.
-# Similar to the concept of stop words. Frequent partial words get ignored
-# or handled differently during search.
-# Changing this value requires a reimport.
-NOMINATIM_MAX_WORD_FREQUENCY=50000
-
  # If true, admin level changes on places with many contained children are blocked.
  NOMINATIM_LIMIT_REINDEXING=yes
  
@@ -34,12 +28,6 @@ NOMINATIM_LIMIT_REINDEXING=yes
  # Currently only affects the initial import of country names and special phrases.
  NOMINATIM_LANGUAGES=
  
-# Rules for normalizing terms for comparisons.
-# The default is to remove accents and punctuation and to lower-case the
-# term. Spaces are kept but collapsed to one standard space.
-# Changing this value requires a reimport.
-NOMINATIM_TERM_NORMALIZATION=":: NFD (); [[:Nonspacing Mark:] [:Cf:]] >;  :: lower (); [[:Punctuation:][:Space:]]+ > ' '; :: NFC ();"
-
  # Configuration file for the tokenizer.
  # The content depends on the tokenizer used. If left empty the default settings
  # for the chosen tokenizer will be used. The configuration can only be set