From 4825a0bda3e2b5d6a9c153b7cd0b8da2959cbc81 Mon Sep 17 00:00:00 2001
From: Sarah Hoffmann <lonvia@denofr.de>
Date: Sat, 21 Sep 2024 18:27:01 +0200
Subject: [PATCH] remove documentation around legacy tokenizer

---
 CONTRIBUTING.md                      |  3 +-
 docs/admin/Advanced-Installations.md | 94 +++-------------------------
 docs/admin/Installation.md           | 12 ----
 docs/customize/Settings.md           | 53 ----------------
 docs/customize/Tokenizers.md         | 47 --------------
 docs/develop/Testing.md              |  2 -
 settings/env.defaults                | 12 ----
 7 files changed, 11 insertions(+), 212 deletions(-)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 757c52b7..a78bbfb3 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -61,8 +61,7 @@ pylint3 --extension-pkg-whitelist=osmium nominatim
 Before submitting a pull request make sure that the tests pass:
 
 ```
-  cd build
-  make test
+  make tests
 ```
 
 ## Releases
diff --git a/docs/admin/Advanced-Installations.md b/docs/admin/Advanced-Installations.md
index ac8da274..de3c5876 100644
--- a/docs/admin/Advanced-Installations.md
+++ b/docs/admin/Advanced-Installations.md
@@ -131,76 +131,13 @@ script ([Geofabrik](https://download.geofabrik.de)) provides daily updates.
 
 ## Using an external PostgreSQL database
 
-You can install Nominatim using a database that runs on a different server when
-you have physical access to the file system on the other server. Nominatim
-uses a custom normalization library that needs to be made accessible to the
-PostgreSQL server. This section explains how to set up the normalization
-library.
-
-!!! note
-    The external module is only needed when using the legacy tokenizer.
-    If you have chosen the ICU tokenizer, then you can ignore this section
-    and follow the standard import documentation.
-
-### Option 1: Compiling the library on the database server
-
-The most sure way to get a working library is to compile it on the database
-server. From the prerequisites you need at least cmake, gcc and the
-PostgreSQL server package.
-
-Clone or unpack the Nominatim source code, enter the source directory and
-create and enter a build directory.
-
-```sh
-cd Nominatim
-mkdir build
-cd build
-```
-
-Now configure cmake to only build the PostgreSQL module and build it:
-
-```
-cmake -DBUILD_IMPORTER=off -DBUILD_API=off -DBUILD_TESTS=off -DBUILD_DOCS=off -DBUILD_OSM2PGSQL=off ..
-make
-```
-
-When done, you find the normalization library in `build/module/nominatim.so`.
-Copy it to a place where it is readable and executable by the PostgreSQL server
-process.
-
-### Option 2: Compiling the library on the import machine
-
-You can also compile the normalization library on the machine from where you
-run the import.
-
-!!! important
-    You can only do this when the database server and the import machine have
-    the same architecture and run the same version of Linux. Otherwise there is
-    no guarantee that the compiled library is compatible with the PostgreSQL
-    server running on the database server.
-
-Make sure that the PostgreSQL server package is installed on the machine
-**with the same version as on the database server**. You do not need to install
-the PostgreSQL server itself.
-
-Download and compile Nominatim as per standard instructions. Once done, you find
-the normalization library in `build/module/nominatim.so`. Copy the file to
-the database server at a location where it is readable and executable by the
-PostgreSQL server process.
-
-### Running the import
-
-On the client side you now need to configure the import to point to the
-correct location of the library **on the database server**. Add the following
-line to your your `.env` file:
-
-```
-NOMINATIM_DATABASE_MODULE_PATH="<directory on the database server where nominatim.so resides>"
-```
-
-Now change the `NOMINATIM_DATABASE_DSN` to point to your remote server and continue
-to follow the [standard instructions for importing](Import.md).
+You can install Nominatim using a database that runs on a different server.
+Simply point the configuration variable `NOMINATIM_DATABASE_DSN` to the
+server and follow the standard import documentation.
 
+The import will be faster, if the import is run directly from the database
+machine. You can easily switch to a different machine for the query frontend
+after the import.
 
 ## Moving the database to another machine
 
@@ -225,20 +162,9 @@ target machine.
     data updates but the resulting database is only about a third of the size
     of a full database.
 
-Next install Nominatim on the target machine by following the standard installation
-instructions. Again, make sure to use the same version as the source machine.
+Next install nominatim-api on the target machine by following the standard
+installation instructions. Again, make sure to use the same version as the
+source machine.
 
 Create a project directory on your destination machine and set up the `.env`
-file to match the configuration on the source machine. Finally run
-
-    nominatim refresh --website
-
-to make sure that the local installation of Nominatim will be used.
-
-If you are using the legacy tokenizer you might also have to switch to the
-PostgreSQL module that was compiled on your target machine. If you get errors
-that PostgreSQL cannot find or access `nominatim.so` then rerun
-
-    nominatim refresh --functions
-
-on the target machine to update the the location of the module.
+file to match the configuration on the source machine. That's all.
diff --git a/docs/admin/Installation.md b/docs/admin/Installation.md
index 38c4d601..78062908 100644
--- a/docs/admin/Installation.md
+++ b/docs/admin/Installation.md
@@ -178,18 +178,6 @@ make
 sudo make install
 ```
 
-!!! warning
-    The default installation no longer compiles the PostgreSQL module that
-    is needed for the legacy tokenizer from older Nominatim versions. If you
-    are upgrading an older database or want to run the
-    [legacy tokenizer](../customize/Tokenizers.md#legacy-tokenizer) for
-    some other reason, you need to enable the PostgreSQL module via
-    cmake: `cmake -DBUILD_MODULE=on ../Nominatim`. To compile the module
-    you need to have the server development headers for PostgreSQL installed.
-    On Ubuntu/Debian run: `sudo apt install postgresql-server-dev-<postgresql version>`
-    The legacy tokenizer is deprecated and will be removed in Nominatim 5.0
-
-
 Nominatim installs itself into `/usr/local` per default. To choose a different
 installation directory add `-DCMAKE_INSTALL_PREFIX=<install root>` to the
 cmake command. Make sure that the `bin` directory is available in your path
diff --git a/docs/customize/Settings.md b/docs/customize/Settings.md
index b35fce3d..b00d04cf 100644
--- a/docs/customize/Settings.md
+++ b/docs/customize/Settings.md
@@ -64,26 +64,6 @@ Nominatim grants minimal rights to this user to all tables that are needed
 for running geocoding queries.
 
 
-#### NOMINATIM_DATABASE_MODULE_PATH
-
-| Summary            |                                                     |
-| --------------     | --------------------------------------------------- |
-| **Description:**   | Directory where to find the PostgreSQL server module |
-| **Format:**        | path |
-| **Default:**       | _empty_ (use `<project_directory>/module`) |
-| **After Changes:** | run `nominatim refresh --functions` |
-| **Comment:**       | Legacy tokenizer only |
-
-Defines the directory in which the PostgreSQL server module `nominatim.so`
-is stored. The directory and module must be accessible by the PostgreSQL
-server.
-
-For information on how to use this setting when working with external databases,
-see [Advanced Installations](../admin/Advanced-Installations.md).
-
-The option is only used by the Legacy tokenizer and ignored otherwise.
-
-
 #### NOMINATIM_TOKENIZER
 
 | Summary            |                                                     |
@@ -114,20 +94,6 @@ on the file format.
 If a relative path is given, then the file is searched first relative to the
 project directory and then in the global settings directory.
 
-#### NOMINATIM_MAX_WORD_FREQUENCY
-
-| Summary            |                                                     |
-| --------------     | --------------------------------------------------- |
-| **Description:**   | Number of occurrences before a word is considered frequent |
-| **Format:**        | int |
-| **Default:**       | 50000 |
-| **After Changes:** | cannot be changed after import |
-| **Comment:**       | Legacy tokenizer only |
-
-The word frequency count is used by the Legacy tokenizer to automatically
-identify _stop words_. Any partial term that occurs more often then what
-is defined in this setting, is effectively ignored during search.
-
 
 #### NOMINATIM_LIMIT_REINDEXING
 
@@ -162,25 +128,6 @@ codes, to restrict import to a subset of languages.
 Currently only affects the initial import of country names and special phrases.
 
 
-#### NOMINATIM_TERM_NORMALIZATION
-
-| Summary            |                                                     |
-| --------------     | --------------------------------------------------- |
-| **Description:**   | Rules for normalizing terms for comparisons |
-| **Format:**        | string: semicolon-separated list of ICU rules |
-| **Default:**       | :: NFD (); [[:Nonspacing Mark:] [:Cf:]] >;  :: lower (); [[:Punctuation:][:Space:]]+ > ' '; :: NFC (); |
-| **Comment:**       | Legacy tokenizer only |
-
-[Special phrases](Special-Phrases.md) have stricter matching requirements than
-normal search terms. They must appear exactly in the query after this term
-normalization has been applied.
-
-Only has an effect on the Legacy tokenizer. For the ICU tokenizer the rules
-defined in the
-[normalization section](Tokenizers.md#normalization-and-transliteration)
-will be used.
-
-
 #### NOMINATIM_USE_US_TIGER_DATA
 
 | Summary            |                                                     |
diff --git a/docs/customize/Tokenizers.md b/docs/customize/Tokenizers.md
index 49e86a50..30be170e 100644
--- a/docs/customize/Tokenizers.md
+++ b/docs/customize/Tokenizers.md
@@ -15,53 +15,6 @@ they can be configured.
     chosen tokenizer is very limited as well. See the comments in each tokenizer
     section.
 
-## Legacy tokenizer
-
-!!! danger
-    The Legacy tokenizer is deprecated and will be removed in Nominatim 5.0.
-    If you still use a database with the legacy tokenizer, you must reimport
-    it using the ICU tokenizer below.
-
-The legacy tokenizer implements the analysis algorithms of older Nominatim
-versions. It uses a special Postgresql module to normalize names and queries.
-This tokenizer is automatically installed and used when upgrading an older
-database. It should not be used for new installations anymore.
-
-### Compiling the PostgreSQL module
-
-The tokeinzer needs a special C module for PostgreSQL which is not compiled
-by default. If you need the legacy tokenizer, compile Nominatim as follows:
-
-```
-mkdir build
-cd build
-cmake -DBUILD_MODULE=on
-make
-```
-
-### Enabling the tokenizer
-
-To enable the tokenizer add the following line to your project configuration:
-
-```
-NOMINATIM_TOKENIZER=legacy
-```
-
-The Postgresql module for the tokenizer is available in the `module` directory
-and also installed with the remainder of the software under
-`lib/nominatim/module/nominatim.so`. You can specify a custom location for
-the module with
-
-```
-NOMINATIM_DATABASE_MODULE_PATH=<path to directory where nominatim.so resides>
-```
-
-This is in particular useful when the database runs on a different server.
-See [Advanced installations](../admin/Advanced-Installations.md#using-an-external-postgresql-database) for details.
-
-There are no other configuration options for the legacy tokenizer. All
-normalization functions are hard-coded.
-
 ## ICU tokenizer
 
 The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
diff --git a/docs/develop/Testing.md b/docs/develop/Testing.md
index d57ab319..12673d40 100644
--- a/docs/develop/Testing.md
+++ b/docs/develop/Testing.md
@@ -72,8 +72,6 @@ The tests can be configured with a set of environment variables (`behave -D key=
  * `DB_PORT` - (optional) port of database on host
  * `DB_USER` - (optional) username of database login
  * `DB_PASS` - (optional) password for database login
- * `SERVER_MODULE_PATH` - (optional) path on the Postgres server to Nominatim
-                          module shared library file (only needed for legacy tokenizer)
  * `REMOVE_TEMPLATE` - if true, the template and API database will not be reused
                        during the next run. Reusing the base templates speeds
                        up tests considerably but might lead to outdated errors
diff --git a/settings/env.defaults b/settings/env.defaults
index d3952af0..b8c66667 100644
--- a/settings/env.defaults
+++ b/settings/env.defaults
@@ -18,12 +18,6 @@ NOMINATIM_DATABASE_WEBUSER="www-data"
 # Currently available tokenizers: icu, legacy
 NOMINATIM_TOKENIZER="icu"
 
-# Number of occurrences of a word before it is considered frequent.
-# Similar to the concept of stop words. Frequent partial words get ignored
-# or handled differently during search.
-# Changing this value requires a reimport.
-NOMINATIM_MAX_WORD_FREQUENCY=50000
-
 # If true, admin level changes on places with many contained children are blocked.
 NOMINATIM_LIMIT_REINDEXING=yes
 
@@ -34,12 +28,6 @@ NOMINATIM_LIMIT_REINDEXING=yes
 # Currently only affects the initial import of country names and special phrases.
 NOMINATIM_LANGUAGES=
 
-# Rules for normalizing terms for comparisons.
-# The default is to remove accents and punctuation and to lower-case the
-# term. Spaces are kept but collapsed to one standard space.
-# Changing this value requires a reimport.
-NOMINATIM_TERM_NORMALIZATION=":: NFD (); [[:Nonspacing Mark:] [:Cf:]] >;  :: lower (); [[:Punctuation:][:Space:]]+ > ' '; :: NFC ();"
-
 # Configuration file for the tokenizer.
 # The content depends on the tokenizer used. If left empty the default settings
 # for the chosen tokenizer will be used. The configuration can only be set
-- 
2.39.5