make ICU the default tokenizer

author Sarah Hoffmann <lonvia@denofr.de>

Tue, 10 May 2022 10:02:50 +0000 (12:02 +0200)

committer Sarah Hoffmann <lonvia@denofr.de>

Tue, 10 May 2022 10:02:50 +0000 (12:02 +0200)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index af7dbc2a43de5f0577502b299a5d816bc981db03..8360d549d24ff707ea9d3cb8aa2d650c55ca7fad 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -44,7 +44,7 @@ endif()
  
  set(BUILD_IMPORTER on CACHE BOOL "Build everything for importing/updating the database")
  set(BUILD_API on CACHE BOOL "Build everything for the API server")
-set(BUILD_MODULE on CACHE BOOL "Build PostgreSQL module")
+set(BUILD_MODULE off CACHE BOOL "Build PostgreSQL module for legacy tokenizer")
  set(BUILD_TESTS on CACHE BOOL "Build test suite")
  set(BUILD_DOCS on CACHE BOOL "Build documentation")
  set(BUILD_MANPAGE on CACHE BOOL "Build Manual Page")
diff --git a/docs/admin/Installation.md b/docs/admin/Installation.md
index 8c4c670b39dc901c25285b3cd030c5f850d5bec2..6b5855791508bbf2fe28ac192bdc9313e869e9a4 100644
--- a/docs/admin/Installation.md
+++ b/docs/admin/Installation.md
@@ -158,6 +158,15 @@ make
  sudo make install
  ```
  
+!!! warning
+    The default installation no longer compiles the PostgreSQL module that
+    is needed for the legacy tokenizer from older Nominatim versions. If you
+    are upgrading an older database or want to run the
+    [legacy tokenizer](../customize/Tokenizers.md#legacy-tokenizer) for
+    some other reason, you need to enable the PostgreSQL module via
+    cmake: `cmake -DBUILD_MODULE=on ../Nominatim`
+
+
  Nominatim installs itself into `/usr/local` per default. To choose a different
  installation directory add `-DCMAKE_INSTALL_PREFIX=<install root>` to the
  cmake command. Make sure that the `bin` directory is available in your path
diff --git a/docs/customize/Tokenizers.md b/docs/customize/Tokenizers.md
index d849eb48c0d457c7c57b27e2807ff55675e2fa33..19d867ddd800063494d72ad6ac078025d7ce2347 100644
--- a/docs/customize/Tokenizers.md
+++ b/docs/customize/Tokenizers.md
@@ -19,7 +19,22 @@ they can be configured.
  
  The legacy tokenizer implements the analysis algorithms of older Nominatim
  versions. It uses a special Postgresql module to normalize names and queries.
-This tokenizer is currently the default.
+This tokenizer is automatically installed and used when upgrading an older
+database. It should not be used for new installations anymore.
+
+### Compiling the PostgreSQL module
+
+The tokenizer needs a special C module for PostgreSQL which is not compiled
+by default. If you need the legacy tokenizer, compile Nominatim as follows:
+
+```
+mkdir build
+cd build
+cmake -DBUILD_MODULE=on ../Nominatim
+make
+```
+
+### Enabling the tokenizer
  
  To enable the tokenizer add the following line to your project configuration:
  
@@ -47,6 +62,7 @@ normalization functions are hard-coded.
  The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
  normalize names and queries. It also offers configurable decomposition and
  abbreviation handling.
+This tokenizer is currently the default.
  
  To enable the tokenizer add the following line to your project configuration:
  
diff --git a/settings/env.defaults b/settings/env.defaults
index e5dfe4a6094c01b9ea9605b5de06b8cfc666b274..3115f4382aacf582c5a1054e78c03130bde9f00f 100644
--- a/settings/env.defaults
+++ b/settings/env.defaults
@@ -21,8 +21,8 @@ NOMINATIM_DATABASE_MODULE_PATH=
  # Tokenizer used for normalizing and parsing queries and names.
  # The tokenizer is set up during import and cannot be changed afterwards
  # without a reimport.
-# Currently available tokenizers: legacy
-NOMINATIM_TOKENIZER="legacy"
+# Currently available tokenizers: icu, legacy
+NOMINATIM_TOKENIZER="icu"
  
  # Number of occurrences of a word before it is considered frequent.
  # Similar to the concept of stop words. Frequent partial words get ignored
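The env.defaults change only swaps the default value of `NOMINATIM_TOKENIZER`; a project can still select the legacy tokenizer explicitly in its own configuration, and only that choice requires the PostgreSQL module built with `-DBUILD_MODULE=on`. The Python sketch below illustrates that resolution logic under simplifying assumptions: it assumes project-level settings take precedence over `settings/env.defaults`, it uses a deliberately simplified `KEY=VALUE` parser rather than a full dotenv implementation, and the helpers `parse_env`, `effective_tokenizer` and `needs_postgresql_module` are hypothetical names, not Nominatim's configuration API.

```python
"""Hypothetical sketch: resolve the effective tokenizer from dotenv-style settings."""
from __future__ import annotations

# Tokenizers listed as available in settings/env.defaults after this commit.
AVAILABLE_TOKENIZERS = {"icu", "legacy"}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments.

    Surrounding double quotes are stripped from values. This is a simplified
    parser for illustration, not a complete dotenv implementation.
    """
    result: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        result[key.strip()] = value
    return result


def effective_tokenizer(defaults: str, project_env: str = "") -> str:
    """Return the tokenizer name; project settings (assumed) override defaults."""
    settings = parse_env(defaults)
    settings.update(parse_env(project_env))
    name = settings.get("NOMINATIM_TOKENIZER", "")
    if name not in AVAILABLE_TOKENIZERS:
        raise ValueError(f"unknown tokenizer: {name!r}")
    return name


def needs_postgresql_module(tokenizer: str) -> bool:
    """Only the legacy tokenizer depends on the compiled PostgreSQL module."""
    return tokenizer == "legacy"


if __name__ == "__main__":
    DEFAULTS = '# Currently available tokenizers: icu, legacy\nNOMINATIM_TOKENIZER="icu"\n'
    print(effective_tokenizer(DEFAULTS))  # icu
    chosen = effective_tokenizer(DEFAULTS, "NOMINATIM_TOKENIZER=legacy\n")
    print(chosen, needs_postgresql_module(chosen))  # legacy True
```

Keeping the module check as a separate function mirrors the split in this commit: the tokenizer default lives in env.defaults, while whether the C module gets compiled is a build-time decision made through the `BUILD_MODULE` CMake option.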
CMakeLists.txt
docs/admin/Installation.md
docs/customize/Tokenizers.md
settings/env.defaults