Merge remote-tracking branch 'upstream/master'

diff --git a/docs/admin/Tokenizers.md b/docs/admin/Tokenizers.md
index a4d6aa0d247d2c01778722db158398f86c0483b4..6f8898c8ee70690d88aabd63661b758c9ed37b38 100644
--- a/docs/admin/Tokenizers.md
+++ b/docs/admin/Tokenizers.md
@@ -9,11 +9,11 @@ different configuration options. This sections describes the tokenizers and how
  they can be configured.
  
  !!! important
-The use of a tokenizer is tied to a database installation. You need to choose
-and configure the tokenizer before starting the initial import. Once the import
-is done, you cannot switch to another tokenizer anymore. Reconfiguring the
-chosen tokenizer is very limited as well. See the comments in each tokenizer
-section.
+    The use of a tokenizer is tied to a database installation. You need to choose
+    and configure the tokenizer before starting the initial import. Once the import
+    is done, you cannot switch to another tokenizer anymore. Reconfiguring the
+    chosen tokenizer is very limited as well. See the comments in each tokenizer
+    section.
  
  ## Legacy tokenizer
  
@@ -44,10 +44,20 @@ normalization functions are hard-coded.
  
  ## ICU tokenizer
  
+!!! danger
+    This tokenizer is currently in active development and still subject
+    to backwards-incompatible changes.
+
  The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
  normalize names and queries. It also offers configurable decomposition and
  abbreviation handling.
  
+To enable the tokenizer add the following line to your project configuration:
+
+```
+NOMINATIM_TOKENIZER=icu
+```
+
  ### How it works
  
  On import the tokenizer processes names in the following four stages:
@@ -167,7 +177,7 @@ It is also possible to restrict replacements to the beginning and end of a
  name:
  
  ``` yaml
-- ^south => n  # matches only at the beginning of the name
+- ^south => s  # matches only at the beginning of the name
  - road$ => rd  # matches only at the end of the name
  ```
  
@@ -188,8 +198,8 @@ a shortcut notation for it:
  The simple arrow causes an additional variant to be added. Note that
  decomposition has an effect here on the source as well. So a rule
  
-```yaml
-- ~strasse => str
+``` yaml
+- "~strasse -> str"
  ```
  
  means that for a word like `hauptstrasse` four variants are created: