X-Git-Url: https://git.openstreetmap.org./nominatim.git/blobdiff_plain/5dd24b3ef05851dc062a2bfeeac7260175f10b69..3656eed9ade48dd7f929f12627f8fd1dc3a5d7bc:/docs/admin/Tokenizers.md?ds=sidebyside

diff --git a/docs/admin/Tokenizers.md b/docs/admin/Tokenizers.md
index a4d6aa0d..f3454f67 100644
--- a/docs/admin/Tokenizers.md
+++ b/docs/admin/Tokenizers.md
@@ -9,11 +9,11 @@ different configuration options. This sections describes the tokenizers and how
 they can be configured.
 
 !!! important
-The use of a tokenizer is tied to a database installation. You need to choose
-and configure the tokenizer before starting the initial import. Once the import
-is done, you cannot switch to another tokenizer anymore. Reconfiguring the
-chosen tokenizer is very limited as well. See the comments in each tokenizer
-section.
+    The use of a tokenizer is tied to a database installation. You need to choose
+    and configure the tokenizer before starting the initial import. Once the import
+    is done, you cannot switch to another tokenizer anymore. Reconfiguring the
+    chosen tokenizer is very limited as well. See the comments in each tokenizer
+    section.
 
 ## Legacy tokenizer
 
@@ -44,6 +44,10 @@ normalization functions are hard-coded.
 
 ## ICU tokenizer
 
+!!! danger
+    This tokenizer is currently in active development and still subject
+    to backwards-incompatible changes.
+
 The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
 normalize names and queries. It also offers configurable decomposition and
 abbreviation handling.
@@ -167,7 +171,7 @@ It is also possible to restrict replacements to the beginning and end of a
 name:
 
 ``` yaml
-- ^south => n # matches only at the beginning of the name
+- ^south => s # matches only at the beginning of the name
 - road$ => rd # matches only at the end of the name
 ```
 
@@ -188,8 +192,8 @@ a shortcut notation for it:
 The simple arrow causes an additional variant to be added. Note that
 decomposition has an effect here on the source as well. So a rule
 
-```yaml
-- ~strasse => str
+``` yaml
+- "~strasse -> str"
 ```
 
 means that for a word like `hauptstrasse` four variants are created: