they can be configured.
!!! important
-The use of a tokenizer is tied to a database installation. You need to choose
-and configure the tokenizer before starting the initial import. Once the import
-is done, you cannot switch to another tokenizer anymore. Reconfiguring the
-chosen tokenizer is very limited as well. See the comments in each tokenizer
-section.
+ The use of a tokenizer is tied to a database installation. You need to choose
+ and configure the tokenizer before starting the initial import. Once the import
+ is done, you can no longer switch to another tokenizer. The options for
+ reconfiguring the chosen tokenizer are very limited as well. See the comments
+ in each tokenizer section.
## Legacy tokenizer
## ICU tokenizer
+!!! danger
+ This tokenizer is currently in active development and still subject
+ to backwards-incompatible changes.
+
The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
normalize names and queries. It also offers configurable decomposition and
abbreviation handling.
name:
``` yaml
-- ^south => n # matches only at the beginning of the name
+- ^south => s # matches only at the beginning of the name
- road$ => rd # matches only at the end of the name
```
The simple arrow causes an additional variant to be added. Note that
decomposition has an effect here on the source as well. So a rule
-```yaml
-- ~strasse => str
+``` yaml
+- "~strasse -> str"
```
means that for a word like `hauptstrasse` four variants are created: