X-Git-Url: https://git.openstreetmap.org./nominatim.git/blobdiff_plain/e7574f119eaab63723165f8139455d8af365a21e..9769a0dcdb6f489c6b7857281c24f1b680bcdd87:/docs/develop/ICU-Tokenizer-Modules.md

diff --git a/docs/develop/ICU-Tokenizer-Modules.md b/docs/develop/ICU-Tokenizer-Modules.md
index 2427ab11..2cf30a56 100644
--- a/docs/develop/ICU-Tokenizer-Modules.md
+++ b/docs/develop/ICU-Tokenizer-Modules.md
@@ -57,9 +57,9 @@ the function.
         show_source: no
         heading_level: 6
 
-### The sanitation function
+### The main filter function of the sanitizer
 
-The sanitation function receives a single object of type `ProcessInfo`
+The filter function receives a single object of type `ProcessInfo`
 which has with three members:
 
  * `place`: read-only information about the place being processed.
@@ -74,6 +74,22 @@ While the `place` member is provided for information only, the `names` and
 remove entries, change information within a single entry (for example by
 adding extra attributes) or completely replace the list with a different one.
 
+#### PlaceInfo - information about the place
+
+::: nominatim.data.place_info.PlaceInfo
+    rendering:
+        show_source: no
+        heading_level: 6
+
+
+#### PlaceName - extended naming information
+
+::: nominatim.data.place_name.PlaceName
+    rendering:
+        show_source: no
+        heading_level: 6
+
+
 ### Example: Filter for US street prefixes
 
 The following sanitizer removes the directional prefixes from street names
@@ -102,49 +118,32 @@ the filter.
 The filter function first checks if the object is interesting for the
 sanitizer. Namely it checks if the place is in the US (through `country_code`)
 and it the place is a street (a `rank_address` of 26 or 27). If the
-conditions are met, then it goes through all available names and replaces
-any removes any leading direction prefix using a simple regular expression.
+conditions are met, then it goes through all available names and
+removes any leading directional prefix using a simple regular expression.
 
 Save the source code in a file in your project directory, for example as
 `us_streets.py`. Then you can use the sanitizer in your `icu_tokenizer.yaml`:
 
-```
+``` yaml
 ...
 sanitizers:
     - step: us_streets.py
 ...
 ```
 
-For more sanitizer examples, have a look at the sanitizers provided by Nominatim.
-They can be found in the directory `nominatim/tokenizer/sanitizers`.
-
 !!! warning
     This example is just a simplified show case on how to create a sanitizer.
     It is not really read for real-world use: while the sanitizer would
     correcly transform `West 5th Street` into `5th Street`. it would also
     shorten a simple `North Street` to `Street`.
 
-#### PlaceInfo - information about the place
-
-::: nominatim.data.place_info.PlaceInfo
-    rendering:
-        show_source: no
-        heading_level: 6
-
-
-#### PlaceName - extended naming information
+For more sanitizer examples, have a look at the sanitizers provided by Nominatim.
+They can be found in the directory
+[`nominatim/tokenizer/sanitizers`](https://github.com/osm-search/Nominatim/tree/master/nominatim/tokenizer/sanitizers).
 
-::: nominatim.data.place_name.PlaceName
-    rendering:
-        show_source: no
-        heading_level: 6
 
 ## Custom token analysis module
 
-Setup of a token analyser is split into two parts: configuration and
-analyser factory. A token analysis module must therefore implement two
-functions:
-
 ::: nominatim.tokenizer.token_analysis.base.AnalysisModule
     rendering:
         show_source: no
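
Reviewer note: the example source that the last hunk describes sits outside the changed lines, so it is not visible in this diff. The following is a minimal, self-contained sketch of the behaviour the reworded paragraph describes; it is not the code from the documentation. The classes `FakePlace`, `FakeName` and `FakeProcessInfo` are hypothetical stand-ins for Nominatim's `PlaceInfo`, `PlaceName` and `ProcessInfo`, and the `name` attribute on a name entry is an assumption. Only `place`, `names`, `country_code` and `rank_address` are taken from the text above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins for illustration only; the real classes are
# provided by Nominatim and are not imported here.
@dataclass
class FakePlace:
    country_code: Optional[str]
    rank_address: int


@dataclass
class FakeName:
    name: str  # assumed attribute holding the name string


@dataclass
class FakeProcessInfo:
    place: FakePlace
    names: List[FakeName] = field(default_factory=list)


# A leading compass direction followed by a single space.
_PREFIX = re.compile(r'^(north|south|east|west) ', re.IGNORECASE)


def filter_us_street_prefixes(obj):
    """Strip a leading directional prefix from the names of US streets."""
    # Only places in the US are of interest.
    if obj.place.country_code != 'us':
        return
    # Streets are identified by a rank_address of 26 or 27.
    if not 26 <= obj.place.rank_address <= 27:
        return
    for name in obj.names:
        name.name = _PREFIX.sub('', name.name)


info = FakeProcessInfo(FakePlace('us', 26),
                       [FakeName('West 5th Street'), FakeName('North Street')])
filter_us_street_prefixes(info)
print([n.name for n in info.names])  # ['5th Street', 'Street']
```

The second result reproduces the caveat from the warning block: a plain `North Street` is shortened to `Street`, so a regular expression this simple is not suitable for real-world data.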