ADD_CUSTOM_TARGET(doc
COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/bash2md.sh ${PROJECT_SOURCE_DIR}/vagrant/Install-on-Ubuntu-20.sh ${CMAKE_CURRENT_BINARY_DIR}/appendix/Install-on-Ubuntu-20.md
COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/bash2md.sh ${PROJECT_SOURCE_DIR}/vagrant/Install-on-Ubuntu-22.sh ${CMAKE_CURRENT_BINARY_DIR}/appendix/Install-on-Ubuntu-22.md
- COMMAND PYTHONPATH=${PROJECT_SOURCE_DIR} mkdocs build -d ${CMAKE_CURRENT_BINARY_DIR}/../site-html -f ${CMAKE_CURRENT_BINARY_DIR}/../mkdocs.yml
+ COMMAND mkdocs build -d ${CMAKE_CURRENT_BINARY_DIR}/../site-html -f ${CMAKE_CURRENT_BINARY_DIR}/../mkdocs.yml
)
ADD_CUSTOM_TARGET(serve-doc
- COMMAND PYTHONPATH=${PROJECT_SOURCE_DIR} mkdocs serve
- WORKING_DIRECTORY ${PROJECT_BINARY_DIR}
+ COMMAND mkdocs serve -f ${CMAKE_CURRENT_BINARY_DIR}/../mkdocs.yml
+ WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}
)
::: nominatim.tokenizer.sanitizers.split_name_list
selection:
members: False
- rendering:
+ options:
heading_level: 6
+ docstring_section_style: spacy
##### strip-brace-terms
::: nominatim.tokenizer.sanitizers.strip_brace_terms
selection:
members: False
- rendering:
+ options:
heading_level: 6
+ docstring_section_style: spacy
##### tag-analyzer-by-language
::: nominatim.tokenizer.sanitizers.tag_analyzer_by_language
selection:
members: False
- rendering:
+ options:
heading_level: 6
+ docstring_section_style: spacy
##### clean-housenumbers
::: nominatim.tokenizer.sanitizers.clean_housenumbers
selection:
members: False
- rendering:
+ options:
heading_level: 6
+ docstring_section_style: spacy
##### clean-postcodes
::: nominatim.tokenizer.sanitizers.clean_postcodes
selection:
members: False
- rendering:
+ options:
heading_level: 6
+ docstring_section_style: spacy
##### clean-tiger-tags
::: nominatim.tokenizer.sanitizers.clean_tiger_tags
selection:
members: False
- rendering:
+ options:
heading_level: 6
+ docstring_section_style: spacy
##### delete-tags
::: nominatim.tokenizer.sanitizers.delete_tags
selection:
members: False
- rendering:
+ options:
heading_level: 6
+ docstring_section_style: spacy
##### tag-japanese
::: nominatim.tokenizer.sanitizers.tag_japanese
selection:
members: False
- rendering:
+ options:
heading_level: 6
+ docstring_section_style: spacy
#### Token Analysis
The documentation is built with mkdocs:
* [mkdocs](https://www.mkdocs.org/) >= 1.1.2
-* [mkdocstrings](https://mkdocstrings.github.io/) >= 0.16
-* [mkdocstrings-python-legacy](https://mkdocstrings.github.io/python-legacy/)
+* [mkdocstrings](https://mkdocstrings.github.io/) >= 0.18
+* [mkdocstrings-python](https://mkdocstrings.github.io/python/)
### Installing prerequisites on Ubuntu/Debian
### Sanitizer configuration
::: nominatim.tokenizer.sanitizers.config.SanitizerConfig
- rendering:
- show_source: no
- heading_level: 6
+ options:
+ heading_level: 3
### The main filter function of the sanitizer
The filter function receives a single object of type `ProcessInfo`
which has three members:
- * `place`: read-only information about the place being processed.
+ * `place: PlaceInfo`: read-only information about the place being processed.
See PlaceInfo below.
- * `names`: The current list of names for the place. Each name is a
- PlaceName object.
- * `address`: The current list of address names for the place. Each name
- is a PlaceName object.
+ * `names: List[PlaceName]`: The current list of names for the place.
+ * `address: List[PlaceName]`: The current list of address names for the place.
While the `place` member is provided for information only, the `names` and
`address` lists are meant to be manipulated by the sanitizer. It may add and
remove entries or modify existing ones.
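A minimal sketch of such a sanitizer module (the `SanitizerConfig` import
follows the module documented above; the `ProcessInfo` import path and the
filter logic itself are illustrative assumptions):

```python
from typing import Callable

from nominatim.tokenizer.sanitizers.base import ProcessInfo  # assumed path
from nominatim.tokenizer.sanitizers.config import SanitizerConfig


def create(config: SanitizerConfig) -> Callable[[ProcessInfo], None]:
    """ Create a filter function that drops name entries which are
        empty after stripping whitespace.
    """

    def _filter(obj: ProcessInfo) -> None:
        # 'names' and 'address' may be manipulated freely,
        # while 'place' is read-only.
        obj.names = [name for name in obj.names if name.name.strip()]

    return _filter
```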
#### PlaceInfo - information about the place
::: nominatim.data.place_info.PlaceInfo
- rendering:
- show_source: no
- heading_level: 6
+ options:
+ heading_level: 3
#### PlaceName - extended naming information
::: nominatim.data.place_name.PlaceName
- rendering:
- show_source: no
- heading_level: 6
+ options:
+ heading_level: 3
### Example: Filter for US street prefixes
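A hedged sketch of such a filter, expanding abbreviated directional prefixes
on names of places in the US (the prefix table is made up for illustration,
and the check assumes `PlaceInfo` exposes a `country_code` attribute):

```python
from typing import Callable

from nominatim.tokenizer.sanitizers.base import ProcessInfo  # assumed path
from nominatim.tokenizer.sanitizers.config import SanitizerConfig

PREFIXES = {'n': 'North', 's': 'South', 'e': 'East', 'w': 'West'}


def create(config: SanitizerConfig) -> Callable[[ProcessInfo], None]:
    """ Create a filter that expands 'N Main St' to 'North Main St'
        for places in the US.
    """

    def _filter(obj: ProcessInfo) -> None:
        if obj.place.country_code != 'us':
            return
        for name in obj.names:
            words = name.name.split()
            if len(words) > 1 and words[0].lower() in PREFIXES:
                name.name = ' '.join([PREFIXES[words[0].lower()]] + words[1:])

    return _filter
```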
## Custom token analysis module
::: nominatim.tokenizer.token_analysis.base.AnalysisModule
- rendering:
- show_source: no
- heading_level: 6
+ options:
+ heading_level: 3
::: nominatim.tokenizer.token_analysis.base.Analyzer
- rendering:
- show_source: no
- heading_level: 6
+ options:
+ heading_level: 3
### Example: Creating acronym variants for long names
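A hedged sketch under the `Analyzer` and `AnalysisModule` protocols documented
above (the acronym heuristic, the `configure()` signature, and the use of the
ICU `transliterate()` method are illustrative assumptions):

```python
from typing import Any, List, Mapping


class AcronymMaker:
    """ Analyzer that, for names of three or more words, adds an
        acronym built from the word initials as a spelling variant.
    """

    def __init__(self, norm: Any, trans: Any) -> None:
        self.norm = norm
        self.trans = trans

    def get_canonical_id(self, name: Any) -> str:
        # The normalized form of the name serves as the canonical ID.
        return self.norm.transliterate(name.name).strip()

    def compute_variants(self, canonical_id: str) -> List[str]:
        # Variants must be fully transliterated so the query
        # frontend can match them against the input.
        trans_name = self.trans.transliterate(canonical_id).strip()
        variants = [trans_name]
        words = trans_name.split()
        if len(words) >= 3:
            variants.append(''.join(w[0] for w in words))
        return [v for v in variants if v]


def configure(rules: Mapping[str, Any], normalizer: Any,
              transliterator: Any) -> Any:
    return None  # this sketch needs no static configuration


def create(normalizer: Any, transliterator: Any, config: Any) -> AcronymMaker:
    return AcronymMaker(normalizer, transliterator)
```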
A custom tokenizer must derive from `nominatim.tokenizer.base.AbstractTokenizer`
and implement the abstract functions defined there.
::: nominatim.tokenizer.base.AbstractTokenizer
- rendering:
- heading_level: 4
+ options:
+ heading_level: 3
### Python Analyzer Class
::: nominatim.tokenizer.base.AbstractAnalyzer
- rendering:
- heading_level: 4
+ options:
+ heading_level: 3
### PL/pgSQL Functions
- search
- mkdocstrings:
handlers:
- python-legacy:
- rendering:
- show_source: false
- show_signature_annotations: false
+ python:
+ paths: ["${PROJECT_SOURCE_DIR}"]
+ options:
+ show_source: False
+ show_bases: False
Returns:
The function returns the list of all tuples that could be
- found for the given words. Each list entry is a tuple of
- (original word, word token, word id).
+ found for the given words. Each list entry is a tuple of
+ (original word, word token, word id).
"""
Returns:
A JSON-serialisable structure that will be handed into
- the database via the `token_info` field.
+ the database via the `token_info` field.
"""
tables should be skipped. This option is only required for
migration purposes and can be safely ignored by custom
tokenizers.
-
- TODO: can we move the init_db parameter somewhere else?
"""
Returns:
If an issue was found, return an error message with the
- description of the issue as well as hints for the user on
- how to resolve the issue. If everything is okay, return `None`.
+ description of the issue as well as hints for the user on
+ how to resolve the issue. If everything is okay, return `None`.
"""
@abstractmethod
def most_frequent_words(self, conn: Connection, num: int) -> List[str]:
- """ Return a list of the `num` most frequent full words
- in the database.
+ """ Return a list of the most frequent full words in the database.
+
+ Arguments:
+ conn: Open connection to the database which may be used to
+ retrieve the words.
+ num: Maximum number of words to return.
"""
Returns:
If the parameter value is a simple string, it is returned as a
- one-item list. If the parameter value does not exist, the given
- default is returned. If the parameter value is a list, it is
- checked to contain only strings before being returned.
+ one-item list. If the parameter value does not exist, the given
+ default is returned. If the parameter value is a list, it is
+ checked to contain only strings before being returned.
"""
values = self.data.get(param, None)
Returns:
A regular expression pattern which can be used to
- split a string. The regular expression makes sure that the
- resulting names are stripped and that repeated delimiters
- are ignored. It may still create empty fields on occasion. The
- code needs to filter those.
+ split a string. The regular expression makes sure that the
+ resulting names are stripped and that repeated delimiters
+ are ignored. It may still create empty fields on occasion. The
+ code needs to filter those.
"""
delimiter_set = set(self.data.get('delimiters', default))
if not delimiter_set:
Returns:
A filter function that takes a target string as the argument and
- returns True if it fully matches any of the regular expressions
- otherwise returns False.
+ returns True if it fully matches any of the regular expressions,
+ otherwise returns False.
"""
filters = self.get_string_list(param) or default
Returns:
ID string with a canonical form of the name. The string may
- be empty, when the analyzer cannot analyze the name at all,
- for example because the character set in use does not match.
+ be empty when the analyzer cannot analyze the name at all,
+ for example because the character set in use does not match.
"""
def compute_variants(self, canonical_id: str) -> List[str]:
Returns:
A list of possible spelling variants. All strings must have
- been transformed with the global normalizer and
- transliterator ICU rules. Otherwise they cannot be matched
- against the input by the query frontend.
- The list may be empty, when there are no useful
- spelling variants. This may happen when an analyzer only
- usually outputs additional variants to the canonical spelling
- and there are no such variants.
+ been transformed with the global normalizer and
+ transliterator ICU rules. Otherwise they cannot be matched
+ against the input by the query frontend.
+ The list may be empty when there are no useful
+ spelling variants. This may happen when an analyzer
+ usually only outputs additional variants to the canonical
+ spelling and there are no such variants.
"""
Returns:
A data object with configuration data. This will be handed
- as is into the `create()` function and may be
- used freely by the analysis module as needed.
+ as is into the `create()` function and may be
+ used freely by the analysis module as needed.
"""
def create(self, normalizer: Any, transliterator: Any, config: Any) -> Analyzer:
Returns:
A new analyzer instance. This must be an object that implements
- the Analyzer protocol.
+ the Analyzer protocol.
"""