Sarah Hoffmann [Thu, 28 Oct 2021 09:27:31 +0000 (11:27 +0200)]
include unlisted places in ordering by housenumber
When ordering results by whether they have a housenumber,
also take into account cases where the housenumber is on the
place itself. This may happen when the search includes both the
name of the place and the housenumber, or for addr:place addresses
where the place is unlisted.
Sarah Hoffmann [Mon, 25 Oct 2021 12:55:15 +0000 (14:55 +0200)]
reverse: add index hints
The fairly complex where condition of idx_placex_geometry_placenode
won't always be matched by the query planner if the condition
part doesn't appear verbatim in the query.
Sarah Hoffmann [Mon, 25 Oct 2021 11:08:16 +0000 (13:08 +0200)]
fix warming for ICU tokenizer
Running the warm-up search requests requires querying
the most frequent words. This must be done via the tokenizer
to honor the different formats of the word table.
Sarah Hoffmann [Wed, 20 Oct 2021 20:05:15 +0000 (22:05 +0200)]
add new replication mode catch-up
This mode fetches updates until the server reports no more new
diffs.
Also adds an additional indexing run when the main indexing step
left a few objects unprocessed. This happens only when the
next update is expected to be more than 40min away.
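The catch-up behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `fetch_next_diff`, `catch_up` and `wants_extra_indexing` are hypothetical names, and only the 40-minute threshold is taken from the commit message.

```python
from typing import Callable

# Threshold from the commit message: 40 minutes, expressed in seconds.
EXTRA_INDEXING_THRESHOLD = 40 * 60


def catch_up(fetch_next_diff: Callable[[], bool], max_rounds: int = 1000) -> int:
    """Apply diffs until the (hypothetical) fetcher reports none are left.

    `fetch_next_diff` is assumed to apply one diff and return True,
    or to return False when the server has no newer diff.
    `max_rounds` is a safety limit added for this sketch only.
    """
    applied = 0
    while applied < max_rounds and fetch_next_diff():
        applied += 1
    return applied


def wants_extra_indexing(pending_objects: int, seconds_until_next_update: float) -> bool:
    """Run another indexing pass only if objects are left over and
    the next update is more than 40 minutes away."""
    return pending_objects > 0 and seconds_until_next_update > EXTRA_INDEXING_THRESHOLD
```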
Sarah Hoffmann [Tue, 5 Oct 2021 15:18:10 +0000 (17:18 +0200)]
apply variants by languages
Adds a tagger for names by language so that the analyzer of that
language is used. Thus variants are now only applied to names
in the specific language and only to name tags, no longer to
reference-like tags.
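As a rough illustration of tagging names by language, the sketch below picks an analyzer from the tag key. It assumes the language is encoded as a `name:<lang>` suffix; `analyzer_for_key` and `known_languages` are hypothetical and do not describe the actual tagger.

```python
from typing import Optional, Set


def analyzer_for_key(key: str, known_languages: Set[str]) -> Optional[str]:
    """Return the language analyzer to use for a tag key, or None
    for the default analyzer.

    Only keys of the assumed form 'name:<lang>' are tagged; plain
    names and reference-like tags such as 'ref' keep the default.
    """
    base, _, lang = key.partition(":")
    if base == "name" and lang in known_languages:
        return lang
    return None
```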
Sarah Hoffmann [Tue, 5 Oct 2021 12:10:32 +0000 (14:10 +0200)]
use analyser provided in the 'analyzer' property
Implements per-name choice of analyzer. If a non-default
analyzer is chosen, then the 'word' identifier is extended
with the name of the analyzer, so that we still have unique
items.
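A minimal sketch of how such an extended identifier could be built. The `@` separator and the function name `word_identifier` are assumptions made for illustration; the commit message does not state the exact format.

```python
from typing import Optional


def word_identifier(word: str, analyzer: Optional[str] = None) -> str:
    """Build a unique identifier for a word.

    With the default analyzer (None) the word is used as-is. For a
    non-default analyzer its name is appended, so the same word
    analyzed by different analyzers yields distinct identifiers.
    The '@' separator is an assumption of this sketch.
    """
    if analyzer is None:
        return word
    return f"{word}@{analyzer}"
```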
Sarah Hoffmann [Mon, 4 Oct 2021 16:31:58 +0000 (18:31 +0200)]
move parsing of token analysis config to analyzer
Adds a second callback for the analyzer which is responsible
for parsing the configuration rules and converting them to
whatever format is necessary. This way, each analyzer implementation
can define its own configuration rules.
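The two-callback design can be sketched as below. The callback names `configure` and `create`, the registry `ANALYZERS` and the toy `UppercaseAnalyzer` are all hypothetical; the sketch only shows the idea that each analyzer parses its own rules before an analyzer instance is built from the result.

```python
from typing import Callable, Dict


class UppercaseAnalyzer:
    """Toy analyzer module, invented for this sketch."""

    @staticmethod
    def configure(rules: dict) -> dict:
        # First callback: parse the raw configuration entry into
        # whatever format this analyzer needs.
        return {"suffix": str(rules.get("suffix", ""))}

    @staticmethod
    def create(config: dict) -> Callable[[str], str]:
        # Second callback: build the analyzer from the parsed config.
        suffix = config["suffix"]
        return lambda name: name.upper() + suffix


# Hypothetical registry mapping the 'analyzer' property to a module.
ANALYZERS: Dict[str, type] = {"upper": UppercaseAnalyzer}


def build_analyzer(entry: dict) -> Callable[[str], str]:
    """Look up the module named in the entry and run both callbacks."""
    module = ANALYZERS[entry["analyzer"]]
    return module.create(module.configure(entry))
```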
Sarah Hoffmann [Mon, 4 Oct 2021 15:34:30 +0000 (17:34 +0200)]
make token analyzers configurable modules
Adds a mandatory section 'analyzer' to the token-analysis entries
which defines which analyzer to use. Currently there is exactly
one, generic, which implements the former ICUNameProcessor.
Sarah Hoffmann [Mon, 4 Oct 2021 14:40:28 +0000 (16:40 +0200)]
extend ICU config to accommodate multiple analysers
Adds parsing of multiple variant lists from the configuration.
Every entry except one must have a unique 'id' parameter to
distinguish the entries. The entry without id is considered
the default. Currently only the list without an id is used
for analysis.
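The uniqueness rule for 'id' can be sketched like this. It assumes the entries have already been loaded into a list of dicts; `index_token_analysis` is a hypothetical helper, not the actual loader.

```python
from typing import Dict, List, Optional


def index_token_analysis(entries: List[dict]) -> Dict[Optional[str], dict]:
    """Index configuration entries by their 'id'.

    The entry without an 'id' is stored under None and acts as the
    default. A second entry with the same id (or a second entry
    without any id) is rejected.
    """
    indexed: Dict[Optional[str], dict] = {}
    for entry in entries:
        key = entry.get("id")
        if key in indexed:
            label = "default" if key is None else repr(key)
            raise ValueError(f"duplicate token-analysis entry: {label}")
        indexed[key] = entry
    return indexed
```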
Sarah Hoffmann [Thu, 30 Sep 2021 19:30:13 +0000 (21:30 +0200)]
introduce sanitizer step before token analysis
Sanitizer functions allow name and address tags to be transformed
before they are handed to the tokenizer. These transformations are
visible only to the tokenizer and thus only have an influence on the
search terms and address match terms for a place.
Currently two sanitizers are implemented which are responsible for
splitting names with multiple values and removing bracket additions.
Both were previously hard-coded in the tokenizer.
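A sanitizer chain of this kind might look roughly like the sketch below. The function names, the `;` delimiter and the representation of names as `(kind, value)` pairs are assumptions for illustration only.

```python
import re
from typing import Callable, List, Tuple

Name = Tuple[str, str]  # (tag kind, value); representation assumed here


def split_name_list(names: List[Name]) -> List[Name]:
    """Split values that contain several names (';' delimiter assumed)."""
    result = []
    for kind, value in names:
        for part in value.split(";"):
            part = part.strip()
            if part:
                result.append((kind, part))
    return result


def strip_bracket_terms(names: List[Name]) -> List[Name]:
    """Remove additions in round brackets, e.g. 'Halle (Saale)' -> 'Halle'."""
    return [(kind, re.sub(r"\s*\([^)]*\)", "", value).strip())
            for kind, value in names]


def run_sanitizers(names: List[Name],
                   sanitizers: List[Callable[[List[Name]], List[Name]]]) -> List[Name]:
    """Apply each sanitizer in order before the names reach the tokenizer."""
    for sanitizer in sanitizers:
        names = sanitizer(names)
    return names
```

Because the sanitizers only rewrite the copy handed to the tokenizer, the stored tags of the place stay unchanged in this model.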
Sarah Hoffmann [Wed, 29 Sep 2021 15:37:04 +0000 (17:37 +0200)]
unify ICUNameProcessorRules and ICURuleLoader
There is no need for the additional layer of indirection that
the ICUNameProcessorRules class adds. The ICURuleLoader can
fill the database properties directly.
Sarah Hoffmann [Wed, 29 Sep 2021 09:54:14 +0000 (11:54 +0200)]
export more data for the tokenizer name preparation
Adds class, type, country and rank to the exported information
and removes the rather odd hack for countries. Whether a place
represents a country boundary can now be computed by the tokenizer.
Sarah Hoffmann [Mon, 27 Sep 2021 21:32:11 +0000 (23:32 +0200)]
adjust address levels for boundaries in Slovakia
Levels chosen according to OSM wiki. Mainly moves admin_level 6
to county level and admin_level 8 to city/town level. Higher
levels are adjusted accordingly.