git.openstreetmap.org Git - nominatim.git/log
nominatim.git
3 years ago  further refactor setup to keep function small
Sarah Hoffmann [Tue, 26 Oct 2021 08:57:51 +0000 (10:57 +0200)]
further refactor setup to keep function small

3 years ago  searches for house numbers must have an address
Sarah Hoffmann [Tue, 26 Oct 2021 08:28:28 +0000 (10:28 +0200)]
searches for house numbers must have an address

3 years ago  disallow search for partials without address
Sarah Hoffmann [Tue, 26 Oct 2021 08:23:55 +0000 (10:23 +0200)]
disallow search for partials without address

Very frequent partial terms take too long to look up and
do not return any valuable results unless the search is
further narrowed down by an address.

3 years ago  make word count computation part of the import
Sarah Hoffmann [Tue, 26 Oct 2021 07:37:57 +0000 (09:37 +0200)]
make word count computation part of the import

Accurate word counts are now essential when using
the ICU tokenizer and don't hurt for the legacy one.

Adds about an hour to the import time.

3 years ago  actions: move ICU tests into its own run
Sarah Hoffmann [Tue, 26 Oct 2021 08:32:43 +0000 (10:32 +0200)]
actions: move ICU tests into its own run

3 years ago  Merge pull request #2486 from lonvia/fix-special-phrases
Sarah Hoffmann [Mon, 25 Oct 2021 19:45:08 +0000 (21:45 +0200)]
Merge pull request #2486 from lonvia/fix-special-phrases

Fix parsing of operator in special phrases

3 years ago  ICU: add an index over word_ids
Sarah Hoffmann [Mon, 25 Oct 2021 19:33:27 +0000 (21:33 +0200)]
ICU: add an index over word_ids

Needed for keyword lookup in the details response.

3 years ago  be case-insensitive about special phrase operator
Sarah Hoffmann [Mon, 25 Oct 2021 17:51:20 +0000 (19:51 +0200)]
be case-insensitive about special phrase operator

3 years ago  fix parsing of operator in special phrases
Sarah Hoffmann [Mon, 25 Oct 2021 17:46:30 +0000 (19:46 +0200)]
fix parsing of operator in special phrases

Because of unstripped input, the operators wouldn't match.
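The failure mode can be illustrated with a minimal sketch. The helper name `normalize_operator` and the operator set are assumptions made for this example and are not taken from the Nominatim source; the sketch only shows why surrounding whitespace (and, per the follow-up commit, letter case) has to be normalised before comparing.

```python
# Hypothetical operator set, for illustration only.
KNOWN_OPERATORS = {"in", "near", "named"}


def normalize_operator(raw):
    """Return the cleaned operator, or '-' when it is not recognised."""
    # Strip surrounding whitespace and ignore case before the lookup.
    cleaned = raw.strip().lower()
    return cleaned if cleaned in KNOWN_OPERATORS else "-"


if __name__ == "__main__":
    raw_value = " Near "  # cell content as it might arrive unstripped
    print(raw_value in KNOWN_OPERATORS)   # direct comparison of the raw value
    print(normalize_operator(raw_value))  # comparison after normalisation
```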

3 years ago  Merge pull request #2484 from lonvia/fix-index-use
Sarah Hoffmann [Mon, 25 Oct 2021 15:20:42 +0000 (17:20 +0200)]
Merge pull request #2484 from lonvia/fix-index-use

Reverse: add index hints

3 years ago  Merge pull request #2483 from lonvia/fix-warming
Sarah Hoffmann [Mon, 25 Oct 2021 14:21:36 +0000 (16:21 +0200)]
Merge pull request #2483 from lonvia/fix-warming

Fix warming for ICU tokenizer

3 years ago  reverse: add index hints
Sarah Hoffmann [Mon, 25 Oct 2021 12:55:15 +0000 (14:55 +0200)]
reverse: add index hints

The fairly complex where condition of idx_placex_geometry_placenode
won't always be matched by the query planner if the condition
part doesn't appear verbatim in the query.

Fixes #2480.

3 years ago  fix warming for ICU tokenizer
Sarah Hoffmann [Mon, 25 Oct 2021 11:08:16 +0000 (13:08 +0200)]
fix warming for ICU tokenizer

Running the warm-up search requests requires querying
the most frequent words. This must be done via the tokenizer
to honor the different formats of the word table.

3 years ago  allow relative paths for log files
Sarah Hoffmann [Mon, 25 Oct 2021 08:13:11 +0000 (10:13 +0200)]
allow relative paths for log files

3 years ago  Merge pull request #2476 from lonvia/harmonize-configuration-file-settings
Sarah Hoffmann [Sun, 24 Oct 2021 08:57:48 +0000 (10:57 +0200)]
Merge pull request #2476 from lonvia/harmonize-configuration-file-settings

Standardize handling of file names in configuration values

3 years ago  allow relative paths for flatnode file
Sarah Hoffmann [Fri, 22 Oct 2021 15:32:51 +0000 (17:32 +0200)]
allow relative paths for flatnode file

3 years ago  switch IMPORT_STYLE to use generic file search
Sarah Hoffmann [Fri, 22 Oct 2021 14:49:57 +0000 (16:49 +0200)]
switch IMPORT_STYLE to use generic file search

Allows relative paths wrt project directory.

3 years ago  have ADDRESS_LEVEL_CONFIG use load_sub_configuration
Sarah Hoffmann [Fri, 22 Oct 2021 14:31:33 +0000 (16:31 +0200)]
have ADDRESS_LEVEL_CONFIG use load_sub_configuration

This means that relative paths now are looked up in the
project directory.

3 years ago  replace NOMINATIM_PHRASE_CONFIG with command line option
Sarah Hoffmann [Fri, 22 Oct 2021 12:41:14 +0000 (14:41 +0200)]
replace NOMINATIM_PHRASE_CONFIG with command line option

3 years ago  doc: clarify relative paths for tokenizer config
Sarah Hoffmann [Thu, 21 Oct 2021 14:38:06 +0000 (16:38 +0200)]
doc: clarify relative paths for tokenizer config

3 years ago  Merge pull request #2475 from lonvia/catchup-mode
Sarah Hoffmann [Thu, 21 Oct 2021 14:21:58 +0000 (16:21 +0200)]
Merge pull request #2475 from lonvia/catchup-mode

Add catch-up mode to replication and extend documentation for updating

3 years ago  extend documentation for updating database
Sarah Hoffmann [Thu, 21 Oct 2021 10:14:47 +0000 (12:14 +0200)]
extend documentation for updating database

Explains the different modes and adds hints for
setting up a systemd job.

3 years ago  add new replication mode catch-up
Sarah Hoffmann [Wed, 20 Oct 2021 20:05:15 +0000 (22:05 +0200)]
add new replication mode catch-up

This mode gets updates until the server reports no new diffs
anymore.

Also adds an additional indexing step when the main indexing step
has left a couple of objects to process. This happens only when the
next update is expected to be more than 40min away.
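The behaviour described above could be sketched roughly as follows. All three callables are hypothetical placeholders for this example (they are not functions of the Nominatim code base), and running the extra indexing inside the loop is one possible reading of the description.

```python
from datetime import timedelta

# Threshold taken from the commit message above.
EXTRA_INDEXING_THRESHOLD = timedelta(minutes=40)


def catch_up(apply_next_diff, index_leftovers, time_until_next_update):
    """Apply diffs until the server reports no new ones.

    Hypothetical callables:
      apply_next_diff()        -> bool, False when no new diff is available
      index_leftovers()        -> None, indexes objects the main step left over
      time_until_next_update() -> timedelta until the next expected diff
    """
    applied = 0
    while apply_next_diff():
        applied += 1
        # Only spend time on leftovers when the next update is far away.
        if time_until_next_update() > EXTRA_INDEXING_THRESHOLD:
            index_leftovers()
    return applied
```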

3 years ago  run Tiger import with parallel threads per default
Sarah Hoffmann [Tue, 19 Oct 2021 13:00:26 +0000 (15:00 +0200)]
run Tiger import with parallel threads per default

3 years ago  Merge pull request #2472 from lonvia/word-count-computation
Sarah Hoffmann [Tue, 19 Oct 2021 12:58:57 +0000 (14:58 +0200)]
Merge pull request #2472 from lonvia/word-count-computation

Fix word count computation for ICU tokenizer

3 years ago  adapt tests for new word count mechanism
Sarah Hoffmann [Tue, 19 Oct 2021 10:03:48 +0000 (12:03 +0200)]
adapt tests for new word count mechanism

3 years ago  icu: no longer precompute terms
Sarah Hoffmann [Tue, 19 Oct 2021 09:50:06 +0000 (11:50 +0200)]
icu: no longer precompute terms

The ICU analyzer no longer drops frequent partials, so it is no
longer necessary to know the frequencies in advance.

3 years ago  make word recount a tokenizer-specific function
Sarah Hoffmann [Tue, 19 Oct 2021 09:21:16 +0000 (11:21 +0200)]
make word recount a tokenizer-specific function

3 years ago  Merge pull request #2471 from lonvia/update-install-rules
Sarah Hoffmann [Tue, 19 Oct 2021 07:11:16 +0000 (09:11 +0200)]
Merge pull request #2471 from lonvia/update-install-rules

Reorganise, update and extend documentation

3 years ago  docs: fix more links
Sarah Hoffmann [Mon, 18 Oct 2021 15:26:14 +0000 (17:26 +0200)]
docs: fix more links

3 years ago  docs: refer to our new Settings chapter in the import instructions
Sarah Hoffmann [Mon, 18 Oct 2021 15:02:52 +0000 (17:02 +0200)]
docs: refer to our new Settings chapter in the import instructions

3 years ago  check and fix all links in documentation
Sarah Hoffmann [Mon, 18 Oct 2021 14:53:24 +0000 (16:53 +0200)]
check and fix all links in documentation

3 years ago  add extended documentation of settings
Sarah Hoffmann [Thu, 14 Oct 2021 12:36:09 +0000 (14:36 +0200)]
add extended documentation of settings

3 years ago  docs: update overview pages
Sarah Hoffmann [Thu, 14 Oct 2021 08:21:52 +0000 (10:21 +0200)]
docs: update overview pages

3 years ago  docs: move place ranking into customization part
Sarah Hoffmann [Thu, 14 Oct 2021 08:10:54 +0000 (10:10 +0200)]
docs: move place ranking into customization part

3 years ago  docs: nominatim-ui has a new place for custom config
Sarah Hoffmann [Thu, 14 Oct 2021 08:06:01 +0000 (10:06 +0200)]
docs: nominatim-ui has a new place for custom config

3 years ago  docs: move import style description to customize section
Sarah Hoffmann [Tue, 12 Oct 2021 21:07:41 +0000 (23:07 +0200)]
docs: move import style description to customize section

3 years ago  docs: make customization chapter a separate section
Sarah Hoffmann [Tue, 12 Oct 2021 19:25:13 +0000 (21:25 +0200)]
docs: make customization chapter a separate section

3 years ago  fix typo
Sarah Hoffmann [Tue, 12 Oct 2021 09:04:44 +0000 (11:04 +0200)]
fix typo

3 years ago  docs: remove the development warning for ICU tokenizer
Sarah Hoffmann [Tue, 12 Oct 2021 08:31:18 +0000 (10:31 +0200)]
docs: remove the development warning for ICU tokenizer

3 years ago  docs: add a warning about using --no-updates with TIGER data
Sarah Hoffmann [Tue, 12 Oct 2021 08:25:50 +0000 (10:25 +0200)]
docs: add a warning about using --no-updates with TIGER data

3 years ago  update and extend man page
Sarah Hoffmann [Mon, 11 Oct 2021 21:27:38 +0000 (23:27 +0200)]
update and extend man page

Provide extended descriptions for most subcommands.

3 years ago  rename manual directory to man
Sarah Hoffmann [Mon, 11 Oct 2021 20:23:38 +0000 (22:23 +0200)]
rename manual directory to man

Avoids confusion between 'docs' and 'manual'.

3 years ago  add munin scripts and ICU subrules to installation
Sarah Hoffmann [Mon, 11 Oct 2021 20:10:54 +0000 (22:10 +0200)]
add munin scripts and ICU subrules to installation

3 years ago  Merge pull request #2469 from lonvia/fix-tablespace-assignment
Sarah Hoffmann [Fri, 15 Oct 2021 16:20:43 +0000 (18:20 +0200)]
Merge pull request #2469 from lonvia/fix-tablespace-assignment

Fix template expressions for tablespaces

3 years ago  fix template expressions for tablespaces
Sarah Hoffmann [Fri, 15 Oct 2021 13:07:43 +0000 (15:07 +0200)]
fix template expressions for tablespaces

3 years ago  Merge pull request #2450 from mtmail/tiger-data-2021
Sarah Hoffmann [Mon, 11 Oct 2021 17:22:15 +0000 (19:22 +0200)]
Merge pull request #2450 from mtmail/tiger-data-2021

US TIGER data 2021 released

3 years ago  Merge pull request #2465 from lonvia/use-spgist-index
Sarah Hoffmann [Mon, 11 Oct 2021 08:48:44 +0000 (10:48 +0200)]
Merge pull request #2465 from lonvia/use-spgist-index

Use SP-GIST for building index

3 years ago  remove outdated country_languages.php
Sarah Hoffmann [Sun, 10 Oct 2021 19:58:43 +0000 (21:58 +0200)]
remove outdated country_languages.php

3 years ago  add recommendation for Postgis 3+
Sarah Hoffmann [Sun, 10 Oct 2021 12:23:08 +0000 (14:23 +0200)]
add recommendation for Postgis 3+

3 years ago  use SP-GIST index for building index where available
Sarah Hoffmann [Sun, 10 Oct 2021 12:17:03 +0000 (14:17 +0200)]
use SP-GIST index for building index where available

Point-in-polygon queries are much faster with a SP-GIST geometry
index, so use that for the index used to check if a housenumber
is inside a building.

Only available with Postgis 3. There is an automatic fallback to
GIST for Postgis 2.
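A rough sketch of such a version-dependent fallback, assuming the Postgis major version is already known. The function name, index name, table name and column name are all hypothetical and chosen only for illustration.

```python
def building_index_sql(postgis_major_version):
    """Build a CREATE INDEX statement, choosing the index method by version.

    Index, table and column names below are hypothetical.
    """
    # SP-GIST is used with Postgis 3 and later, plain GIST otherwise.
    method = "SPGIST" if postgis_major_version >= 3 else "GIST"
    return (
        "CREATE INDEX idx_example_building_geometry "
        f"ON example_buildings USING {method} (geometry)"
    )


if __name__ == "__main__":
    print(building_index_sql(3))
    print(building_index_sql(2))
```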

3 years ago  Merge pull request #2460 from lonvia/multiple-analyzers
Sarah Hoffmann [Sat, 9 Oct 2021 12:41:09 +0000 (14:41 +0200)]
Merge pull request #2460 from lonvia/multiple-analyzers

Add support for multiple token analyzers

3 years ago  add documentation for new configuration of ICU tokenizer
Sarah Hoffmann [Thu, 7 Oct 2021 09:55:53 +0000 (11:55 +0200)]
add documentation for new configuration of ICU tokenizer

3 years ago  fix argument description for check_database
Sarah Hoffmann [Thu, 7 Oct 2021 07:49:13 +0000 (09:49 +0200)]
fix argument description for check_database

3 years ago  reorganize and complete tests around generic token analysis
Sarah Hoffmann [Wed, 6 Oct 2021 15:03:37 +0000 (17:03 +0200)]
reorganize and complete tests around generic token analysis

3 years ago  add tests for sanitizer tagging language
Sarah Hoffmann [Wed, 6 Oct 2021 10:29:25 +0000 (12:29 +0200)]
add tests for sanitizer tagging language

3 years ago  apply variants by languages
Sarah Hoffmann [Tue, 5 Oct 2021 15:18:10 +0000 (17:18 +0200)]
apply variants by languages

Adds a tagger for names by language so that the analyzer of that
language is used. Thus variants are now only applied to names
in the specific language and only to name tags, no longer to
reference-like tags.

3 years ago  use analyser provided in the 'analyzer' property
Sarah Hoffmann [Tue, 5 Oct 2021 12:10:32 +0000 (14:10 +0200)]
use analyser provided in the 'analyzer' property

Implements per-name choice of analyzer. If a non-default
analyzer is chosen, then the 'word' identifier is extended
with the name of the analyzer, so that we still have unique
items.
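As an illustration of why the identifier needs extending, consider the sketch below. The function name and the `@` separator are assumptions for this example; the commit message only states that the analyzer name is appended.

```python
def word_identifier(normalized_name, analyzer_id=None):
    """Return a key that stays unique per (name, analyzer) combination.

    The '@' separator is an assumption made for this sketch.
    """
    if analyzer_id is None:
        # Default analyzer: the plain name is already unique.
        return normalized_name
    return f"{normalized_name}@{analyzer_id}"


if __name__ == "__main__":
    # The same name processed by two analyzers yields two distinct keys.
    print(word_identifier("hauptstrasse"))
    print(word_identifier("hauptstrasse", "de"))
```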

3 years ago  remove support for properties on variants
Sarah Hoffmann [Tue, 5 Oct 2021 08:29:36 +0000 (10:29 +0200)]
remove support for properties on variants

Those are not going to be used in the near future, so no need to
carry that code around just now.

3 years ago  precompute replacements while loading configuration
Sarah Hoffmann [Tue, 5 Oct 2021 08:20:08 +0000 (10:20 +0200)]
precompute replacements while loading configuration

3 years ago  move parsing of token analysis config to analyzer
Sarah Hoffmann [Mon, 4 Oct 2021 16:31:58 +0000 (18:31 +0200)]
move parsing of token analysis config to analyzer

Adds a second callback for the analyzer which is responsible
for parsing the configuration rules and converting it to
whatever format necessary. This way, each analyzer implementation
can define its own configuration rules.
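The two-callback split might look roughly like this. The callback names `configure` and `create`, the rule syntax `source => target` and the `ExampleAnalyzer` class are hypothetical simplifications for this sketch, not the actual module interface.

```python
def configure(rules):
    """First callback: parse raw configuration rules into an internal format."""
    replacements = {}
    for rule in rules.get("variants", []):
        source, target = (part.strip() for part in rule.split("=>"))
        replacements[source] = target
    return replacements


class ExampleAnalyzer:
    """Toy analyzer that applies the pre-parsed replacements to a name."""

    def __init__(self, replacements):
        self.replacements = replacements

    def get_variants(self, name):
        variants = {name}
        for source, target in self.replacements.items():
            if source in name:
                variants.add(name.replace(source, target))
        return sorted(variants)


def create(config):
    """Second callback: build an analyzer from the parsed configuration."""
    return ExampleAnalyzer(config)
```

Because parsing lives in `configure`, a different analyzer module would be free to accept an entirely different rule format.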

3 years ago  make token analyzers configurable modules
Sarah Hoffmann [Mon, 4 Oct 2021 15:34:30 +0000 (17:34 +0200)]
make token analyzers configurable modules

Adds a mandatory section 'analyzer' to the token-analysis entries
which defines which analyser to use. Currently there is exactly
one, generic, which implements the former ICUNameProcessor.

3 years ago  extend ICU config to accommodate multiple analysers
Sarah Hoffmann [Mon, 4 Oct 2021 14:40:28 +0000 (16:40 +0200)]
extend ICU config to accommodate multiple analysers

Adds parsing of multiple variant lists from the configuration.
Every entry except one must have a unique 'id' parameter to
distinguish the entries. The entry without id is considered
the default. Currently only the list without an id is used
for analysis.
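The uniqueness rule can be sketched as a small validation step. The function name and the dictionary layout of the entries are assumptions for this example; only the 'id' key and the "entry without id is the default" rule come from the description above.

```python
def index_analysis_entries(entries):
    """Index token-analysis entries by 'id'; the entry without id is the default.

    Raises ValueError when two entries share an id or when more than one
    entry lacks an id.
    """
    by_id = {}
    for entry in entries:
        key = entry.get("id")  # None marks the default entry
        if key in by_id:
            label = "default (no id)" if key is None else repr(key)
            raise ValueError(f"duplicate token-analysis entry: {label}")
        by_id[key] = entry
    return by_id
```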

3 years ago  move flatten_config_list into config module
Sarah Hoffmann [Mon, 4 Oct 2021 09:56:54 +0000 (11:56 +0200)]
move flatten_config_list into config module

For general usage by other modules.

3 years ago  Merge pull request #2458 from lonvia/add-tokenizer-preprocessing
Sarah Hoffmann [Fri, 1 Oct 2021 19:53:34 +0000 (21:53 +0200)]
Merge pull request #2458 from lonvia/add-tokenizer-preprocessing

Add a "sanitation" step for name and address tags before token processing

3 years ago  replace test variable for PG env tests
Sarah Hoffmann [Fri, 1 Oct 2021 08:51:41 +0000 (10:51 +0200)]
replace test variable for PG env tests

'tty' was removed in PG14 and causes an error.

3 years ago  add unit tests for new sanitizer functions
Sarah Hoffmann [Fri, 1 Oct 2021 07:50:17 +0000 (09:50 +0200)]
add unit tests for new sanitizer functions

3 years ago  introduce sanitizer step before token analysis
Sarah Hoffmann [Thu, 30 Sep 2021 19:30:13 +0000 (21:30 +0200)]
introduce sanitizer step before token analysis

Sanitizer functions allow transforming name and address tags before
they are handed to the tokenizer. These transformations are visible
only to the tokenizer and thus only have an influence on the
search terms and address match terms for a place.

Currently two sanitizers are implemented which are responsible for
splitting names with multiple values and removing bracket additions.
Both were previously hard-coded in the tokenizer.
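The two transformations could be sketched as below. Function names, the choice of `;` and `,` as delimiters, and the handling of only a trailing bracket group are assumptions for this example; the real sanitizers may behave differently.

```python
import re


def split_name_list(names):
    """Split each name on ';' or ',' into separate, trimmed names."""
    result = []
    for name in names:
        result.extend(
            part.strip() for part in re.split(r"[;,]", name) if part.strip()
        )
    return result


def strip_bracket_additions(names):
    """Remove a trailing '(...)' addition from each name."""
    return [re.sub(r"\s*\([^)]*\)\s*$", "", name) for name in names]


def sanitize(names):
    """Run both sanitizers in sequence before handing names to a tokenizer."""
    return strip_bracket_additions(split_name_list(names))
```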

3 years ago  unify ICUNameProcessorRules and ICURuleLoader
Sarah Hoffmann [Wed, 29 Sep 2021 15:37:04 +0000 (17:37 +0200)]
unify ICUNameProcessorRules and ICURuleLoader

There is no need for the additional layer of indirection that
the ICUNameProcessorRules class adds. The ICURuleLoader can
fill the database properties directly.

3 years ago  fix typo
Sarah Hoffmann [Wed, 29 Sep 2021 12:16:09 +0000 (14:16 +0200)]
fix typo

3 years ago  export more data for the tokenizer name preparation
Sarah Hoffmann [Wed, 29 Sep 2021 09:54:14 +0000 (11:54 +0200)]
export more data for the tokenizer name preparation

Adds class, type, country and rank to the exported information
and removes the rather odd hack for countries. Whether a place
represents a country boundary can now be computed by the tokenizer.

3 years ago  add wrapper class for place data passed to tokenizer
Sarah Hoffmann [Wed, 29 Sep 2021 08:37:54 +0000 (10:37 +0200)]
add wrapper class for place data passed to tokenizer

This is mostly for convenience and documentation purposes.

3 years ago  Merge pull request #2455 from lonvia/adjust-address-levels-slovakia
Sarah Hoffmann [Tue, 28 Sep 2021 09:21:08 +0000 (11:21 +0200)]
Merge pull request #2455 from lonvia/adjust-address-levels-slovakia

Adjust address levels for boundaries in Slovakia

3 years ago  Merge pull request #2454 from lonvia/sort-out-token-assignment-in-sql
Sarah Hoffmann [Tue, 28 Sep 2021 07:45:15 +0000 (09:45 +0200)]
Merge pull request #2454 from lonvia/sort-out-token-assignment-in-sql

ICU tokenizer: switch match method to using partial terms

3 years ago  adjust address levels for boundaries in Slovakia
Sarah Hoffmann [Mon, 27 Sep 2021 21:32:11 +0000 (23:32 +0200)]
adjust address levels for boundaries in Slovakia

Levels chosen according to OSM wiki. Mainly moves admin_level 6
to county level and admin_level 8 to city/town level. Higher
levels are adjusted accordingly.

Fixes #2453.

3 years ago  adapt tests to new ICU address token handling
Sarah Hoffmann [Mon, 27 Sep 2021 15:36:23 +0000 (17:36 +0200)]
adapt tests to new ICU address token handling

3 years ago  remove unused parameter
Sarah Hoffmann [Mon, 27 Sep 2021 12:58:43 +0000 (14:58 +0200)]
remove unused parameter

3 years ago  Merge pull request #2452 from lonvia/update-houses-on-street-name-change
Sarah Hoffmann [Mon, 27 Sep 2021 12:55:50 +0000 (14:55 +0200)]
Merge pull request #2452 from lonvia/update-houses-on-street-name-change

Force update of surrounding houses when street or place name changes

3 years ago  icu tokenizer: switch to matching against partial names
Sarah Hoffmann [Thu, 23 Sep 2021 14:57:24 +0000 (16:57 +0200)]
icu tokenizer: switch to matching against partial names

When matching address parts from addr:* tags against place names,
the address names were so far converted to full names and then
compared to the place names. This can become problematic with the new
ICU tokenizer once we introduce creation of different variants
depending on the place name context. It wouldn't be clear which
variant to produce to get a match, so we would have to create all of
them. To work around this issue, switch to using the partial terms
for matching. This introduces a larger fuzziness between matches but
that shouldn't be a problem because matching is always geographically
restricted.

The search terms created for address parts have a different problem:
they are already created before we even know if they are going to be
used. This can lead to spurious entries in the word table, which slows
down searching. This problem can also be circumvented by using only
partial terms for the search terms. In terms of searching that means
that the address terms would not get the full-word boost, but given
that the case where an address part does not exist as an OSM object
should be the exception, this is likely acceptable.
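A toy model of matching on partial terms, assuming that partial terms are simply lower-cased whitespace-separated words and that a match means every term of the addr:* value occurs in the place name. Both assumptions, and the function names, are for illustration only; the real tokenizer works on token ids.

```python
def partial_terms(name):
    """Split a name into lower-cased partial terms (whitespace tokens)."""
    return set(name.lower().split())


def address_part_matches(addr_name, place_name):
    """True when every partial term of the addr:* value occurs in the place name."""
    wanted = partial_terms(addr_name)
    return bool(wanted) and wanted <= partial_terms(place_name)


if __name__ == "__main__":
    # Word order no longer matters, which is the added fuzziness.
    print(address_part_matches("Street Main", "Main Street"))
    print(address_part_matches("Main Street", "Main Road"))
```

The looser comparison is tolerable in this model for the reason given above: candidates are already restricted to places that are geographically close.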

3 years ago  adapt documentation for SQL tokenizer interface
Sarah Hoffmann [Wed, 22 Sep 2021 20:54:14 +0000 (22:54 +0200)]
adapt documentation for SQL tokenizer interface

3 years ago  move name matching into tokenizer module
Sarah Hoffmann [Wed, 22 Sep 2021 20:20:02 +0000 (22:20 +0200)]
move name matching into tokenizer module

Instead of requesting the match tokens from the tokenizer
when looking for parent streets/places and address parts,
hand in the saved tokens and ask if they match. This gives
the tokenizer more freedom to decide how name matching
should be done.

3 years ago  force update on rank30 children when place name changes
Sarah Hoffmann [Mon, 27 Sep 2021 09:04:17 +0000 (11:04 +0200)]
force update on rank30 children when place name changes

Name changes may have an effect on parenting. Don't update
surrounding rank30 objects with addr:place tags as this is
potentially too expensive.

3 years ago  force update of surrounding houses when street name changes
Sarah Hoffmann [Mon, 27 Sep 2021 08:20:26 +0000 (10:20 +0200)]
force update of surrounding houses when street name changes

When the street changes its name then this may cause changes
in the parenting of rank-30 objects with an addr:street
tag.

Fixes #2242.

3 years ago  US TIGER data 2021 released
marc tobias [Fri, 24 Sep 2021 22:05:17 +0000 (00:05 +0200)]
US TIGER data 2021 released

3 years ago  slightly increase radius to look for postcodes
Sarah Hoffmann [Fri, 24 Sep 2021 21:56:42 +0000 (23:56 +0200)]
slightly increase radius to look for postcodes

3 years ago  Merge pull request #2449 from lonvia/address-ranking-spain
Sarah Hoffmann [Fri, 24 Sep 2021 20:48:21 +0000 (22:48 +0200)]
Merge pull request #2449 from lonvia/address-ranking-spain

Adjust address ranks for Spain

3 years ago  adjust address ranks for Spain
Sarah Hoffmann [Fri, 24 Sep 2021 15:37:31 +0000 (17:37 +0200)]
adjust address ranks for Spain

Adjusts levels for boundaries according to the list on
https://wiki.openstreetmap.org/wiki/Tag:boundary%3Dadministrative

* no admin_level 5, so drop that from addresses
* admin_level 6 has the province
* admin_level 7 has the county when it exists

Also reranks place=province so that it matches up with
admin_level 6 and introduces place=civil_parish which
is used as a place node for some admin_level=9 boundaries
in Galicia.

3 years ago  Merge pull request #2447 from lonvia/fix-dynamic-address-assignment
Sarah Hoffmann [Sun, 19 Sep 2021 13:57:28 +0000 (15:57 +0200)]
Merge pull request #2447 from lonvia/fix-dynamic-address-assignment

Fix dynamic assignment of address parts

3 years ago  CI: install locale for CentOS
Sarah Hoffmann [Sun, 19 Sep 2021 11:49:11 +0000 (13:49 +0200)]
CI: install locale for CentOS

3 years ago  Remove the installation warning
Sarah Hoffmann [Sun, 19 Sep 2021 11:01:32 +0000 (13:01 +0200)]
Remove the installation warning

Installation has become a lot easier.

3 years ago  fix dynamic assignment of address parts
Sarah Hoffmann [Sun, 19 Sep 2021 08:54:05 +0000 (10:54 +0200)]
fix dynamic assignment of address parts

A boolean check for dynamic changes of address parts is not
sufficient. The order of choice should be:

 1. an addr:* part matches the name
 2. the address part surrounds the object
 3. the address part was declared as isaddress

The implementation uses a slightly different ordering
to avoid geometry checks unless strictly necessary (isaddress
is false and no matching address).

See #2446.
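The stated order of choice can be written down as a straightforward priority selection. This sketch does not reproduce the optimised ordering of the actual implementation (which avoids geometry checks where possible); the dictionary keys and function names are hypothetical.

```python
def match_priority(part):
    """Return 1, 2 or 3 following the order of choice above, or None."""
    if part["name_matches"]:   # 1. an addr:* part matches the name
        return 1
    if part["surrounds"]:      # 2. the address part surrounds the object
        return 2
    if part["isaddress"]:      # 3. the part was declared as isaddress
        return 3
    return None


def choose_address_part(parts):
    """Pick the candidate with the best (lowest) priority, or None."""
    best = None
    best_priority = None
    for part in parts:
        priority = match_priority(part)
        if priority is not None and (
            best_priority is None or priority < best_priority
        ):
            best, best_priority = part, priority
    return best
```

A plain boolean "is this an address part" flag cannot express this, since two candidates may both qualify while only one should win.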

3 years ago  Merge pull request #2440 from lonvia/generic-config-loader
Sarah Hoffmann [Sat, 4 Sep 2021 15:41:15 +0000 (17:41 +0200)]
Merge pull request #2440 from lonvia/generic-config-loader

Add generic loader for YAML configuration files

3 years ago  fix indent
Sarah Hoffmann [Sat, 4 Sep 2021 08:30:35 +0000 (10:30 +0200)]
fix indent

3 years ago  use yaml config loader for country info
Sarah Hoffmann [Fri, 3 Sep 2021 22:22:21 +0000 (00:22 +0200)]
use yaml config loader for country info

3 years ago  add tests for generic YAML config reader
Sarah Hoffmann [Fri, 3 Sep 2021 20:31:30 +0000 (22:31 +0200)]
add tests for generic YAML config reader

3 years ago  introduce generic YAML config loader
Sarah Hoffmann [Fri, 3 Sep 2021 16:16:12 +0000 (18:16 +0200)]
introduce generic YAML config loader

Adds a function to the Configuration class to load a YAML
file. This means that searching for the file is generalised
and works the same now for all configuration files. Changes
the search logic, so that it is always possible to have a
custom version of the configuration file in the project
directory.

Move ICU tokenizer to use new load function.
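The generalised file search might be sketched as follows. Only the lookup is shown, not the YAML parsing; the function name and the two directory parameters are assumptions for this example.

```python
from pathlib import Path


def find_config_file(filename, project_dir, default_config_dir):
    """Return the first existing copy of filename.

    The project directory is searched first so that a custom version
    there overrides the default that ships with the software.
    """
    for base in (Path(project_dir), Path(default_config_dir)):
        candidate = base / filename
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"configuration file not found: {filename}")
```

Searching the project directory first is what makes it always possible to override a default configuration file with a custom copy.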

3 years ago  Merge pull request #2437 from lonvia/tweak-ranking-searches
Sarah Hoffmann [Fri, 3 Sep 2021 12:16:23 +0000 (14:16 +0200)]
Merge pull request #2437 from lonvia/tweak-ranking-searches

Some more tweaks for search interpretation

3 years ago  Merge pull request #2436 from lonvia/country-configuration
Sarah Hoffmann [Fri, 3 Sep 2021 06:55:36 +0000 (08:55 +0200)]
Merge pull request #2436 from lonvia/country-configuration

Move configuration of default languages into a configuration file

3 years ago  reduce penalty for special searches by name
Sarah Hoffmann [Thu, 2 Sep 2021 16:13:45 +0000 (18:13 +0200)]
reduce penalty for special searches by name

Additional penalty for special terms with operator None
should only go to near searches. To reduce the number
of produced searches, restrict the none operator to
appear only in conjunction with the name.

3 years ago  further increase penalty on housenumbers without numbers
Sarah Hoffmann [Thu, 2 Sep 2021 16:11:49 +0000 (18:11 +0200)]
further increase penalty on housenumbers without numbers

Make the penalty dependent on the length of the token:
no penalty for one-letter house numbers and an increasing one
for more letters.
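A length-dependent penalty of this shape could look like the sketch below. The function name and the step of 0.1 per extra letter are assumptions for this example; the commit message does not state the actual values.

```python
# Hypothetical penalty added per letter beyond the first.
PENALTY_PER_EXTRA_LETTER = 0.1


def housenumber_penalty(token):
    """Penalty for treating a token as a house number.

    Tokens containing a digit are not penalised here. Digit-free tokens
    get no penalty at length one and a growing penalty for each extra letter.
    """
    if any(ch.isdigit() for ch in token):
        return 0.0
    return PENALTY_PER_EXTRA_LETTER * max(len(token) - 1, 0)
```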