git.openstreetmap.org Git - nominatim.git/log

disallow search for partials without address

Very frequent partial terms take too long to look up and
do not return any valuable results unless the search is
further narrowed down by an address.
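
A minimal sketch of the rule described above, assuming the decision is based on per-term frequency counts. The function name, the `MAX_PARTIAL_FREQUENCY` cut-off and the "at least one rare term" rule are illustrative assumptions, not taken from the Nominatim source.

```python
from typing import Sequence

# Hypothetical cut-off; the commit message does not state the real threshold.
MAX_PARTIAL_FREQUENCY = 10_000


def partial_search_allowed(partial_counts: Sequence[int], has_address: bool,
                           max_frequency: int = MAX_PARTIAL_FREQUENCY) -> bool:
    """Return True if a search built only from partial terms should run."""
    if has_address:
        # An address narrows the search enough to make it worthwhile.
        return True
    # Without an address, require at least one reasonably rare partial term.
    return any(count <= max_frequency for count in partial_counts)
```

With these assumptions, a query consisting only of very frequent partials is rejected unless an address part is present.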

Sarah Hoffmann [Tue, 26 Oct 2021 07:37:57 +0000 (09:37 +0200)]

make word count computation part of the import

Accurate word counts are now essential when using
the ICU tokenizer and don't hurt for the legacy one.

Adds about an hour import time.

Sarah Hoffmann [Tue, 26 Oct 2021 08:32:43 +0000 (10:32 +0200)]

actions: move ICU tests into its own run

Sarah Hoffmann [Mon, 25 Oct 2021 19:46:18 +0000 (21:46 +0200)]

Merge remote-tracking branch 'upstream/master'

Sarah Hoffmann [Mon, 25 Oct 2021 19:45:08 +0000 (21:45 +0200)]

Merge pull request #2486 from lonvia/fix-special-phrases

Fix parsing of operator in special phrases

Sarah Hoffmann [Mon, 25 Oct 2021 19:33:27 +0000 (21:33 +0200)]

ICU: add an index over word_ids

Needed for keyword lookup in the details response.

Sarah Hoffmann [Mon, 25 Oct 2021 18:46:01 +0000 (20:46 +0200)]

ICU: additional ranking by matching of normalised term

Keep track of normalised word for tokens and then recheck
against normalized form in database to exclude non-matching
script.

Sarah Hoffmann [Mon, 25 Oct 2021 17:51:20 +0000 (19:51 +0200)]

be case-insensitive about special phrase operator

Sarah Hoffmann [Mon, 25 Oct 2021 17:46:30 +0000 (19:46 +0200)]

fix parsing of operator in special phrases

Because of unstripped input, the operators wouldn't match.
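
A minimal sketch of the kind of normalisation this fix implies, combined with the case-insensitive handling from the commit listed just above. The function name, the operator set and the `-` placeholder for "no operator" are assumptions made for illustration.

```python
# Assumed set of recognised operators; illustrative only.
VALID_OPERATORS = {"in", "near"}


def normalize_operator(raw: str) -> str:
    """Strip whitespace and fold case before comparing against known operators."""
    operator = raw.strip().lower()
    # '-' is used here as a hypothetical marker for "no operator".
    return operator if operator in VALID_OPERATORS else "-"
```

Without the `strip()` call, an input such as `" near\n"` would never compare equal to `"near"`, which is the failure the commit message describes.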

Sarah Hoffmann [Mon, 25 Oct 2021 15:21:10 +0000 (17:21 +0200)]

Merge remote-tracking branch 'upstream/master'

Sarah Hoffmann [Mon, 25 Oct 2021 15:20:42 +0000 (17:20 +0200)]

Merge pull request #2484 from lonvia/fix-index-use

Reverse: add index hints

Sarah Hoffmann [Mon, 25 Oct 2021 14:21:36 +0000 (16:21 +0200)]

Merge pull request #2483 from lonvia/fix-warming

Fix warming for ICU tokenizer

Sarah Hoffmann [Mon, 25 Oct 2021 12:55:15 +0000 (14:55 +0200)]

reverse: add index hints

The fairly complex where condition of idx_placex_geometry_placenode
won't always be matched by the query planner if the condition
part doesn't appear verbatim in the query.

Fixes #2480.

Sarah Hoffmann [Mon, 25 Oct 2021 11:08:16 +0000 (13:08 +0200)]

fix warming for ICU tokenizer

Running the warm-up search requests requires querying
the most frequent words. This must be done via the tokenizer
to honor the different formats of the word table.

Sarah Hoffmann [Mon, 25 Oct 2021 08:13:11 +0000 (10:13 +0200)]

allow relative paths for log files

Sarah Hoffmann [Sun, 24 Oct 2021 08:57:48 +0000 (10:57 +0200)]

Merge pull request #2476 from lonvia/harmonize-configuration-file-settings

Standardize handling of file names in configuration values

Sarah Hoffmann [Fri, 22 Oct 2021 15:32:51 +0000 (17:32 +0200)]

allow relative paths for flatnode file

Sarah Hoffmann [Fri, 22 Oct 2021 14:49:57 +0000 (16:49 +0200)]

switch IMPORT_STYLE to use generic file search

Allows relative paths wrt project directory.

Sarah Hoffmann [Fri, 22 Oct 2021 14:31:33 +0000 (16:31 +0200)]

have ADDRESS_LEVEL_CONFIG use load_sub_configuration

This means that relative paths now are looked up in the
project directory.
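
A minimal sketch of the lookup rule, assuming absolute paths are left untouched and relative ones are joined onto the project directory. The function name and the example paths are hypothetical and do not reflect the actual `load_sub_configuration` implementation.

```python
from pathlib import Path


def resolve_config_path(value: str, project_dir: Path) -> Path:
    """Interpret a configured file name relative to the project directory."""
    path = Path(value)
    # Absolute paths are used as given; relative ones are anchored
    # at the project directory.
    return path if path.is_absolute() else project_dir / path
```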

Sarah Hoffmann [Fri, 22 Oct 2021 12:41:14 +0000 (14:41 +0200)]

replace NOMINATIM_PHRASE_CONFIG with command line option

Sarah Hoffmann [Thu, 21 Oct 2021 14:38:06 +0000 (16:38 +0200)]

doc: clarify relative paths for tokenizer config

Sarah Hoffmann [Thu, 21 Oct 2021 14:21:58 +0000 (16:21 +0200)]

Merge pull request #2475 from lonvia/catchup-mode

Add catch-up mode to replication and extend documentation for updating

Sarah Hoffmann [Thu, 21 Oct 2021 10:14:47 +0000 (12:14 +0200)]

extend documentation for updating database

Explains the different modes and adds hints for
setting up a systemd job.

Sarah Hoffmann [Wed, 20 Oct 2021 20:05:15 +0000 (22:05 +0200)]

add new replication mode catch-up

This mode gets updates until the server reports no new diffs
anymore.

Also adds additional indexing, when the main indexing step left
a couple of objects to process. This happens only when the
next update is expected to be more than 40min away.
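
The control flow described above can be sketched roughly as follows. The callables `fetch_next_diff` and `index_pending` and both function names are hypothetical stand-ins; only the "loop until no new diffs" behaviour and the 40-minute threshold come from the commit message.

```python
from datetime import timedelta
from typing import Callable

# Threshold taken from the commit message.
EXTRA_INDEX_THRESHOLD = timedelta(minutes=40)


def catch_up(fetch_next_diff: Callable[[], bool],
             index_pending: Callable[[], None]) -> int:
    """Apply diffs until the server reports no new ones; return how many were applied."""
    applied = 0
    # fetch_next_diff is assumed to return False once no diff is available.
    while fetch_next_diff():
        index_pending()
        applied += 1
    return applied


def should_run_extra_indexing(objects_left: int,
                              time_to_next_update: timedelta) -> bool:
    """Run another indexing pass only if work remains and there is enough time."""
    return objects_left > 0 and time_to_next_update > EXTRA_INDEX_THRESHOLD
```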

Sarah Hoffmann [Tue, 19 Oct 2021 13:07:17 +0000 (15:07 +0200)]

Merge remote-tracking branch 'upstream/master'

Sarah Hoffmann [Tue, 19 Oct 2021 13:00:26 +0000 (15:00 +0200)]

run Tiger import with parallel threads per default

Sarah Hoffmann [Tue, 19 Oct 2021 12:58:57 +0000 (14:58 +0200)]

Merge pull request #2472 from lonvia/word-count-computation

Fix word count computation for ICU tokenizer

Sarah Hoffmann [Tue, 19 Oct 2021 10:03:48 +0000 (12:03 +0200)]

adapt tests for new word count mechanism

Sarah Hoffmann [Tue, 19 Oct 2021 09:50:06 +0000 (11:50 +0200)]

icu: no longer precompute terms

The ICU analyzer no longer drops frequent partials, so it is no
longer necessary to know the frequencies in advance.

Sarah Hoffmann [Tue, 19 Oct 2021 09:21:16 +0000 (11:21 +0200)]

make word recount a tokenizer-specific function

Sarah Hoffmann [Tue, 19 Oct 2021 07:11:16 +0000 (09:11 +0200)]

Merge pull request #2471 from lonvia/update-install-rules

Reorganise, update and extend documentation

Sarah Hoffmann [Mon, 18 Oct 2021 15:26:14 +0000 (17:26 +0200)]

docs: fix more links

Sarah Hoffmann [Mon, 18 Oct 2021 15:02:52 +0000 (17:02 +0200)]

docs: refer to our new Settings chapter in the import instructions

Sarah Hoffmann [Mon, 18 Oct 2021 14:53:24 +0000 (16:53 +0200)]

check and fix all links in documentation

Sarah Hoffmann [Thu, 14 Oct 2021 12:36:09 +0000 (14:36 +0200)]

add extended documentation of settings

Sarah Hoffmann [Thu, 14 Oct 2021 08:21:52 +0000 (10:21 +0200)]

docs: update overview pages

Sarah Hoffmann [Thu, 14 Oct 2021 08:10:54 +0000 (10:10 +0200)]

docs: move place ranking into customization part

Sarah Hoffmann [Thu, 14 Oct 2021 08:06:01 +0000 (10:06 +0200)]

docs: nominatim-ui has a new place for custom config

Sarah Hoffmann [Tue, 12 Oct 2021 21:07:41 +0000 (23:07 +0200)]

docs: move import style description to customize section

Sarah Hoffmann [Tue, 12 Oct 2021 19:25:13 +0000 (21:25 +0200)]

docs: make customization chapter a separate section

Sarah Hoffmann [Tue, 12 Oct 2021 09:04:44 +0000 (11:04 +0200)]

fix typo

Sarah Hoffmann [Tue, 12 Oct 2021 08:31:18 +0000 (10:31 +0200)]

docs: remove the development warning for ICU tokenizer

Sarah Hoffmann [Tue, 12 Oct 2021 08:25:50 +0000 (10:25 +0200)]

docs: add a warning about using --no-updates with TIGER data

Sarah Hoffmann [Mon, 11 Oct 2021 21:27:38 +0000 (23:27 +0200)]

update and extend man page

Provide extended descriptions for most subcommands.

Sarah Hoffmann [Mon, 11 Oct 2021 20:23:38 +0000 (22:23 +0200)]

rename manual directory to man

Avoids confusion between 'docs' and 'manual'.

Sarah Hoffmann [Mon, 11 Oct 2021 20:10:54 +0000 (22:10 +0200)]

add munin scripts and ICU subrules to installation

Sarah Hoffmann [Fri, 15 Oct 2021 16:21:13 +0000 (18:21 +0200)]

Merge remote-tracking branch 'upstream/master'

Sarah Hoffmann [Fri, 15 Oct 2021 16:20:43 +0000 (18:20 +0200)]

Merge pull request #2469 from lonvia/fix-tablespace-assignment

Fix template expressions for tablespaces

Sarah Hoffmann [Fri, 15 Oct 2021 13:07:43 +0000 (15:07 +0200)]

fix template expressions for tablespaces

Sarah Hoffmann [Mon, 11 Oct 2021 17:22:15 +0000 (19:22 +0200)]

Merge pull request #2450 from mtmail/tiger-data-2021

US TIGER data 2021 released

Sarah Hoffmann [Mon, 11 Oct 2021 08:56:57 +0000 (10:56 +0200)]

Merge remote-tracking branch 'upstream/master'

Sarah Hoffmann [Mon, 11 Oct 2021 08:48:44 +0000 (10:48 +0200)]

Merge pull request #2465 from lonvia/use-spgist-index

Use SP-GIST for building index

Sarah Hoffmann [Sun, 10 Oct 2021 19:58:43 +0000 (21:58 +0200)]

remove outdated country_languages.php

Sarah Hoffmann [Sun, 10 Oct 2021 12:23:08 +0000 (14:23 +0200)]

add recommendation for Postgis 3+

Sarah Hoffmann [Sun, 10 Oct 2021 12:17:03 +0000 (14:17 +0200)]

use SP-GIST index for building index where available

Point-in-polygon queries are much faster with a SP-GIST geometry
index, so use that for the index used to check if a housenumber
is inside a building.

Only available with Postgis 3. There is an automatic fallback to
GIST for Postgis 2.
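
A minimal sketch of the fallback, assuming the Postgis major version is already known. The function names and the generated index name are hypothetical; only the choice between the `spgist` and `gist` access methods follows the commit message.

```python
def building_index_method(postgis_major: int) -> str:
    """Pick the index access method for the building geometry index."""
    # SP-GIST is only used with Postgis 3 or later, otherwise fall back to GIST.
    return "spgist" if postgis_major >= 3 else "gist"


def create_index_sql(table: str, column: str, postgis_major: int) -> str:
    """Build a CREATE INDEX statement using the chosen access method."""
    method = building_index_method(postgis_major)
    return (f"CREATE INDEX idx_{table}_{column}_{method} "
            f"ON {table} USING {method} ({column})")
```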

Sarah Hoffmann [Sat, 9 Oct 2021 12:41:09 +0000 (14:41 +0200)]

Merge pull request #2460 from lonvia/multiple-analyzers

Add support for multiple token analyzers

Sarah Hoffmann [Thu, 7 Oct 2021 09:55:53 +0000 (11:55 +0200)]

add documentation for new configuration of ICU tokenizer

Sarah Hoffmann [Thu, 7 Oct 2021 07:49:13 +0000 (09:49 +0200)]

fix argument description for check_database

Sarah Hoffmann [Wed, 6 Oct 2021 15:03:37 +0000 (17:03 +0200)]

reorganize and complete tests around generic token analysis

Sarah Hoffmann [Wed, 6 Oct 2021 10:29:25 +0000 (12:29 +0200)]

add tests for sanitizer tagging language

Sarah Hoffmann [Tue, 5 Oct 2021 15:18:10 +0000 (17:18 +0200)]

apply variants by languages

Adds a tagger for names by language so that the analyzer of that
language is used. Thus variants are now only applied to names
in the specific language and only to name tags, no longer to
reference-like tags.

Sarah Hoffmann [Tue, 5 Oct 2021 12:10:32 +0000 (14:10 +0200)]

use analyser provided in the 'analyzer' property

Implements per-name choice of analyzer. If a non-default
analyzer is chosen, then the 'word' identifier is extended
with the name of the analyzer, so that we still have unique
items.
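
A minimal sketch of how such an extended identifier could be built. The function name and the `@` separator are assumptions for illustration; the commit message only states that the analyzer name is appended for non-default analyzers.

```python
from typing import Optional


def word_identifier(normalized_name: str, analyzer_id: Optional[str] = None,
                    separator: str = "@") -> str:
    """Return an identifier that is unique per (name, analyzer) pair."""
    if not analyzer_id:
        # The default analyzer keeps the plain name as its identifier.
        return normalized_name
    return f"{normalized_name}{separator}{analyzer_id}"
```

The same name processed by two different analyzers then yields two distinct identifiers, so the entries cannot collide.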

Sarah Hoffmann [Tue, 5 Oct 2021 08:29:36 +0000 (10:29 +0200)]

remove support for properties on variants

Those are not going to be used in the near future, so no need to
carry that code around just now.

Sarah Hoffmann [Tue, 5 Oct 2021 08:20:08 +0000 (10:20 +0200)]

precompute replacements while loading configuration

Sarah Hoffmann [Mon, 4 Oct 2021 16:31:58 +0000 (18:31 +0200)]

move parsing of token analysis config to analyzer

Adds a second callback for the analyzer which is responsible
for parsing the configuration rules and converting it to
whatever format necessary. This way, each analyzer implementation
can define its own configuration rules.

Sarah Hoffmann [Mon, 4 Oct 2021 15:34:30 +0000 (17:34 +0200)]

make token analyzers configurable modules

Adds a mandatory section 'analyzer' to the token-analysis entries
which defines which analyser to use. Currently there is exactly
one, generic, which implements the former ICUNameProcessor.

Sarah Hoffmann [Mon, 4 Oct 2021 14:40:28 +0000 (16:40 +0200)]

extend ICU config to accommodate multiple analysers

Adds parsing of multiple variant lists from the configuration.
Every entry except one must have a unique 'id' parameter to
distinguish the entries. The entry without id is considered
the default. Currently only the list without an id is used
for analysis.
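
A minimal sketch of the uniqueness rule, assuming each configuration entry has already been parsed into a dictionary. The function name and the use of `None` as the key for the default entry are illustrative choices, not the actual loader code.

```python
from typing import Any, Dict, List, Optional


def index_analyzers(entries: List[Dict[str, Any]]) -> Dict[Optional[str], Dict[str, Any]]:
    """Index token-analysis entries by 'id'; the entry without id is the default."""
    analyzers: Dict[Optional[str], Dict[str, Any]] = {}
    for entry in entries:
        key = entry.get("id")  # None marks the default entry
        if key in analyzers:
            label = "default (no id)" if key is None else repr(key)
            raise ValueError(f"duplicate token-analysis entry: {label}")
        analyzers[key] = entry
    return analyzers
```

Two entries without an id, or two entries sharing the same id, are rejected because the entries could no longer be told apart.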

Sarah Hoffmann [Mon, 4 Oct 2021 09:56:54 +0000 (11:56 +0200)]

move flatten_config_list into config module

For general usage by other modules.

Sarah Hoffmann [Fri, 1 Oct 2021 19:53:34 +0000 (21:53 +0200)]

Merge pull request #2458 from lonvia/add-tokenizer-preprocessing

Add a "sanitation" step for name and address tags before token processing

Sarah Hoffmann [Fri, 1 Oct 2021 08:51:41 +0000 (10:51 +0200)]

replace test variable for PG env tests

'tty' was removed in PG14 and causes an error.

Sarah Hoffmann [Fri, 1 Oct 2021 07:50:17 +0000 (09:50 +0200)]

add unit tests for new sanitizer functions

Sarah Hoffmann [Thu, 30 Sep 2021 19:30:13 +0000 (21:30 +0200)]

introduce sanitizer step before token analysis

Sanitizer functions allow transforming name and address tags before
they are handed to the tokenizer. These transformations are visible
only for the tokenizer and thus only have an influence on the
search terms and address match terms for a place.

Currently two sanitizers are implemented which are responsible for
splitting names with multiple values and removing bracket additions.
Both were previously hard-coded in the tokenizer.
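
The two transformations can be sketched as plain string functions. The function names, the delimiter set and the exact bracket rule are assumptions for illustration and are not the actual sanitizer implementations.

```python
import re
from typing import List


def split_name_list(value: str, delimiters: str = ";,") -> List[str]:
    """Split a tag value holding several names into its individual names."""
    parts = re.split(f"[{re.escape(delimiters)}]", value)
    # Drop empty fragments caused by doubled or trailing delimiters.
    return [part.strip() for part in parts if part.strip()]


def strip_bracket_addition(value: str) -> str:
    """Remove one trailing parenthesised addition, e.g. 'Halle (Saale)'."""
    return re.sub(r"\s*\([^()]*\)\s*$", "", value).strip()
```

Because such functions run before tokenization, the stored tag values stay unchanged and only the derived search terms are affected.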

Sarah Hoffmann [Wed, 29 Sep 2021 15:37:04 +0000 (17:37 +0200)]

unify ICUNameProcessorRules and ICURuleLoader

There is no need for the additional layer of indirection that
the ICUNameProcessorRules class adds. The ICURuleLoader can
fill the database properties directly.

Sarah Hoffmann [Wed, 29 Sep 2021 12:16:09 +0000 (14:16 +0200)]

fix typo

Sarah Hoffmann [Wed, 29 Sep 2021 09:54:14 +0000 (11:54 +0200)]

export more data for the tokenizer name preparation

Adds class, type, country and rank to the exported information
and removes the rather odd hack for countries. Whether a place
represents a country boundary can now be computed by the tokenizer.

Sarah Hoffmann [Wed, 29 Sep 2021 08:37:54 +0000 (10:37 +0200)]

add wrapper class for place data passed to tokenizer

This is mostly for convenience and documentation purposes.

Sarah Hoffmann [Tue, 28 Sep 2021 09:27:00 +0000 (11:27 +0200)]

Merge remote-tracking branch 'upstream/master'

Sarah Hoffmann [Tue, 28 Sep 2021 09:21:08 +0000 (11:21 +0200)]

Merge pull request #2455 from lonvia/adjust-address-levels-slovakia

Adjust address levels for boundaries in Slovakia

Sarah Hoffmann [Tue, 28 Sep 2021 07:45:15 +0000 (09:45 +0200)]

Merge pull request #2454 from lonvia/sort-out-token-assignment-in-sql

ICU tokenizer: switch match method to using partial terms

Sarah Hoffmann [Mon, 27 Sep 2021 21:32:11 +0000 (23:32 +0200)]

adjust address levels for boundaries in Slovakia

Levels chosen according to OSM wiki. Mainly moves admin_level 6
to county level and admin_level 8 to city/town level. Higher
levels are adjusted accordingly.

Fixes #2453.

Sarah Hoffmann [Mon, 27 Sep 2021 15:36:23 +0000 (17:36 +0200)]

adapt tests to new ICU address token handling

Sarah Hoffmann [Mon, 27 Sep 2021 12:58:43 +0000 (14:58 +0200)]

remove unused parameter

Sarah Hoffmann [Mon, 27 Sep 2021 12:56:14 +0000 (14:56 +0200)]

Merge remote-tracking branch 'upstream/master'

Sarah Hoffmann [Mon, 27 Sep 2021 12:55:50 +0000 (14:55 +0200)]

Merge pull request #2452 from lonvia/update-houses-on-street-name-change

Force update of surrounding houses when street or place name changes

Sarah Hoffmann [Thu, 23 Sep 2021 14:57:24 +0000 (16:57 +0200)]

icu tokenizer: switch to matching against partial names

When matching address parts from addr:* tags against place names,
the address names were so far converted to full names and then
compared to the place names. This can become problematic with the new
ICU tokenizer once we introduce creation of different variants
depending on the place name context. It wouldn't be clear which
variant to produce to get a match, so we would have to create all of
them. To work around this issue, switch to using the partial terms
for matching. This introduces a larger fuzziness between matches but
that shouldn't be a problem because matching is always geographically
restricted.
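
A minimal sketch of matching on partial terms, assuming a partial term is simply a lower-cased word and that a match requires every partial of the address name to occur in the place name. Both the tokenisation and the subset rule are illustrative assumptions, not the SQL the tokenizer actually uses.

```python
from typing import Set


def partial_terms(name: str) -> Set[str]:
    """Break a name into its partial terms (here: lower-cased words)."""
    return set(name.lower().split())


def address_matches_place(addr_name: str, place_name: str) -> bool:
    """Match if all partial terms of the address part occur in the place name."""
    addr_terms = partial_terms(addr_name)
    return bool(addr_terms) and addr_terms <= partial_terms(place_name)
```

The fuzziness mentioned above is visible here: word order and extra words in the place name are ignored, so "Main Street" also matches a place named "Street Main".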

The search terms created for address parts have a different problem:
they are already created before we even know if they are going to be
used. This can lead to spurious entries in the word table, which slows
down searching. This problem can also be circumvented by using only
partial terms for the search terms. In terms of searching that means
that the address terms would not get the full-word boost, but given
that the case where an address part does not exist as an OSM object
should be the exception, this is likely acceptable.

Sarah Hoffmann [Wed, 22 Sep 2021 20:54:14 +0000 (22:54 +0200)]

adapt documentation for SQL tokenizer interface

Sarah Hoffmann [Wed, 22 Sep 2021 20:20:02 +0000 (22:20 +0200)]

move name matching into tokenizer module

Instead of requesting the match tokens from the tokenizer
when looking for parent streets/places and address parts,
hand in the saved tokens and ask if they match. This gives
the tokenizer more freedom to decide how name matching
should be done.

Sarah Hoffmann [Mon, 27 Sep 2021 09:04:17 +0000 (11:04 +0200)]

force update on rank30 children when place name changes

Name changes may have an effect on parenting. Don't update
surrounding rank30 objects with addr:place tags as this is
potentially too expensive.

Sarah Hoffmann [Mon, 27 Sep 2021 08:20:26 +0000 (10:20 +0200)]

force update of surrounding houses when street name changes

When the street changes its name then this may cause changes
in the parenting of rank-30 objects with an addr:street
tag.

Fixes #2242.

marc tobias [Fri, 24 Sep 2021 22:05:17 +0000 (00:05 +0200)]

US TIGER data 2021 released

Sarah Hoffmann [Fri, 24 Sep 2021 21:58:00 +0000 (23:58 +0200)]

Merge remote-tracking branch 'upstream/master'

Sarah Hoffmann [Fri, 24 Sep 2021 21:56:42 +0000 (23:56 +0200)]

slightly increase radius to look for postcodes

Sarah Hoffmann [Fri, 24 Sep 2021 20:49:14 +0000 (22:49 +0200)]

Merge remote-tracking branch 'upstream/master'

Open Source search based on OpenStreetMap data
