git.openstreetmap.org Git - nominatim.git/log
Sarah Hoffmann [Fri, 19 Nov 2021 20:12:17 +0000 (21:12 +0100)]
don't penalize French 'bis' housenumbers
House numbers of the form '9 bis' are usual in France. So
be a bit more lenient before adding penalties to house numbers
with letters in them.
Fixes #2527.
Sarah Hoffmann [Fri, 19 Nov 2021 15:16:30 +0000 (16:16 +0100)]
Merge pull request #2525 from lonvia/fix-replication-indexer
Fix instantiation of indexer for replication
Sarah Hoffmann [Fri, 19 Nov 2021 15:11:32 +0000 (16:11 +0100)]
add a section about moving the database to another machine
Sarah Hoffmann [Fri, 19 Nov 2021 13:47:00 +0000 (14:47 +0100)]
only instantiate indexer once for replication
Also makes sure that the indexer object exists everywhere it is needed.
See #2518.
Sarah Hoffmann [Thu, 11 Nov 2021 06:42:42 +0000 (07:42 +0100)]
Merge pull request #2517 from lonvia/transliteration-special-chars
ICU: avoid non-alphanumerical characters in transliteration
Sarah Hoffmann [Wed, 10 Nov 2021 16:15:34 +0000 (17:15 +0100)]
make sure housenumbers are properly quoted
Sarah Hoffmann [Wed, 10 Nov 2021 16:14:13 +0000 (17:14 +0100)]
avoid special characters in word tokens
Transliteration should only consist of ASCII letters
and numbers. Avoid any other characters.
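A minimal Python sketch of the rule stated in the commit above: a transliterated token should contain only ASCII letters and digits. The function name clean_token and the choice to replace other characters with a single space are assumptions for illustration, not the project's actual code.

```python
import re

# Anything that is not an ASCII letter or digit counts as "special".
_NON_ALNUM = re.compile(r'[^A-Za-z0-9]+')


def clean_token(transliterated: str) -> str:
    """Hypothetical helper: replace runs of special characters with one
    space and trim the result, so only ASCII letters and digits remain
    inside each word."""
    return _NON_ALNUM.sub(' ', transliterated).strip()
```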
Sarah Hoffmann [Wed, 10 Nov 2021 12:27:09 +0000 (13:27 +0100)]
Merge pull request #2516 from lonvia/test-for-website-dir
Better error reporting when API script does not exist
Sarah Hoffmann [Wed, 10 Nov 2021 08:42:49 +0000 (09:42 +0100)]
better error reporting when API script does not exist
Check if the API script exists on the expected location before
running php-cli. This way we can add a useful hint about the
project directory.
Fixes #2513.
Sarah Hoffmann [Sat, 6 Nov 2021 11:11:55 +0000 (12:11 +0100)]
Merge pull request #2511 from lonvia/fix-combination-error-needs-address
Fix boolean combination of NeedsAddress flag
Sarah Hoffmann [Fri, 5 Nov 2021 21:18:37 +0000 (22:18 +0100)]
fix combination of NeedsAddress flag
When dealing with multiple partial terms, only keep the
flag when all partial terms are so frequent that they need
an address.
Fixes #2510.
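The combination rule described in the commit above can be sketched in Python as follows. This is an illustration only: the PartialTerm class, the FREQUENCY_LIMIT value and the needs_address function are hypothetical names, and the threshold is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PartialTerm:
    # Hypothetical structure: a partial word and how often it occurs.
    word: str
    count: int


# Placeholder threshold above which a term counts as "very frequent".
FREQUENCY_LIMIT = 10_000


def needs_address(partials: Sequence[PartialTerm],
                  limit: int = FREQUENCY_LIMIT) -> bool:
    """Keep the flag only if every partial term is very frequent.

    A single rare term is enough to narrow the search, so the flag is
    combined with a logical AND over all terms, not an OR.
    """
    return bool(partials) and all(p.count > limit for p in partials)
```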
Sarah Hoffmann [Mon, 1 Nov 2021 11:14:53 +0000 (12:14 +0100)]
prepare release 4.0.0
Sarah Hoffmann [Tue, 2 Nov 2021 10:09:17 +0000 (11:09 +0100)]
fix typo
Sarah Hoffmann [Mon, 1 Nov 2021 15:12:23 +0000 (16:12 +0100)]
Merge pull request #2502 from lonvia/improve-development-documentation
Extend developer's documentation
Sarah Hoffmann [Mon, 1 Nov 2021 10:04:03 +0000 (11:04 +0100)]
docs: add overview of indexing
Sarah Hoffmann [Fri, 29 Oct 2021 10:03:22 +0000 (12:03 +0200)]
docs: section about database layout
Replaces the import description, which by now covered
little more than the table layout.
Sarah Hoffmann [Thu, 28 Oct 2021 13:28:47 +0000 (15:28 +0200)]
Merge pull request #2498 from lonvia/ordering-for-unlisted-place-results
Include unlisted places in ordering by housenumber
Sarah Hoffmann [Thu, 28 Oct 2021 09:33:34 +0000 (11:33 +0200)]
Merge pull request #2497 from lonvia/docs-maintenance
docs: add new maintenance section
Sarah Hoffmann [Thu, 28 Oct 2021 09:27:31 +0000 (11:27 +0200)]
include unlisted places in ordering by housenumber
When ordering results by the fact that they have a housenumber,
also take cases into account where the housenumber is on the
place itself. This may happen when the search includes the name
of the place and the housenumber or for addr:place addresses
where the place is unlisted.
Sarah Hoffmann [Wed, 27 Oct 2021 18:59:45 +0000 (20:59 +0200)]
docs: add new maintenance section
currently used for postcode updates, word count updates and
deleted relations.
Sarah Hoffmann [Wed, 27 Oct 2021 12:40:42 +0000 (14:40 +0200)]
Merge pull request #2495 from lonvia/fix-normalization-in-php
ICU: use correct normalization during search
Sarah Hoffmann [Wed, 27 Oct 2021 08:07:19 +0000 (10:07 +0200)]
ICU: use normalization from config in PHP
The TERM_NORMALIZATION config option is no longer applicable.
That was already documented but not yet implemented.
Sarah Hoffmann [Tue, 26 Oct 2021 15:29:03 +0000 (17:29 +0200)]
bdd: add tests for non-latin scripts
Sarah Hoffmann [Tue, 26 Oct 2021 15:00:43 +0000 (17:00 +0200)]
Merge pull request #2493 from lonvia/handle-frequent-partials
Tune search queries with frequent partial words
Sarah Hoffmann [Tue, 26 Oct 2021 10:07:13 +0000 (12:07 +0200)]
adapt BDD tests to stricter partial search
Sarah Hoffmann [Tue, 26 Oct 2021 09:42:42 +0000 (11:42 +0200)]
do not count words when in reverse-only mode
Sarah Hoffmann [Tue, 26 Oct 2021 08:57:51 +0000 (10:57 +0200)]
further refactor setup to keep function small
Sarah Hoffmann [Tue, 26 Oct 2021 08:28:28 +0000 (10:28 +0200)]
searches for house numbers must have an address
Sarah Hoffmann [Tue, 26 Oct 2021 08:23:55 +0000 (10:23 +0200)]
disallow search for partials without address
Very frequent partial terms take too long to look up and
do not return any valuable results unless the search is
further narrowed down by an address.
Sarah Hoffmann [Tue, 26 Oct 2021 07:37:57 +0000 (09:37 +0200)]
make word count computation part of the import
Accurate word counts are now essential when using
the ICU tokenizer and don't hurt for the legacy one.
Adds about an hour import time.
Sarah Hoffmann [Tue, 26 Oct 2021 08:32:43 +0000 (10:32 +0200)]
actions: move ICU tests into its own run
Sarah Hoffmann [Mon, 25 Oct 2021 19:45:08 +0000 (21:45 +0200)]
Merge pull request #2486 from lonvia/fix-special-phrases
Fix parsing of operator in special phrases
Sarah Hoffmann [Mon, 25 Oct 2021 19:33:27 +0000 (21:33 +0200)]
ICU: add an index over word_ids
Needed for keyword lookup in the details response.
Sarah Hoffmann [Mon, 25 Oct 2021 17:51:20 +0000 (19:51 +0200)]
be case-insensitive about special phrase operator
Sarah Hoffmann [Mon, 25 Oct 2021 17:46:30 +0000 (19:46 +0200)]
fix parsing of operator in special phrases
Because of unstripped input, the operators wouldn't match.
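The two operator fixes above (stripping the input and ignoring case) can be illustrated with a short Python sketch. The set of accepted operators and the name normalize_operator are assumptions made for the example.

```python
from typing import Optional

# Assumed set of operators that are recognised; anything else is ignored.
VALID_OPERATORS = {'near', 'in'}


def normalize_operator(raw: str) -> Optional[str]:
    """Strip surrounding whitespace and lower-case the operator before
    comparing it, so that ' Near ' and 'near' are treated the same.
    Returns None when the value is not a known operator."""
    op = raw.strip().lower()
    return op if op in VALID_OPERATORS else None
```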
Sarah Hoffmann [Mon, 25 Oct 2021 15:20:42 +0000 (17:20 +0200)]
Merge pull request #2484 from lonvia/fix-index-use
Reverse: add index hints
Sarah Hoffmann [Mon, 25 Oct 2021 14:21:36 +0000 (16:21 +0200)]
Merge pull request #2483 from lonvia/fix-warming
Fix warming for ICU tokenizer
Sarah Hoffmann [Mon, 25 Oct 2021 12:55:15 +0000 (14:55 +0200)]
reverse: add index hints
The fairly complex where condition of idx_placex_geometry_placenode
won't always be matched by the query planner if the condition
part doesn't appear verbatim in the query.
Fixes #2480.
Sarah Hoffmann [Mon, 25 Oct 2021 11:08:16 +0000 (13:08 +0200)]
fix warming for ICU tokenizer
Running the warm-up search requests requires querying
the most frequent words. This must be done via the tokenizer
to honor the different formats of the word table.
Sarah Hoffmann [Mon, 25 Oct 2021 08:13:11 +0000 (10:13 +0200)]
allow relative paths for log files
Sarah Hoffmann [Sun, 24 Oct 2021 08:57:48 +0000 (10:57 +0200)]
Merge pull request #2476 from lonvia/harmonize-configuration-file-settings
Standardize handling of file names in configuration values
Sarah Hoffmann [Fri, 22 Oct 2021 15:32:51 +0000 (17:32 +0200)]
allow relative paths for flatnode file
Sarah Hoffmann [Fri, 22 Oct 2021 14:49:57 +0000 (16:49 +0200)]
switch IMPORT_STYLE to use generic file search
Allows relative paths wrt project directory.
Sarah Hoffmann [Fri, 22 Oct 2021 14:31:33 +0000 (16:31 +0200)]
have ADDRESS_LEVEL_CONFIG use load_sub_configuration
This means that relative paths now are looked up in the
project directory.
Sarah Hoffmann [Fri, 22 Oct 2021 12:41:14 +0000 (14:41 +0200)]
replace NOMINATIM_PHRASE_CONFIG with command line option
Sarah Hoffmann [Thu, 21 Oct 2021 14:38:06 +0000 (16:38 +0200)]
doc: clarify relative paths for tokenizer config
Sarah Hoffmann [Thu, 21 Oct 2021 14:21:58 +0000 (16:21 +0200)]
Merge pull request #2475 from lonvia/catchup-mode
Add catch-up mode to replication and extend documentation for updating
Sarah Hoffmann [Thu, 21 Oct 2021 10:14:47 +0000 (12:14 +0200)]
extend documentation for updating database
Explains the different modes and adds hints for
setting up a systemd job.
Sarah Hoffmann [Wed, 20 Oct 2021 20:05:15 +0000 (22:05 +0200)]
add new replication mode catch-up
This mode gets updates until the server reports no new diffs
anymore.
Also adds an additional indexing run when the main indexing step
left a couple of objects to process. This happens only when the
next update is expected to be more than 40min away.
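As a rough Python sketch of the catch-up idea described above: keep applying diffs until the server has nothing new. The callables fetch_next_diff and apply_diff are hypothetical stand-ins for the real download and import steps, and the extra indexing run is left out.

```python
from typing import Callable, Optional


def catch_up(fetch_next_diff: Callable[[], Optional[str]],
             apply_diff: Callable[[str], None]) -> int:
    """Apply diffs one after another until no new diff is reported.

    fetch_next_diff is assumed to return None once the server has no
    further diffs. Returns the number of diffs that were applied.
    """
    applied = 0
    while True:
        diff = fetch_next_diff()
        if diff is None:
            break  # server reports no new diffs: we have caught up
        apply_diff(diff)
        applied += 1
    return applied
```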
Sarah Hoffmann [Tue, 19 Oct 2021 13:00:26 +0000 (15:00 +0200)]
run Tiger import with parallel threads per default
Sarah Hoffmann [Tue, 19 Oct 2021 12:58:57 +0000 (14:58 +0200)]
Merge pull request #2472 from lonvia/word-count-computation
Fix word count computation for ICU tokenizer
Sarah Hoffmann [Tue, 19 Oct 2021 10:03:48 +0000 (12:03 +0200)]
adapt tests for new word count mechanism
Sarah Hoffmann [Tue, 19 Oct 2021 09:50:06 +0000 (11:50 +0200)]
icu: no longer precompute terms
The ICU analyzer no longer drops frequent partials, so it is no
longer necessary to know the frequencies in advance.
Sarah Hoffmann [Tue, 19 Oct 2021 09:21:16 +0000 (11:21 +0200)]
make word recount a tokenizer-specific function
Sarah Hoffmann [Tue, 19 Oct 2021 07:11:16 +0000 (09:11 +0200)]
Merge pull request #2471 from lonvia/update-install-rules
Reorganise, update and extend documentation
Sarah Hoffmann [Mon, 18 Oct 2021 15:26:14 +0000 (17:26 +0200)]
docs: fix more links
Sarah Hoffmann [Mon, 18 Oct 2021 15:02:52 +0000 (17:02 +0200)]
docs: refer to our new Settings chapter in the import instructions
Sarah Hoffmann [Mon, 18 Oct 2021 14:53:24 +0000 (16:53 +0200)]
check and fix all links in documentation
Sarah Hoffmann [Thu, 14 Oct 2021 12:36:09 +0000 (14:36 +0200)]
add extended documentation of settings
Sarah Hoffmann [Thu, 14 Oct 2021 08:21:52 +0000 (10:21 +0200)]
docs: update overview pages
Sarah Hoffmann [Thu, 14 Oct 2021 08:10:54 +0000 (10:10 +0200)]
docs: move place ranking into customization part
Sarah Hoffmann [Thu, 14 Oct 2021 08:06:01 +0000 (10:06 +0200)]
docs: nominatim-ui has a new place for custom config
Sarah Hoffmann [Tue, 12 Oct 2021 21:07:41 +0000 (23:07 +0200)]
docs: move import style description to customize section
Sarah Hoffmann [Tue, 12 Oct 2021 19:25:13 +0000 (21:25 +0200)]
docs: make customization chapter a separate section
Sarah Hoffmann [Tue, 12 Oct 2021 09:04:44 +0000 (11:04 +0200)]
fix typo
Sarah Hoffmann [Tue, 12 Oct 2021 08:31:18 +0000 (10:31 +0200)]
docs: remove the development warning for ICU tokenizer
Sarah Hoffmann [Tue, 12 Oct 2021 08:25:50 +0000 (10:25 +0200)]
docs: add a warning about using --no-updates with TIGER data
Sarah Hoffmann [Mon, 11 Oct 2021 21:27:38 +0000 (23:27 +0200)]
update and extend man page
Provide extended descriptions for most subcommands.
Sarah Hoffmann [Mon, 11 Oct 2021 20:23:38 +0000 (22:23 +0200)]
rename manual directory to man
Avoids confusion between 'docs' and 'manual'.
Sarah Hoffmann [Mon, 11 Oct 2021 20:10:54 +0000 (22:10 +0200)]
add munin scripts and ICU subrules to installation
Sarah Hoffmann [Fri, 15 Oct 2021 16:20:43 +0000 (18:20 +0200)]
Merge pull request #2469 from lonvia/fix-tablespace-assignment
Fix template expressions for tablespaces
Sarah Hoffmann [Fri, 15 Oct 2021 13:07:43 +0000 (15:07 +0200)]
fix template expressions for tablespaces
Sarah Hoffmann [Mon, 11 Oct 2021 17:22:15 +0000 (19:22 +0200)]
Merge pull request #2450 from mtmail/tiger-data-2021
US TIGER data 2021 released
Sarah Hoffmann [Mon, 11 Oct 2021 08:48:44 +0000 (10:48 +0200)]
Merge pull request #2465 from lonvia/use-spgist-index
Use SP-GIST for building index
Sarah Hoffmann [Sun, 10 Oct 2021 19:58:43 +0000 (21:58 +0200)]
remove outdated country_languages.php
Sarah Hoffmann [Sun, 10 Oct 2021 12:23:08 +0000 (14:23 +0200)]
add recommendation for Postgis 3+
Sarah Hoffmann [Sun, 10 Oct 2021 12:17:03 +0000 (14:17 +0200)]
use SP-GIST index for building index where available
Point-in-polygon queries are much faster with a SP-GIST geometry
index, so use that for the index used to check if a housenumber
is inside a building.
Only available with Postgis 3. There is an automatic fallback to
GIST for Postgis 2.
Sarah Hoffmann [Sat, 9 Oct 2021 12:41:09 +0000 (14:41 +0200)]
Merge pull request #2460 from lonvia/multiple-analyzers
Add support for multiple token analyzers
Sarah Hoffmann [Thu, 7 Oct 2021 09:55:53 +0000 (11:55 +0200)]
add documentation for new configuration of ICU tokenizer
Sarah Hoffmann [Thu, 7 Oct 2021 07:49:13 +0000 (09:49 +0200)]
fix argument description for check_database
Sarah Hoffmann [Wed, 6 Oct 2021 15:03:37 +0000 (17:03 +0200)]
reorganize and complete tests around generic token analysis
Sarah Hoffmann [Wed, 6 Oct 2021 10:29:25 +0000 (12:29 +0200)]
add tests for sanitizer tagging language
Sarah Hoffmann [Tue, 5 Oct 2021 15:18:10 +0000 (17:18 +0200)]
apply variants by languages
Adds a tagger for names by language so that the analyzer of that
language is used. Thus variants are now only applied to names
in the specific language and only to name tags, no longer to
reference-like tags.
Sarah Hoffmann [Tue, 5 Oct 2021 12:10:32 +0000 (14:10 +0200)]
use analyser provided in the 'analyzer' property
Implements per-name choice of analyzer. If a non-default
analyzer is chosen, then the 'word' identifier is extended
with the name of the analyzer, so that we still have unique
items.
Sarah Hoffmann [Tue, 5 Oct 2021 08:29:36 +0000 (10:29 +0200)]
remove support for properties on variants
Those are not going to be used in the near future, so no need to
carry that code around just now.
Sarah Hoffmann [Tue, 5 Oct 2021 08:20:08 +0000 (10:20 +0200)]
precompute replacements while loading configuration
Sarah Hoffmann [Mon, 4 Oct 2021 16:31:58 +0000 (18:31 +0200)]
move parsing of token analysis config to analyzer
Adds a second callback for the analyzer which is responsible
for parsing the configuration rules and converting them to
whatever format is necessary. This way, each analyzer implementation
can define its own configuration rules.
Sarah Hoffmann [Mon, 4 Oct 2021 15:34:30 +0000 (17:34 +0200)]
make token analyzers configurable modules
Adds a mandatory section 'analyzer' to the token-analysis entries
which defines which analyser to use. Currently there is exactly
one, generic, which implements the former ICUNameProcessor.
Sarah Hoffmann [Mon, 4 Oct 2021 14:40:28 +0000 (16:40 +0200)]
extend ICU config to accommodate multiple analysers
Adds parsing of multiple variant lists from the configuration.
Every entry except one must have a unique 'id' parameter to
distinguish the entries. The entry without id is considered
the default. Currently only the list without an id is used
for analysis.
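The id rule described above (unique ids, with at most one entry lacking an id, which becomes the default) might be checked roughly like this in Python. The entry layout as plain dictionaries and the name index_analyzers are assumptions for the sketch, not the project's actual parser.

```python
from typing import Any, Dict, List, Optional


def index_analyzers(
        entries: List[Dict[str, Any]]) -> Dict[Optional[str], Dict[str, Any]]:
    """Index configuration entries by their 'id'.

    The entry without an 'id' is stored under the key None and serves
    as the default. A repeated id, or a second entry without an id,
    is rejected.
    """
    analyzers: Dict[Optional[str], Dict[str, Any]] = {}
    for entry in entries:
        key = entry.get('id')  # None marks the default entry
        if key in analyzers:
            name = 'default' if key is None else repr(key)
            raise ValueError(f"Duplicate analyzer entry: {name}")
        analyzers[key] = entry
    return analyzers
```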
Sarah Hoffmann [Mon, 4 Oct 2021 09:56:54 +0000 (11:56 +0200)]
move flatten_config_list into config module
For general usage by other modules.
Sarah Hoffmann [Fri, 1 Oct 2021 19:53:34 +0000 (21:53 +0200)]
Merge pull request #2458 from lonvia/add-tokenizer-preprocessing
Add a "sanitation" step for name and address tags before token processing
Sarah Hoffmann [Fri, 1 Oct 2021 08:51:41 +0000 (10:51 +0200)]
replace test variable for PG env tests
'tty' was removed in PG14 and causes an error.
Sarah Hoffmann [Fri, 1 Oct 2021 07:50:17 +0000 (09:50 +0200)]
add unit tests for new sanitizer functions
Sarah Hoffmann [Thu, 30 Sep 2021 19:30:13 +0000 (21:30 +0200)]
introduce sanitizer step before token analysis
Sanitizer functions allow transforming name and address tags before
they are handed to the tokenizer. These transformations are visible
only to the tokenizer and thus only have an influence on the
search terms and address match terms for a place.
Currently two sanitizers are implemented which are responsible for
splitting names with multiple values and removing bracket additions.
Both were previously hard-coded in the tokenizer.
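To illustrate the two transformations named above, here is a small Python sketch. The function names, the default delimiters and the exact bracket handling are assumptions for the example; the sanitizers in the repository may behave differently in detail.

```python
import re
from typing import List


def split_name_list(value: str, delimiters: str = ';,') -> List[str]:
    """Split a tag value holding several names into separate names.
    Empty parts and surrounding whitespace are dropped."""
    pattern = '[' + re.escape(delimiters) + ']'
    return [part.strip() for part in re.split(pattern, value) if part.strip()]


def strip_bracket_additions(value: str) -> str:
    """Remove additions in round brackets, e.g. 'Halle (Saale)' -> 'Halle'."""
    return re.sub(r'\s*\([^)]*\)\s*', ' ', value).strip()
```

Because both functions only change what the tokenizer sees, the original tag values of the place stay as they are.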
Sarah Hoffmann [Wed, 29 Sep 2021 15:37:04 +0000 (17:37 +0200)]
unify ICUNameProcessorRules and ICURuleLoader
There is no need for the additional layer of indirection that
the ICUNameProcessorRules class adds. The ICURuleLoader can
fill the database properties directly.
Sarah Hoffmann [Wed, 29 Sep 2021 12:16:09 +0000 (14:16 +0200)]
fix typo
Sarah Hoffmann [Wed, 29 Sep 2021 09:54:14 +0000 (11:54 +0200)]
export more data for the tokenizer name preparation
Adds class, type, country and rank to the exported information
and removes the rather odd hack for countries. Whether a place
represents a country boundary can now be computed by the tokenizer.
Sarah Hoffmann [Wed, 29 Sep 2021 08:37:54 +0000 (10:37 +0200)]
add wrapper class for place data passed to tokenizer
This is mostly for convenience and documentation purposes.
Sarah Hoffmann [Tue, 28 Sep 2021 09:21:08 +0000 (11:21 +0200)]
Merge pull request #2455 from lonvia/adjust-address-levels-slovakia
Adjust address levels for boundaries in Slovakia
Sarah Hoffmann [Tue, 28 Sep 2021 07:45:15 +0000 (09:45 +0200)]
Merge pull request #2454 from lonvia/sort-out-token-assignment-in-sql
ICU tokenizer: switch match method to using partial terms