git.openstreetmap.org Git - nominatim.git/log

Sarah Hoffmann [Wed, 2 Jun 2021 18:58:14 +0000 (20:58 +0200)]

Merge pull request #2357 from lonvia/legacy-tokenizer-fix-word-entries

Fix insertion of special terms and countries into word table

commit | commitdiff | tree

Sarah Hoffmann [Wed, 2 Jun 2021 15:37:27 +0000 (17:37 +0200)]

fix insertion of special terms and countries into word table

Special terms need to be prefixed by a space because they are
full terms.

For countries avoid duplicate entries of word tokens.

Adds tests for adding country terms.

commit | commitdiff | tree

Sarah Hoffmann [Wed, 2 Jun 2021 14:25:26 +0000 (16:25 +0200)]

Merge pull request #2356 from lonvia/freeze-after-import

Call freeze after running a non-updateable import

commit | commitdiff | tree

Sarah Hoffmann [Wed, 2 Jun 2021 14:11:29 +0000 (16:11 +0200)]

docs: reload SQL when migrating to 3.6

SQL functions must always be reloaded when updating the software.
All other updates included the instruction as part of some other
migration. From 3.7 on it will happen as part of the migration
command.

Fixes #2335.

commit | commitdiff | tree

Sarah Hoffmann [Wed, 2 Jun 2021 09:08:48 +0000 (11:08 +0200)]

call freeze after running a non-updateable import

Some of the tables will have already been removed but
the tables for indexing are still there and should be
dropped.

commit | commitdiff | tree

Sarah Hoffmann [Wed, 26 May 2021 09:47:08 +0000 (11:47 +0200)]

commit changes to replication log table

Fixes #2350.

commit | commitdiff | tree

Sarah Hoffmann [Wed, 26 May 2021 09:04:02 +0000 (11:04 +0200)]

always compute guessed postcode for POIs from centroid

When guessing postcodes from the area, only postcodes within
that area are accepted. For POIs that is usually not what we
want as the postcode would have to be within a house for
example.

Fixes #2301.

commit | commitdiff | tree

Sarah Hoffmann [Tue, 25 May 2021 18:43:44 +0000 (20:43 +0200)]

Merge pull request #2349 from lonvia/fix-website-refresh

Only initialise tokenizer for refresh functions where needed

commit | commitdiff | tree

Sarah Hoffmann [Tue, 25 May 2021 17:16:22 +0000 (19:16 +0200)]

only initialise tokenizer for refresh functions where needed

Fixes #2347.

commit | commitdiff | tree

Sarah Hoffmann [Mon, 24 May 2021 15:41:38 +0000 (17:41 +0200)]

Merge pull request #2346 from lonvia/words-vs-tokens

Cleanup use of partial words in legacy tokenizers

commit | commitdiff | tree

Sarah Hoffmann [Mon, 24 May 2021 08:29:21 +0000 (10:29 +0200)]

add tests for new full name computation with ICU

commit | commitdiff | tree

Sarah Hoffmann [Sun, 23 May 2021 21:58:58 +0000 (23:58 +0200)]

reorganize keyword creation for legacy tokenizer

- only save partial words without internal spaces
- consider comma and semicolon a separator of full words
- consider parts before an opening bracket a full word
(but not the part after the bracket)

Fixes #244.
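
The three rules above can be illustrated with a minimal sketch. This is not the tokenizer's actual code: the function name `split_keywords`, the lowercasing used as a stand-in for real normalization, and the return shape are all assumptions made for illustration.

```python
import re
from typing import Set, Tuple


def split_keywords(name: str) -> Tuple[Set[str], Set[str]]:
    """Return (full_words, partial_words) for a name (hypothetical helper)."""
    full_words: Set[str] = set()
    # Commas and semicolons separate independent full words.
    for part in re.split(r"[,;]", name):
        # Lowercasing and whitespace squeezing stand in for real normalization.
        part = " ".join(part.lower().split())
        if not part:
            continue
        full_words.add(part)
        # The text before an opening bracket is a full word of its own;
        # the text after the bracket is not.
        head, bracket, _ = part.partition("(")
        head = head.strip()
        if bracket and head:
            full_words.add(head)
    # Partial words are single terms, so they never contain a space.
    partial_words = {
        term for full in full_words for term in re.findall(r"\w+", full)
    }
    return full_words, partial_words
```

Calling `split_keywords("Halle (Saale); Foo")` yields the full words `halle (saale)`, `halle` and `foo`, while every partial word is a single term.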

commit | commitdiff | tree

Sarah Hoffmann [Sun, 23 May 2021 21:08:11 +0000 (23:08 +0200)]

use make_keywords for place search terms also

Ensures that place indeed uses the same search names as other
names.

commit | commitdiff | tree

Sarah Hoffmann [Sun, 23 May 2021 20:13:03 +0000 (22:13 +0200)]

always ignore multi term partials in search

Partial terms should only ever consist of one word. Ignore
any others; they are leftovers from inefficient word index
builds.

commit | commitdiff | tree

Sarah Hoffmann [Sat, 22 May 2021 08:36:35 +0000 (10:36 +0200)]

Merge pull request #2342 from lonvia/icu-tokenizer-ci

Add BDD tests with icu tokenizer to CI runs

commit | commitdiff | tree

Sarah Hoffmann [Fri, 21 May 2021 20:40:22 +0000 (22:40 +0200)]

CI: run BDD tests with legacy_icu tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Fri, 21 May 2021 20:39:56 +0000 (22:39 +0200)]

enable Tiger BDD API test for legacy_icu

commit | commitdiff | tree

Sarah Hoffmann [Thu, 20 May 2021 15:30:30 +0000 (17:30 +0200)]

Merge pull request #2341 from lonvia/cleanup-python-tests

Cleanup and linting of python tests

commit | commitdiff | tree

Sarah Hoffmann [Thu, 20 May 2021 08:26:23 +0000 (10:26 +0200)]

Merge pull request #2337 from mogita/fix/invalid-query-string

fix: add the missing question mark

commit | commitdiff | tree

Sarah Hoffmann [Wed, 19 May 2021 21:07:39 +0000 (23:07 +0200)]

test: fix linting errors

commit | commitdiff | tree

Sarah Hoffmann [Wed, 19 May 2021 15:37:03 +0000 (17:37 +0200)]

test: more use of table_factory

commit | commitdiff | tree

Sarah Hoffmann [Wed, 19 May 2021 14:42:35 +0000 (16:42 +0200)]

test: avoid use of tempfile module

Use the tmp_path fixture instead which provides automatic
cleanup.
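
A minimal sketch of the pattern, assuming a hypothetical helper `write_settings` and a made-up file name. pytest injects `tmp_path` as a `pathlib.Path` pointing at a fresh per-test directory and manages that directory's lifetime itself.

```python
from pathlib import Path


def write_settings(directory: Path) -> Path:
    """Hypothetical code under test: writes a settings file into a directory."""
    target = directory / "settings.env"
    target.write_text("EXAMPLE_SETTING=1\n")
    return target


def test_write_settings(tmp_path):
    # No tempfile.mkdtemp() and no manual removal: the fixture supplies
    # the directory and pytest takes care of cleaning it up.
    target = write_settings(tmp_path)
    assert target.read_text() == "EXAMPLE_SETTING=1\n"
```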

commit | commitdiff | tree

Sarah Hoffmann [Wed, 19 May 2021 14:03:54 +0000 (16:03 +0200)]

test: use src_dir fixture instead of self-computed paths

commit | commitdiff | tree

Sarah Hoffmann [Wed, 19 May 2021 10:11:04 +0000 (12:11 +0200)]

test: replace raw execute() with fixture code where possible

commit | commitdiff | tree

Sarah Hoffmann [Wed, 19 May 2021 08:51:10 +0000 (10:51 +0200)]

test: use table_rows() and execute_values() where possible

Some uses of scalar() could also be replaced with convenience
functions from the word table mock.

commit | commitdiff | tree

Sarah Hoffmann [Wed, 19 May 2021 08:30:36 +0000 (10:30 +0200)]

test: move Testingcursor into separate class

Also adds more convenience functions: counting with a where
statement and a wrapper to execute_values().

commit | commitdiff | tree

mogita [Wed, 19 May 2021 05:35:15 +0000 (13:35 +0800)]

fix: add the missing question mark

commit | commitdiff | tree

Sarah Hoffmann [Tue, 18 May 2021 21:00:10 +0000 (23:00 +0200)]

Merge pull request #2336 from lonvia/do-not-mask-error-when-loading-tokenizer

Do not hide errors when importing tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Tue, 18 May 2021 20:58:25 +0000 (22:58 +0200)]

Merge pull request #2321 from AntoJvlt/csv-import-special-phrases

CSV import for special phrases and loader refactoring

commit | commitdiff | tree

AntoJvlt [Mon, 17 May 2021 21:00:22 +0000 (23:00 +0200)]

Documentation update and small code fixes

commit | commitdiff | tree

Sarah Hoffmann [Tue, 18 May 2021 14:28:21 +0000 (16:28 +0200)]

do not hide errors when importing tokenizer

Explicitly check for the tokenizer source file to check that
the name is correct. We can't use the import error for that
because it hides other import errors like a missing
library.

Fixes #2327.
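
The idea can be sketched as follows. The function name, the `<name>_tokenizer.py` file naming scheme and the error type are assumptions for illustration, not the project's actual code.

```python
import importlib
from pathlib import Path


def import_tokenizer(name: str, package_dir: Path, package: str):
    """Import the tokenizer module `name` from `package` (hypothetical helper)."""
    # Check for the source file first: a missing file means the name is wrong.
    source = package_dir / f"{name}_tokenizer.py"
    if not source.is_file():
        raise ValueError(f"Unknown tokenizer '{name}': {source} does not exist.")
    # Any ImportError raised from here on is a genuine problem, for example
    # a missing library, and is deliberately left to propagate unchanged.
    return importlib.import_module(f"{package}.{name}_tokenizer")
```

With this split, a misspelled tokenizer name and a broken dependency produce two different, distinguishable errors.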

commit | commitdiff | tree

Sarah Hoffmann [Tue, 18 May 2021 09:30:58 +0000 (11:30 +0200)]

Merge pull request #2332 from lonvia/fix-keyword-details

Always use object type for details keywords

commit | commitdiff | tree

Sarah Hoffmann [Mon, 17 May 2021 14:36:32 +0000 (16:36 +0200)]

always use object type for details keywords

When name and address are empty, the keywords field in the response
of the details API would be an array because that is what PHP's
json_encode defaults to with empty array(). This default can only
be changed globally per json_encode call and that might cause
unintended collateral damage. Work around the issue by making
name and address an empty array instead of keywords.

Fixes #2329.

commit | commitdiff | tree

AntoJvlt [Mon, 17 May 2021 11:52:35 +0000 (13:52 +0200)]

Resolve conflicts

commit | commitdiff | tree

AntoJvlt [Mon, 17 May 2021 10:53:58 +0000 (12:53 +0200)]

Special phrases documentation updated

commit | commitdiff | tree

AntoJvlt [Mon, 17 May 2021 10:40:50 +0000 (12:40 +0200)]

Added --no-replace command for special phrases importation and added corresponding tests

commit | commitdiff | tree

AntoJvlt [Sun, 16 May 2021 14:59:12 +0000 (16:59 +0200)]

Code cleaning and SPLoader deleted

commit | commitdiff | tree

AntoJvlt [Sun, 16 May 2021 13:32:22 +0000 (15:32 +0200)]

Add tests for the new SPWikiLoader and SPCsvLoader

commit | commitdiff | tree

Sarah Hoffmann [Fri, 14 May 2021 08:40:22 +0000 (10:40 +0200)]

Merge pull request #2323 from darkshredder/disable-search-reverse-only

Feat: Disabled search API for --reverse-only imports

commit | commitdiff | tree

Sarah Hoffmann [Fri, 14 May 2021 07:58:50 +0000 (09:58 +0200)]

Merge pull request #2328 from lonvia/convert-tiger-to-csv

Switch external Tiger data to CSV format

commit | commitdiff | tree

Sarah Hoffmann [Fri, 14 May 2021 07:44:10 +0000 (09:44 +0200)]

install default settings for legacy_icu tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Thu, 13 May 2021 21:39:01 +0000 (23:39 +0200)]

adapt documentation to use Tiger CSV dump

commit | commitdiff | tree

Sarah Hoffmann [Thu, 13 May 2021 21:37:51 +0000 (23:37 +0200)]

adapt tests to new TIGER CSV format

commit | commitdiff | tree

Sarah Hoffmann [Thu, 13 May 2021 20:11:41 +0000 (22:11 +0200)]

use tokenizer during Tiger data import

This also changes the required import format to CSV.

commit | commitdiff | tree

Darkshredder [Wed, 12 May 2021 21:44:37 +0000 (03:14 +0530)]

feat: Added reverse-only-search validation

commit | commitdiff | tree

Sarah Hoffmann [Thu, 13 May 2021 20:09:56 +0000 (22:09 +0200)]

Merge pull request #2326 from lonvia/wokerpool-for-tiger-data

Use WorkerPool when importing Tiger data

commit | commitdiff | tree

Sarah Hoffmann [Thu, 13 May 2021 18:16:30 +0000 (20:16 +0200)]

use WorkerPool for Tiger data import

Requires adding an option that SQL errors are ignored.

commit | commitdiff | tree

Sarah Hoffmann [Thu, 13 May 2021 15:11:17 +0000 (17:11 +0200)]

move WorkerPool into db module

The pool is independent of the indexer and may also be used
by other parts of the software.

commit | commitdiff | tree

Sarah Hoffmann [Thu, 13 May 2021 15:00:29 +0000 (17:00 +0200)]

Merge pull request #2325 from lonvia/do-not-precompute-postcodes

Do not preload postcodes in the legacy tokenizer

commit | commitdiff | tree

Frederik Ramm [Thu, 6 May 2021 18:44:04 +0000 (20:44 +0200)]

Add array_key_last function for PHP <7.3

This patch adds an array_key_last function if it does not yet exist. Fixes #2316. It was tested on PHP 7.2.24 but not on PHP 7.3.

commit | commitdiff | tree

Sarah Hoffmann [Thu, 13 May 2021 14:14:12 +0000 (16:14 +0200)]

do not preload postcodes

This is too expensive for updates.

commit | commitdiff | tree

Sarah Hoffmann [Thu, 13 May 2021 12:52:19 +0000 (14:52 +0200)]

Merge pull request #2324 from lonvia/generic-external-postcodes

Rework postcode handling and generalised external postcode support

commit | commitdiff | tree

Sarah Hoffmann [Thu, 13 May 2021 12:31:41 +0000 (14:31 +0200)]

fix token_info migration

A bad indent meant that only one table received the new column.

commit | commitdiff | tree

Sarah Hoffmann [Thu, 13 May 2021 10:19:20 +0000 (12:19 +0200)]

ignore invalid coordinates in external postcodes

commit | commitdiff | tree

Sarah Hoffmann [Thu, 13 May 2021 10:07:20 +0000 (12:07 +0200)]

ignore entries without country code

commit | commitdiff | tree

Sarah Hoffmann [Thu, 13 May 2021 10:04:47 +0000 (12:04 +0200)]

add documentation for external postcode feature

commit | commitdiff | tree

Sarah Hoffmann [Thu, 13 May 2021 07:59:34 +0000 (09:59 +0200)]

correctly handle removing all postcodes for country

commit | commitdiff | tree

Sarah Hoffmann [Wed, 12 May 2021 22:14:52 +0000 (00:14 +0200)]

index postcodes after refreshing

commit | commitdiff | tree

Sarah Hoffmann [Wed, 12 May 2021 21:30:45 +0000 (23:30 +0200)]

add and extend tests for new postcode handling

commit | commitdiff | tree

Sarah Hoffmann [Wed, 12 May 2021 17:57:48 +0000 (19:57 +0200)]

move filling of postcode table to python

The Python code now takes care of reading postcodes from placex,
enhancing them with potentially existing external postcodes and
updating location_postcodes accordingly. The initial setup and
updates use exactly the same function.

External postcode handling has been generalized. External postcodes
for any country are now accepted. The format of the external postcode
file has changed. We now expect CSV, potentially gzipped. The
postcodes are no longer saved in the database.
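
A minimal sketch of reading such a file, under stated assumptions: the column names `postcode`, `lat` and `lon`, the `.gz` suffix check and the function name are hypothetical and not confirmed by this commit message. Skipping rows with unusable coordinates is a choice made for this sketch.

```python
import csv
import gzip
from pathlib import Path
from typing import Dict, Tuple


def read_external_postcodes(path: Path) -> Dict[str, Tuple[float, float]]:
    """Read a (possibly gzipped) CSV file of postcodes (hypothetical helper)."""
    # Pick the opener by file suffix; both accept text mode and an encoding.
    opener = gzip.open if path.suffix == ".gz" else open
    result: Dict[str, Tuple[float, float]] = {}
    with opener(path, "rt", encoding="utf-8", newline="") as fd:
        for row in csv.DictReader(fd):
            postcode = (row.get("postcode") or "").strip()
            try:
                lat, lon = float(row["lat"]), float(row["lon"])
            except (KeyError, TypeError, ValueError):
                continue  # missing or non-numeric coordinates
            # Keep only rows with a postcode and coordinates in range.
            if postcode and -90 <= lat <= 90 and -180 <= lon <= 180:
                result[postcode] = (lat, lon)
    return result
```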

commit | commitdiff | tree

Sarah Hoffmann [Wed, 12 May 2021 18:25:22 +0000 (20:25 +0200)]

Merge pull request #2322 from mtmail/type-label-already-lowercased

typelabel value is already lowercased

commit | commitdiff | tree

marc tobias [Wed, 12 May 2021 17:16:51 +0000 (19:16 +0200)]

typelabel value is already lowercased

commit | commitdiff | tree

AntoJvlt [Mon, 10 May 2021 21:09:00 +0000 (23:09 +0200)]

Introduction of SPCsvLoader to load special phrases from a csv file

commit | commitdiff | tree

AntoJvlt [Mon, 10 May 2021 19:48:11 +0000 (21:48 +0200)]

Refactoring loading of external special phrases and importation process by introducing SPLoader and SPWikiLoader

commit | commitdiff | tree

Sarah Hoffmann [Thu, 6 May 2021 15:41:53 +0000 (17:41 +0200)]

Merge pull request #2314 from lonvia/fix-status-no-import-date

Correctly catch the exception when import date is missing

commit | commitdiff | tree

Sarah Hoffmann [Thu, 6 May 2021 15:22:04 +0000 (17:22 +0200)]

Merge pull request #2312 from lonvia/icu-tokenizer

Add new tokenizer based on libICU

commit | commitdiff | tree

Sarah Hoffmann [Thu, 6 May 2021 13:36:54 +0000 (15:36 +0200)]

correctly catch the exception when import date is missing

commit | commitdiff | tree

Sarah Hoffmann [Wed, 5 May 2021 19:16:55 +0000 (21:16 +0200)]

add missing transliterations

The ICU library only offers transliterations for a limited set of
scripts. Add transliterations for missing scripts from the PostgreSQL
module. This means that the same selection of scripts is supported
as with the old module.

commit | commitdiff | tree

Sarah Hoffmann [Wed, 5 May 2021 15:09:38 +0000 (17:09 +0200)]

fix name of transliterator

Should be different from the normalisation rules.

commit | commitdiff | tree

Sarah Hoffmann [Wed, 5 May 2021 08:00:34 +0000 (10:00 +0200)]

enable BDD tests for different tokenizers

The tokenizer to be used can be chosen with -DTOKENIZER.

Adapt all tests, so that they work with legacy_icu tokenizer.
Move lookup in word table to a function in the tokenizer.
Special phrases are temporarily imported from the wiki until
we have an implementation that can import from file. TIGER
tests do not work yet.

commit | commitdiff | tree

Sarah Hoffmann [Tue, 4 May 2021 16:32:57 +0000 (18:32 +0200)]

add unit tests for legacy ICU tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Sun, 2 May 2021 20:13:18 +0000 (22:13 +0200)]

cache transliteration results

commit | commitdiff | tree

Sarah Hoffmann [Sun, 2 May 2021 19:21:41 +0000 (21:21 +0200)]

add PHP part for new ICU-based tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Sun, 2 May 2021 15:52:45 +0000 (17:52 +0200)]

add Python part for new ICU-based tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Tue, 4 May 2021 10:45:26 +0000 (12:45 +0200)]

Merge pull request #2310 from RhinoDevel/master

2nd try: Add hint about replication update & recheck intervals being in seconds.

commit | commitdiff | tree

Marc [Tue, 4 May 2021 09:47:15 +0000 (11:47 +0200)]

Add hint about replication update & recheck intervals being in seconds.

commit | commitdiff | tree

Sarah Hoffmann [Mon, 3 May 2021 07:15:34 +0000 (09:15 +0200)]

Merge pull request #2305 from lonvia/tokenizer

Factor out normalization into a separate module

commit | commitdiff | tree

Sarah Hoffmann [Sat, 1 May 2021 08:50:39 +0000 (10:50 +0200)]

mock tokenizer factory for replication tests

commit | commitdiff | tree

Sarah Hoffmann [Sat, 1 May 2021 08:28:49 +0000 (10:28 +0200)]

commit between migrations

Later migrations may require tables set up by older ones.
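
A minimal sketch of the pattern, using the standard library's `sqlite3` as a stand-in for the real database connection. The function name `run_migrations` and the callable-per-migration layout are assumptions for illustration.

```python
import sqlite3
from typing import Callable, Iterable


def run_migrations(conn: sqlite3.Connection,
                   migrations: Iterable[Callable[[sqlite3.Connection], None]]) -> None:
    """Run migrations in order, committing after each one (hypothetical helper)."""
    for migration in migrations:
        migration(conn)
        # Commit now so that tables created by this step are durably in
        # place before any later migration that depends on them runs.
        conn.commit()
```

A later step that inserts into a table created by an earlier step then always finds that table committed.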

commit | commitdiff | tree

Sarah Hoffmann [Sat, 1 May 2021 08:03:00 +0000 (10:03 +0200)]

increase database version for tokenizer migration

commit | commitdiff | tree

Sarah Hoffmann [Fri, 30 Apr 2021 15:59:50 +0000 (17:59 +0200)]

fix linting issues

commit | commitdiff | tree

Sarah Hoffmann [Fri, 30 Apr 2021 15:28:34 +0000 (17:28 +0200)]

move index creation for word table to tokenizer

This introduces a finalization routine for the tokenizer
where it can post-process the import if necessary.

commit | commitdiff | tree

Sarah Hoffmann [Fri, 30 Apr 2021 14:17:28 +0000 (16:17 +0200)]

indexer: fetch extra place data asynchronously

The indexer now fetches any extra data besides the place_id
asynchronously while processing the places from the last batch.
This also means that more places are now fetched at once.
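
The overlap of fetching and processing can be sketched like this. A single-worker thread pool is swapped in here for whatever asynchronous database mechanism the indexer actually uses, and `index_in_batches`, `fetch_details` and `process` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def index_in_batches(place_ids: Sequence[int],
                     fetch_details: Callable[[Sequence[int]], List],
                     process: Callable[[object], None],
                     batch_size: int = 2) -> None:
    """Process places while the next batch is fetched in the background."""
    batches = [place_ids[i:i + batch_size]
               for i in range(0, len(place_ids), batch_size)]
    if not batches:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_details, batches[0])
        for next_batch in batches[1:] + [None]:
            details = pending.result()  # wait for the batch fetched earlier
            if next_batch is not None:
                # Start fetching the following batch before processing.
                pending = pool.submit(fetch_details, next_batch)
            for place in details:
                process(place)
```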

commit | commitdiff | tree

Sarah Hoffmann [Thu, 29 Apr 2021 20:16:31 +0000 (22:16 +0200)]

fetch place info asynchronously

commit | commitdiff | tree

Sarah Hoffmann [Thu, 29 Apr 2021 19:57:43 +0000 (21:57 +0200)]

indexer: fetch ids in batches

commit | commitdiff | tree

Sarah Hoffmann [Wed, 28 Apr 2021 19:15:18 +0000 (21:15 +0200)]

move database check for module to tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Wed, 28 Apr 2021 18:13:51 +0000 (20:13 +0200)]

move status test to tokenizer

The availability of the module is now tested by the tokenizer.

commit | commitdiff | tree

Sarah Hoffmann [Wed, 28 Apr 2021 15:39:03 +0000 (17:39 +0200)]

add more tests for legacy tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Wed, 28 Apr 2021 12:08:24 +0000 (14:08 +0200)]

move tokenization in query into tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Wed, 28 Apr 2021 08:59:07 +0000 (10:59 +0200)]

boilerplate for PHP code of tokenizer

This adds an installation step for PHP code for the tokenizer. The
PHP code is split in two parts. The updateable code is found in
lib-php. The tokenizer installs an additional script in the
project directory which then includes the code from lib-php and
defines all settings that are static to the database. The website
code then always includes the PHP from the project directory.

commit | commitdiff | tree

Sarah Hoffmann [Wed, 28 Apr 2021 07:14:32 +0000 (09:14 +0200)]

tests for legacy tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Tue, 27 Apr 2021 19:50:35 +0000 (21:50 +0200)]

move amenity creation to tokenizer

The BDD tests still use the old-style amenity creation scripts
because we don't have simple means to import a hand-crafted
test file of special phrases right now.

commit | commitdiff | tree

Sarah Hoffmann [Tue, 27 Apr 2021 09:37:18 +0000 (11:37 +0200)]

move default country name creation to tokenizer

The new function is also used when a country is updated. All SQL
functions related to country names have been removed.

commit | commitdiff | tree

Sarah Hoffmann [Mon, 26 Apr 2021 15:30:10 +0000 (17:30 +0200)]

cache all postcodes

commit | commitdiff | tree

Sarah Hoffmann [Mon, 26 Apr 2021 14:50:28 +0000 (16:50 +0200)]

reorganise address iteration in tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Sun, 25 Apr 2021 21:43:57 +0000 (23:43 +0200)]

remove debug code

commit | commitdiff | tree

Sarah Hoffmann [Sun, 25 Apr 2021 21:42:56 +0000 (23:42 +0200)]

use address tokens in SQL

commit | commitdiff | tree

Sarah Hoffmann [Sun, 25 Apr 2021 20:04:07 +0000 (22:04 +0200)]

extract address tokens in tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Sun, 25 Apr 2021 16:26:36 +0000 (18:26 +0200)]

move postcode normalization into tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Sun, 25 Apr 2021 09:47:29 +0000 (11:47 +0200)]

move housenumber handling to tokenizer

Normalization and token computation are now done in the tokenizer.
The tokenizer keeps a cache of the hundred most used house numbers
to keep the number of calls to the database low.
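
A minimal sketch of such a cache. Note the swap: `functools.lru_cache` evicts the least recently used entry, which only approximates a "most used" policy, and `make_cached_lookup` and `query_database` are hypothetical names.

```python
from functools import lru_cache
from typing import Callable


def make_cached_lookup(query_database: Callable[[str], int], size: int = 100):
    """Wrap a database lookup in a bounded cache (hypothetical helper)."""
    @lru_cache(maxsize=size)
    def lookup(housenumber: str) -> int:
        # Only reached on a cache miss, so repeated house numbers
        # do not cause another database call.
        return query_database(housenumber)
    return lookup
```

The wrapped function exposes `cache_info()`, which makes it easy to check how many database calls were avoided.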

Open Source search based on OpenStreetMap data
