git.openstreetmap.org Git - nominatim.git/log
Sarah Hoffmann [Thu, 10 Jun 2021 15:18:23 +0000 (17:18 +0200)]
complete tests for icu tokenizer
Sarah Hoffmann [Thu, 10 Jun 2021 08:28:46 +0000 (10:28 +0200)]
fix full term token in special phrases
Sarah Hoffmann [Thu, 10 Jun 2021 08:06:49 +0000 (10:06 +0200)]
complete tests for rule loader
Sarah Hoffmann [Thu, 10 Jun 2021 07:36:43 +0000 (09:36 +0200)]
correctly quote strings when copying in data
Encapsulate the copy string in a class that ensures that
copy lines are written with correct quoting.
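
A minimal sketch of what such a class might look like, assuming the data is written in PostgreSQL's COPY text format (tab-separated columns, \N for NULL, backslash escapes). The class name CopyBuffer and its methods are hypothetical and not taken from the commit:

```python
import io


class CopyBuffer:
    """Collects rows for a PostgreSQL COPY in text format (hypothetical helper)."""

    def __init__(self):
        self.buffer = io.StringIO()

    @staticmethod
    def _quote(value):
        # NULL is written as \N in COPY text format.
        if value is None:
            return '\\N'
        text = str(value)
        # Escape the backslash first so the escapes added below are not doubled.
        return (text.replace('\\', '\\\\')
                    .replace('\t', '\\t')
                    .replace('\n', '\\n')
                    .replace('\r', '\\r'))

    def add(self, *columns):
        """Append one row, quoting every column."""
        self.buffer.write('\t'.join(self._quote(c) for c in columns))
        self.buffer.write('\n')

    def getvalue(self):
        return self.buffer.getvalue()
```

Because quoting happens in one place, callers can no longer write an unescaped tab or newline into a copy line by accident.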
Sarah Hoffmann [Wed, 9 Jun 2021 13:07:36 +0000 (15:07 +0200)]
update unit tests for adapted abbreviation code
Sarah Hoffmann [Wed, 9 Jun 2021 08:53:39 +0000 (10:53 +0200)]
add abbreviations from legacy tokenizer
These abbreviations are not a perfect fit anymore because
abbreviation replacement is now applied before transliteration.
Sarah Hoffmann [Sun, 6 Jun 2021 09:00:44 +0000 (11:00 +0200)]
adapt tests for ICU tokenizer
Sarah Hoffmann [Fri, 28 May 2021 20:06:13 +0000 (22:06 +0200)]
move abbreviation computation into import phase
This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.
New dependency for python library datrie.
Sarah Hoffmann [Wed, 26 May 2021 18:50:34 +0000 (20:50 +0200)]
icu tokenizer: move transliteration rules in separate file
The tokenizer configuration has become difficult to handle
due to the additional manual transliteration rules. Allow
a separate rule file that is passed to the ICU library
as is.
Sarah Hoffmann [Sat, 3 Jul 2021 19:14:43 +0000 (21:14 +0200)]
docs: nominatim-ui should be installed from the release
The development version does not provide the pre-packaged
dist directory anymore.
Sarah Hoffmann [Sat, 26 Jun 2021 14:21:08 +0000 (16:21 +0200)]
Merge pull request #2373 from lonvia/tweak-search-cost
Further tweaking of search cost
Sarah Hoffmann [Sat, 26 Jun 2021 09:20:25 +0000 (11:20 +0200)]
remove penalty for full words in address
Now that multi-word partials no longer exist, multi-word full
words need to be used to search in addresses and therefore no
longer should have a penalty.
Also changes the condition when a full word is included into
the address. It is no longer relevant if an equivalent partial
exists but only if the term consists of more than one word.
Sarah Hoffmann [Sat, 26 Jun 2021 08:31:55 +0000 (10:31 +0200)]
adjust penalty for housenumber-in-name searches
When searching for house numbers in the name (for place-only
terms) then the same penalties need to apply as for the
regular house number search.
Change the code to first compute the penalties and then create
the new search variants.
Sarah Hoffmann [Fri, 18 Jun 2021 08:58:41 +0000 (10:58 +0200)]
make sure old data gets deleted on place type change
When changing from some other place type to place=postcode
make sure that the old place type entry in the place table
is deleted.
Sarah Hoffmann [Thu, 17 Jun 2021 22:28:10 +0000 (00:28 +0200)]
update postcode in place if it already exists
Sarah Hoffmann [Thu, 17 Jun 2021 13:30:05 +0000 (15:30 +0200)]
Merge pull request #2369 from lonvia/exclude-poi-from-housenumber-search
Do not return POIs when dropping house number in query
Sarah Hoffmann [Thu, 17 Jun 2021 10:05:33 +0000 (12:05 +0200)]
do not return POIs when dropping house number in query
We've previously added searching through rank 30 in a house
number search to enable searches for house number+name.
This had the unintended side effect that rank 30 objects
are also returned in a search that dropped the house number
from the query. This is wrong because POIs cannot function
as a parent to a house number.
This fix drops all rank 30 objects from the results for a
house number search if they do not match the requested house
number.
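
The filter described here can be illustrated roughly as follows. This is only a sketch: the real change lives in the search code, and the dictionary keys rank and housenumber are hypothetical names chosen for the example:

```python
def filter_housenumber_results(results, housenumber):
    """Drop rank 30 results whose house number does not match the request.

    `results` is a list of dicts with the hypothetical keys 'rank' and
    'housenumber'. Lower ranks (streets, places) are always kept because
    they can act as the parent of a house number.
    """
    return [r for r in results
            if r['rank'] != 30 or r.get('housenumber') == housenumber]
```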
Sarah Hoffmann [Wed, 16 Jun 2021 09:45:07 +0000 (11:45 +0200)]
Merge pull request #2360 from AntoJvlt/postcodes-place-table
Use place instead of placex to compute postcodes
AntoJvlt [Sat, 12 Jun 2021 13:46:08 +0000 (15:46 +0200)]
Improved performance of the postcodes query and some code cleaning
AntoJvlt [Sat, 12 Jun 2021 13:35:51 +0000 (15:35 +0200)]
Always delete old placex entry for type=postcode when inserting a new one into the place table
AntoJvlt [Wed, 9 Jun 2021 07:24:25 +0000 (09:24 +0200)]
Handle postcode type change in place insert trigger
AntoJvlt [Tue, 8 Jun 2021 20:39:04 +0000 (22:39 +0200)]
Clean and update tests for postcodes
AntoJvlt [Tue, 8 Jun 2021 07:33:10 +0000 (09:33 +0200)]
Use place_exists() in can_compute() for postcodes
AntoJvlt [Mon, 7 Jun 2021 13:02:53 +0000 (15:02 +0200)]
Update tests for postcodes
AntoJvlt [Fri, 4 Jun 2021 19:26:13 +0000 (21:26 +0200)]
Use place instead of placex to compute postcodes
Sarah Hoffmann [Tue, 8 Jun 2021 08:42:14 +0000 (10:42 +0200)]
do not fail CI on codecov errors
The Codecov upload depends on unreliable external code.
Sarah Hoffmann [Sun, 6 Jun 2021 16:29:51 +0000 (18:29 +0200)]
Merge pull request #2359 from lonvia/switch-bdd-tests-to-api-search
Remove deprecated commandline query function
Sarah Hoffmann [Sun, 6 Jun 2021 13:28:21 +0000 (15:28 +0200)]
remove deprecated query interface
Searches can now be done via the thin API wrapper.
Sarah Hoffmann [Sun, 6 Jun 2021 13:27:52 +0000 (15:27 +0200)]
switch BDD tests to always use search API
Sarah Hoffmann [Fri, 4 Jun 2021 21:54:37 +0000 (23:54 +0200)]
Merge pull request #2358 from AntoJvlt/documentation-update
Update documentation
AntoJvlt [Tue, 1 Jun 2021 15:02:45 +0000 (17:02 +0200)]
Update documentation
Sarah Hoffmann [Wed, 2 Jun 2021 18:58:14 +0000 (20:58 +0200)]
Merge pull request #2357 from lonvia/legacy-tokenizer-fix-word-entries
Fix insertion of special terms and countries into word table
Sarah Hoffmann [Wed, 2 Jun 2021 15:37:27 +0000 (17:37 +0200)]
fix insertion of special terms and countries into word table
Special terms need to be prefixed by a space because they are
full terms.
For countries avoid duplicate entries of word tokens.
Adds tests for adding country terms.
Sarah Hoffmann [Wed, 2 Jun 2021 14:25:26 +0000 (16:25 +0200)]
Merge pull request #2356 from lonvia/freeze-after-import
Call freeze after running a non-updateable import
Sarah Hoffmann [Wed, 2 Jun 2021 14:11:29 +0000 (16:11 +0200)]
docs: reload SQL when migrating to 3.6
SQL functions must always be reloaded when updating the software.
All other updates included the instruction as part of some other
migration. From 3.7 on it will happen as part of the migration
command.
Fixes #2335.
Sarah Hoffmann [Wed, 2 Jun 2021 09:08:48 +0000 (11:08 +0200)]
call freeze after running a non-updateable import
Some of the tables will have already been removed but
the tables for indexing are still there and should be
dropped.
Sarah Hoffmann [Wed, 26 May 2021 09:47:08 +0000 (11:47 +0200)]
commit changes to replication log table
Fixes #2350.
Sarah Hoffmann [Wed, 26 May 2021 09:04:02 +0000 (11:04 +0200)]
always compute guessed postcode for POIs from centroid
When guessing postcodes from the area, only postcodes within
that area are accepted. For POIs that is usually not what we
want as the postcode would have to be within a house for
example.
Fixes #2301.
Sarah Hoffmann [Tue, 25 May 2021 18:43:44 +0000 (20:43 +0200)]
Merge pull request #2349 from lonvia/fix-website-refresh
Only initialise tokenizer for refresh functions where needed
Sarah Hoffmann [Tue, 25 May 2021 17:16:22 +0000 (19:16 +0200)]
only initialise tokenizer for refresh functions where needed
Fixes #2347.
Sarah Hoffmann [Mon, 24 May 2021 15:41:38 +0000 (17:41 +0200)]
Merge pull request #2346 from lonvia/words-vs-tokens
Cleanup use of partial words in legacy tokenizers
Sarah Hoffmann [Mon, 24 May 2021 08:29:21 +0000 (10:29 +0200)]
add tests for new full name computation with ICU
Sarah Hoffmann [Sun, 23 May 2021 21:58:58 +0000 (23:58 +0200)]
reorganize keyword creation for legacy tokenizer
- only save partial words without internal spaces
- consider comma and semicolon a separator of full words
- consider parts before an opening bracket a full word
(but not the part after the bracket)
Fixes #244.
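
The three rules above could be sketched like this. The legacy tokenizer does this work in the database, so the Python function below, including its name and return shape, is purely an illustrative assumption:

```python
import re


def make_keywords(name):
    """Return (full_words, partial_words) for a name. Hypothetical sketch."""
    full_words = set()
    # Commas and semicolons separate full words.
    for part in re.split(r'[,;]', name):
        part = part.strip()
        if not part:
            continue
        full_words.add(part)
        # The text before an opening bracket is a full word of its own;
        # the text after the bracket is not.
        if '(' in part:
            before = part.split('(', 1)[0].strip()
            if before:
                full_words.add(before)
    # Partial words are single words only, never terms with internal spaces.
    partial_words = set()
    for word in full_words:
        partial_words.update(re.findall(r'\w+', word))
    return full_words, partial_words
```

For a name such as "Main St (North); High Rd" this sketch yields "Main St" and "High Rd" among the full words, while "North" appears only as a partial.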
Sarah Hoffmann [Sun, 23 May 2021 21:08:11 +0000 (23:08 +0200)]
use make_keywords for place search terms also
Ensures that place indeed uses the same search names as other
names.
Sarah Hoffmann [Sun, 23 May 2021 20:13:03 +0000 (22:13 +0200)]
always ignore multi term partials in search
Partial terms should only ever consist of one word. Ignore
any others; they are a leftover from inefficient word index
builds.
Sarah Hoffmann [Sat, 22 May 2021 08:36:35 +0000 (10:36 +0200)]
Merge pull request #2342 from lonvia/icu-tokenizer-ci
Add BDD tests with icu tokenizer to CI runs
Sarah Hoffmann [Fri, 21 May 2021 20:40:22 +0000 (22:40 +0200)]
CI: run BDD tests with legacy_icu tokenizer
Sarah Hoffmann [Fri, 21 May 2021 20:39:56 +0000 (22:39 +0200)]
enable Tiger BDD API test for legacy_icu
Sarah Hoffmann [Thu, 20 May 2021 15:30:30 +0000 (17:30 +0200)]
Merge pull request #2341 from lonvia/cleanup-python-tests
Cleanup and linting of python tests
Sarah Hoffmann [Thu, 20 May 2021 08:26:23 +0000 (10:26 +0200)]
Merge pull request #2337 from mogita/fix/invalid-query-string
fix: add the missing question mark
Sarah Hoffmann [Wed, 19 May 2021 21:07:39 +0000 (23:07 +0200)]
test: fix linting errors
Sarah Hoffmann [Wed, 19 May 2021 15:37:03 +0000 (17:37 +0200)]
test: more use of table_factory
Sarah Hoffmann [Wed, 19 May 2021 14:42:35 +0000 (16:42 +0200)]
test: avoid use of tempfile module
Use the tmp_path fixture instead which provides automatic
cleanup.
Sarah Hoffmann [Wed, 19 May 2021 14:03:54 +0000 (16:03 +0200)]
test: use src_dir fixture instead of self-computed paths
Sarah Hoffmann [Wed, 19 May 2021 10:11:04 +0000 (12:11 +0200)]
test: replace raw execute() with fixture code where possible
Sarah Hoffmann [Wed, 19 May 2021 08:51:10 +0000 (10:51 +0200)]
test: use table_rows() and execute_values() where possible
Some uses of scalar() could also be replaced with convenience
functions from the word table mock.
Sarah Hoffmann [Wed, 19 May 2021 08:30:36 +0000 (10:30 +0200)]
test: move Testingcursor into separate class
Also adds more convenience functions: counting with a where
statement and a wrapper to execute_values().
mogita [Wed, 19 May 2021 05:35:15 +0000 (13:35 +0800)]
fix: add the missing question mark
Sarah Hoffmann [Tue, 18 May 2021 21:00:10 +0000 (23:00 +0200)]
Merge pull request #2336 from lonvia/do-not-mask-error-when-loading-tokenizer
Do not hide errors when importing tokenizer
Sarah Hoffmann [Tue, 18 May 2021 20:58:25 +0000 (22:58 +0200)]
Merge pull request #2321 from AntoJvlt/csv-import-special-phrases
CSV import for special phrases and loader refactoring
AntoJvlt [Mon, 17 May 2021 21:00:22 +0000 (23:00 +0200)]
Documentation update and small code fixes
Sarah Hoffmann [Tue, 18 May 2021 14:28:21 +0000 (16:28 +0200)]
do not hide errors when importing tokenizer
Explicitly check for the tokenizer source file to check that
the name is correct. We can't use the import error for that
because it hides other import errors like a missing
library.
Fixes #2327.
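
One way to express this check, as a sketch only: the function name, the `<name>_tokenizer.py` file naming and the parameters are assumptions made for the example, not the project's actual layout.

```python
import importlib
from pathlib import Path


def load_tokenizer_module(name, package_dir, package_name):
    """Import the tokenizer module `name` from `package_name`.

    Only a missing source file is reported as "unknown tokenizer".
    """
    source = Path(package_dir) / f'{name}_tokenizer.py'
    if not source.is_file():
        raise ValueError(f"Tokenizer '{name}' not found.")
    # Any ImportError raised from here on is a genuine problem, for example
    # a missing third-party library, and is allowed to propagate unchanged.
    return importlib.import_module(f'{package_name}.{name}_tokenizer')
```

Catching ImportError around the import instead would report a missing dependency as a misspelled tokenizer name, which is the behaviour this commit removes.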
Sarah Hoffmann [Tue, 18 May 2021 09:30:58 +0000 (11:30 +0200)]
Merge pull request #2332 from lonvia/fix-keyword-details
Always use object type for details keywords
Sarah Hoffmann [Mon, 17 May 2021 14:36:32 +0000 (16:36 +0200)]
always use object type for details keywords
When name and address are empty, the keywords field in the response
of the details API would be an array because that is what PHP's
json_encode defaults to with empty array(). This default can only
be changed globally per json_encode call and that might cause
unintended collateral damage. Work around the issue by making
name and address an empty array instead of keywords.
Fixes #2329.
AntoJvlt [Mon, 17 May 2021 11:52:35 +0000 (13:52 +0200)]
Resolve conflicts
AntoJvlt [Mon, 17 May 2021 10:53:58 +0000 (12:53 +0200)]
Special phrases documentation updated
AntoJvlt [Mon, 17 May 2021 10:40:50 +0000 (12:40 +0200)]
Added --no-replace command for special phrases importation and added corresponding tests
AntoJvlt [Sun, 16 May 2021 14:59:12 +0000 (16:59 +0200)]
Code cleaning and SPLoader deleted
AntoJvlt [Sun, 16 May 2021 13:32:22 +0000 (15:32 +0200)]
Add tests for the new SPWikiLoader and SPCsvLoader
Sarah Hoffmann [Fri, 14 May 2021 08:40:22 +0000 (10:40 +0200)]
Merge pull request #2323 from darkshredder/disable-search-reverse-only
Feat: Disabled search API for --reverse-only imports
Sarah Hoffmann [Fri, 14 May 2021 07:58:50 +0000 (09:58 +0200)]
Merge pull request #2328 from lonvia/convert-tiger-to-csv
Switch external Tiger data to CSV format
Sarah Hoffmann [Fri, 14 May 2021 07:44:10 +0000 (09:44 +0200)]
install default settings for legacy_icu tokenizer
Sarah Hoffmann [Thu, 13 May 2021 21:39:01 +0000 (23:39 +0200)]
adapt documentation to use Tiger CSV dump
Sarah Hoffmann [Thu, 13 May 2021 21:37:51 +0000 (23:37 +0200)]
adapt tests to new TIGER CSV format
Sarah Hoffmann [Thu, 13 May 2021 20:11:41 +0000 (22:11 +0200)]
use tokenizer during Tiger data import
This also changes the required import format to CSV.
Darkshredder [Wed, 12 May 2021 21:44:37 +0000 (03:14 +0530)]
feat: Added reverse-only-search validation
Sarah Hoffmann [Thu, 13 May 2021 20:09:56 +0000 (22:09 +0200)]
Merge pull request #2326 from lonvia/wokerpool-for-tiger-data
Use WorkerPool when importing Tiger data
Sarah Hoffmann [Thu, 13 May 2021 18:16:30 +0000 (20:16 +0200)]
use WorkerPool for Tiger data import
Requires adding an option that SQL errors are ignored.
Sarah Hoffmann [Thu, 13 May 2021 15:11:17 +0000 (17:11 +0200)]
move WorkerPool into db module
The pool is independent of the indexer and may also be used
by other parts of the software.
Sarah Hoffmann [Thu, 13 May 2021 15:00:29 +0000 (17:00 +0200)]
Merge pull request #2325 from lonvia/do-not-precompute-postcodes
Do not preload postcodes in the legacy tokenizer
Frederik Ramm [Thu, 6 May 2021 18:44:04 +0000 (20:44 +0200)]
Add array_key_last function for PHP <7.3
This patch adds an array_key_last function if it doesn't yet exist. Fixes #2316. It is tested on PHP 7.2.24 but not PHP 7.3.
Sarah Hoffmann [Thu, 13 May 2021 14:14:12 +0000 (16:14 +0200)]
do not preload postcodes
This is too expensive for updates.
Sarah Hoffmann [Thu, 13 May 2021 12:52:19 +0000 (14:52 +0200)]
Merge pull request #2324 from lonvia/generic-external-postcodes
Rework postcode handling and generalised external postcode support
Sarah Hoffmann [Thu, 13 May 2021 12:31:41 +0000 (14:31 +0200)]
fix token_info migration
A bad indent meant that only one table received the new column.
Sarah Hoffmann [Thu, 13 May 2021 10:19:20 +0000 (12:19 +0200)]
ignore invalid coordinates in external postcodes
Sarah Hoffmann [Thu, 13 May 2021 10:07:20 +0000 (12:07 +0200)]
ignore entries without country code
Sarah Hoffmann [Thu, 13 May 2021 10:04:47 +0000 (12:04 +0200)]
add documentation for external postcode feature
Sarah Hoffmann [Thu, 13 May 2021 07:59:34 +0000 (09:59 +0200)]
correctly handle removing all postcodes for country
Sarah Hoffmann [Wed, 12 May 2021 22:14:52 +0000 (00:14 +0200)]
index postcodes after refreshing
Sarah Hoffmann [Wed, 12 May 2021 21:30:45 +0000 (23:30 +0200)]
add and extend tests for new postcode handling
Sarah Hoffmann [Wed, 12 May 2021 17:57:48 +0000 (19:57 +0200)]
move filling of postcode table to python
The Python code now takes care of reading postcodes from placex,
enhancing them with potentially existing external postcodes and
updating location_postcodes accordingly. The initial setup and
updates use exactly the same function.
External postcode handling has been generalized. External postcodes
for any country are now accepted. The format of the external postcode
file has changed. We now expect CSV, potentially gzipped. The
postcodes are no longer saved in the database.
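
Reading such a file might look like the sketch below. The column names postcode, lat and lon, the function name, and the choice to skip rows with unusable coordinates are assumptions for illustration; consult the documentation for the actual file format.

```python
import csv
import gzip
from pathlib import Path


def read_external_postcodes(path):
    """Yield (postcode, lat, lon) from a CSV file that may be gzipped."""
    path = Path(path)
    # Pick the opener by file suffix; both accept text mode and an encoding.
    opener = gzip.open if path.suffix == '.gz' else open
    with opener(path, 'rt', encoding='utf-8', newline='') as fd:
        for row in csv.DictReader(fd):
            try:
                lat, lon = float(row['lat']), float(row['lon'])
            except (KeyError, TypeError, ValueError):
                continue  # missing or malformed coordinates
            if not (-90 <= lat <= 90 and -180 <= lon <= 180):
                continue  # coordinates outside the valid range
            postcode = (row.get('postcode') or '').strip()
            if postcode:
                yield postcode, lat, lon
```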
Sarah Hoffmann [Wed, 12 May 2021 18:25:22 +0000 (20:25 +0200)]
Merge pull request #2322 from mtmail/type-label-already-lowercased
typelabel value is already lowercased
marc tobias [Wed, 12 May 2021 17:16:51 +0000 (19:16 +0200)]
typelabel value is already lowercased
AntoJvlt [Mon, 10 May 2021 21:09:00 +0000 (23:09 +0200)]
Introduction of SPCsvLoader to load special phrases from a csv file
AntoJvlt [Mon, 10 May 2021 19:48:11 +0000 (21:48 +0200)]
Refactoring loading of external special phrases and importation process by introducing SPLoader and SPWikiLoader
Sarah Hoffmann [Thu, 6 May 2021 15:41:53 +0000 (17:41 +0200)]
Merge pull request #2314 from lonvia/fix-status-no-import-date
Correctly catch the exception when import date is missing
Sarah Hoffmann [Thu, 6 May 2021 15:22:04 +0000 (17:22 +0200)]
Merge pull request #2312 from lonvia/icu-tokenizer
Add new tokenizer based on libICU
Sarah Hoffmann [Thu, 6 May 2021 13:36:54 +0000 (15:36 +0200)]
correctly catch the exception when import date is missing
Sarah Hoffmann [Wed, 5 May 2021 19:16:55 +0000 (21:16 +0200)]
add missing transliterations
The ICU library only offers transliterations for a limited set of
scripts. Add transliterations for missing scripts from the PostgreSQL
module. This means that the same selection of scripts is supported
as with the old module.
Sarah Hoffmann [Wed, 5 May 2021 15:09:38 +0000 (17:09 +0200)]
fix name of transliterator
Should be different from the normalisation rules.