git.openstreetmap.org Git - nominatim.git/log

Sarah Hoffmann [Thu, 2 Sep 2021 16:13:45 +0000 (18:13 +0200)]

reduce penalty for special searches by name

Additional penalty for special terms with operator None
should only go to near searches. To reduce the number
of produced searches, restrict the none operator to
appear only in conjunction with the name.

commit | commitdiff | tree

Sarah Hoffmann [Thu, 2 Sep 2021 16:11:49 +0000 (18:11 +0200)]

further increase penalty on housenumbers without numbers

Make the penalty dependent on the length of the token:
no penalty for one-letter house numbers and an increasing one
for more letters.
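
A minimal sketch of a length-dependent penalty, for illustration only. The function name, the `step` value and the linear growth are assumptions; the commit message only states that one-letter house numbers get no penalty and longer ones an increasing penalty.

```python
def housenumber_penalty(token: str, step: float = 0.5) -> float:
    """Illustrative penalty for a house number token (not Nominatim's code).

    Tokens that contain a digit get no extra penalty. Letter-only tokens
    get nothing for a single character and `step` more for each further
    character. Both `step` and the linear growth are assumed values.
    """
    if any(ch.isdigit() for ch in token):
        return 0.0
    return step * max(len(token) - 1, 0)
```

With these assumed values, `housenumber_penalty("a")` yields no penalty, while every additional letter adds another `step`.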

commit | commitdiff | tree

Sarah Hoffmann [Tue, 31 Aug 2021 13:29:26 +0000 (15:29 +0200)]

Merge pull request #2435 from lonvia/simplified-to-traditional-chinese

icu: normalise simplified to traditional chinese

commit | commitdiff | tree

Sarah Hoffmann [Tue, 31 Aug 2021 09:18:34 +0000 (11:18 +0200)]

icu: normalise simplified to traditional chinese

The conversion is unambiguous in most cases, so that the
information loss is minimal.

commit | commitdiff | tree

Sarah Hoffmann [Sun, 29 Aug 2021 08:11:59 +0000 (10:11 +0200)]

Merge pull request #2434 from lonvia/vagrant-scripts-in-actions

Test installation instructions via CI

commit | commitdiff | tree

Sarah Hoffmann [Mon, 23 Aug 2021 22:31:20 +0000 (00:31 +0200)]

CI: use packaged source also for test runs

commit | commitdiff | tree

Sarah Hoffmann [Mon, 23 Aug 2021 15:41:13 +0000 (17:41 +0200)]

CI: unify jobs for different vagrant scripts

commit | commitdiff | tree

Sarah Hoffmann [Sun, 22 Aug 2021 16:42:20 +0000 (18:42 +0200)]

add workflow for centos 8

commit | commitdiff | tree

Sarah Hoffmann [Sat, 21 Aug 2021 08:45:22 +0000 (10:45 +0200)]

CI: use vagrant scripts for import tests

Use vanilla docker images of Ubuntu and leave the setup
to the vagrant scripts. Then do the usual import tests.

Also fixes a couple of issues found with the scripts.

commit | commitdiff | tree

Sarah Hoffmann [Sun, 22 Aug 2021 07:32:31 +0000 (09:32 +0200)]

Merge pull request #2432 from Mastercuber/patch-1

Added postcode

commit | commitdiff | tree

Mastercuber [Sun, 22 Aug 2021 00:52:41 +0000 (02:52 +0200)]

Added postcode

Added postcode to the list of addressdetails

commit | commitdiff | tree

Sarah Hoffmann [Sat, 21 Aug 2021 18:36:16 +0000 (20:36 +0200)]

Add link to fixthemap to issue template

commit | commitdiff | tree

Sarah Hoffmann [Sat, 21 Aug 2021 08:21:39 +0000 (10:21 +0200)]

Merge pull request #2429 from lonvia/place-name-to-admin-boundary

Indexing: move linking of places to the preparation stage

commit | commitdiff | tree

Sarah Hoffmann [Fri, 20 Aug 2021 19:53:13 +0000 (21:53 +0200)]

move linking of places to the preparation stage

Linked places may bring in extra names. These names need to be
processed by the tokenizer. That means that the linking needs
to be done before the data is handed to the tokenizer. Move finding
the linked place into the preparation stage and update the name
fields. Everything else is still done in the indexing stage.

commit | commitdiff | tree

Sarah Hoffmann [Wed, 18 Aug 2021 13:02:19 +0000 (15:02 +0200)]

Merge pull request #2428 from lonvia/rename-icu-tokenizer

Rename legacy_icu tokenizer to icu tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Wed, 18 Aug 2021 07:08:20 +0000 (09:08 +0200)]

adapt CI workflow to new tokenizer name

commit | commitdiff | tree

Sarah Hoffmann [Tue, 17 Aug 2021 21:11:47 +0000 (23:11 +0200)]

rename legacy_icu tokenizer to icu tokenizer

The new icu tokenizer is now no longer compatible with the old
legacy tokenizer in terms of data structures. Therefore there
is also no longer a need to refer to the legacy tokenizer in the
name.

commit | commitdiff | tree

Sarah Hoffmann [Tue, 17 Aug 2021 19:55:32 +0000 (21:55 +0200)]

Merge pull request #2427 from lonvia/remove-us-states-special-casing

Move US state hack into legacy tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Tue, 17 Aug 2021 12:28:55 +0000 (14:28 +0200)]

move special hack for US states to legacy tokenizer

The hack for IL, AL and LA is only needed because these abbreviations
are removed by the legacy tokenizer as a stop word. There is no need
to keep the hack for future tokenizers. Move it therefore to the
token extraction function.
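
The idea behind the hack can be sketched roughly as follows. This is a Python illustration rather than the PHP code in the repository; the function name and the rule of only touching the last word of a multi-word query are assumptions.

```python
# Abbreviations named in the commit message, mapped to the state names.
US_STATES = {"il": "illinois", "al": "alabama", "la": "louisiana"}


def expand_trailing_us_state(query: str) -> str:
    """Replace a trailing IL/AL/LA with the full state name.

    Hypothetical helper: it only rewrites the last word, and only when
    the query has more than one word, so that a stop-word filter cannot
    silently drop the state.
    """
    words = query.split()
    if len(words) > 1 and words[-1].lower() in US_STATES:
        words[-1] = US_STATES[words[-1].lower()]
    return " ".join(words)
```

A query such as "Chicago IL" would then reach the stop-word handling as "Chicago illinois", while "la paz" is left alone.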

commit | commitdiff | tree

Sarah Hoffmann [Tue, 17 Aug 2021 08:49:07 +0000 (10:49 +0200)]

add tests for US state hacks

IL, AL and LA are replaced with the US state in Geocode because
the old tokenizer would simply remove the abbreviations otherwise.

commit | commitdiff | tree

Sarah Hoffmann [Tue, 17 Aug 2021 07:38:03 +0000 (09:38 +0200)]

Merge pull request #2425 from lonvia/tokenizer-documentation

Introduce official Tokenizer API

commit | commitdiff | tree

Sarah Hoffmann [Mon, 16 Aug 2021 09:48:25 +0000 (11:48 +0200)]

add mkdocstrings requirement for building docs

mkdocstrings also needs access to the Python sources, so set
a PYTHONPATH accordingly. This makes running mkdocs directly
a bit awkward, therefore add a `make serve-doc` target.

commit | commitdiff | tree

Sarah Hoffmann [Mon, 16 Aug 2021 07:57:01 +0000 (09:57 +0200)]

docs: extend explanation of query phrase

commit | commitdiff | tree

Sarah Hoffmann [Thu, 12 Aug 2021 09:21:50 +0000 (11:21 +0200)]

add documentation for PHP part of tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Thu, 12 Aug 2021 09:09:46 +0000 (11:09 +0200)]

php: make word list a first-class object

This separates the logic of creating word sets from the Phrase
class. A tokenizer may now derive the word sets any way they
like. The SimpleWordList class provides a standard implementation
for splitting phrases on spaces.
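
As a rough illustration of what splitting a phrase on spaces can produce, the sketch below enumerates every way of grouping consecutive words. It is written in Python rather than PHP, and it assumes that a "word set" is one such grouping; the function name is hypothetical and not taken from SimpleWordList.

```python
from typing import List


def word_sets(phrase: str) -> List[List[str]]:
    """Return all groupings of consecutive words in `phrase`.

    Assumption: a word set is one way of cutting the space-separated
    words into contiguous multi-word terms.
    """
    words = phrase.split()
    if not words:
        return []

    results: List[List[str]] = []

    def extend(start: int, current: List[str]) -> None:
        # All words consumed: `current` is one complete word set.
        if start == len(words):
            results.append(current)
            return
        # Try every possible length for the next term.
        for end in range(start + 1, len(words) + 1):
            extend(end, current + [" ".join(words[start:end])])

    extend(0, [])
    return results
```

A phrase of n words yields 2^(n-1) groupings under this scheme, which is why a tokenizer may prefer to derive its word sets differently.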

commit | commitdiff | tree

Sarah Hoffmann [Thu, 29 Jul 2021 19:25:59 +0000 (21:25 +0200)]

remove country restriction from tokenizer

Restricting tokens due to the search context is better done in
the generic search part instead of repeating the same test in
every tokenizer implementation.

commit | commitdiff | tree

Sarah Hoffmann [Tue, 10 Aug 2021 15:31:04 +0000 (17:31 +0200)]

document tokenizer SQL interface

commit | commitdiff | tree

Sarah Hoffmann [Tue, 10 Aug 2021 12:51:35 +0000 (14:51 +0200)]

define formal public Python interface for tokenizer

This introduces an abstract class for the Tokenizer/Analyzer
for documentation purposes.
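
A minimal sketch of how an abstract class can document such an interface. All class and method names here are hypothetical and chosen for illustration; they are not claimed to match the interface introduced by this commit.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchAnalyzer(ABC):
    """Hypothetical analyzer interface: turns names into tokens."""

    @abstractmethod
    def tokens_for_name(self, name: str) -> List[str]:
        """Return the tokens under which `name` should be findable."""


class SketchTokenizer(ABC):
    """Hypothetical tokenizer interface: hands out analyzers."""

    @abstractmethod
    def name_analyzer(self) -> SketchAnalyzer:
        """Return an analyzer bound to this tokenizer's configuration."""


class WhitespaceAnalyzer(SketchAnalyzer):
    """Toy implementation: lower-case the name and split on spaces."""

    def tokens_for_name(self, name: str) -> List[str]:
        return name.lower().split()


class WhitespaceTokenizer(SketchTokenizer):
    """Toy tokenizer that always returns the whitespace analyzer."""

    def name_analyzer(self) -> SketchAnalyzer:
        return WhitespaceAnalyzer()
```

The abstract base classes cannot be instantiated, so they serve as a documented contract that each concrete tokenizer has to fulfil.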

commit | commitdiff | tree

Sarah Hoffmann [Sat, 31 Jul 2021 07:49:29 +0000 (09:49 +0200)]

docs: querying and tokenizers

commit | commitdiff | tree

Sarah Hoffmann [Thu, 29 Jul 2021 18:54:33 +0000 (20:54 +0200)]

docs: add developer doc page for Tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Mon, 16 Aug 2021 06:48:28 +0000 (08:48 +0200)]

Merge pull request #2424 from lonvia/multi-country-import

Update instructions for importing multiple regions

commit | commitdiff | tree

Sarah Hoffmann [Sun, 15 Aug 2021 20:00:50 +0000 (22:00 +0200)]

Merge pull request #2423 from hummeltech/patch-1

Fix old paths for `phpcs` when using `make test`

commit | commitdiff | tree

Sarah Hoffmann [Sun, 15 Aug 2021 15:49:22 +0000 (17:49 +0200)]

ignore words without id for status

commit | commitdiff | tree

Sarah Hoffmann [Sun, 15 Aug 2021 10:24:13 +0000 (12:24 +0200)]

split up large setup function

commit | commitdiff | tree

Sarah Hoffmann [Sat, 14 Aug 2021 21:48:06 +0000 (23:48 +0200)]

port multi-region update scripts to nominatim tool

Also updates the documentation. For the simple case of just
importing multiple regions, provide simplified instructions
that use the new multi-file import feature.

Fixes #2365.

commit | commitdiff | tree

Sarah Hoffmann [Sat, 14 Aug 2021 20:46:35 +0000 (22:46 +0200)]

update osm2pgsql to 1.5.1

commit | commitdiff | tree

Sarah Hoffmann [Sat, 14 Aug 2021 19:42:21 +0000 (21:42 +0200)]

allow multiple files for the import command

The files are forwarded to osm2pgsql which is now able to merge
them correctly.

commit | commitdiff | tree

David Hummel [Thu, 12 Aug 2021 20:34:18 +0000 (13:34 -0700)]

Fix old paths for `phpcs` when using `make test`

These paths no longer exist since db3ced17bbfff00411f506d8c84419c875959d5e, they are now all located under `lib-php`

commit | commitdiff | tree

Sarah Hoffmann [Sun, 8 Aug 2021 09:09:36 +0000 (11:09 +0200)]

Merge pull request #2413 from osm-search/helm-chart

Installation docs - link to Kubernetes install project

commit | commitdiff | tree

mtmail [Tue, 3 Aug 2021 10:02:35 +0000 (12:02 +0200)]

Installation docs - link to Kubernetes install project

As reported by @robjuz in https://github.com/osm-search/Nominatim/discussions/2412

commit | commitdiff | tree

Sarah Hoffmann [Wed, 28 Jul 2021 12:28:49 +0000 (14:28 +0200)]

Merge pull request #2408 from lonvia/icu-change-word-table-layout

Change table layout of word table for ICU tokenizer

commit | commitdiff | tree

Sarah Hoffmann [Sun, 25 Jul 2021 14:29:04 +0000 (16:29 +0200)]

php: force use of global Exception class

commit | commitdiff | tree

Sarah Hoffmann [Sun, 25 Jul 2021 13:30:47 +0000 (15:30 +0200)]

fix Python linting errors

commit | commitdiff | tree

Sarah Hoffmann [Sun, 25 Jul 2021 13:13:49 +0000 (15:13 +0200)]

fix linting issues in PHP

commit | commitdiff | tree

Sarah Hoffmann [Sun, 25 Jul 2021 13:08:11 +0000 (15:08 +0200)]

reinstate word column in icu word table

PostgreSQL is very bad at creating statistics for jsonb
columns. The result is that the query planner tends to
use JIT for queries with a where over 'info' even when
there is an index.

commit | commitdiff | tree

Sarah Hoffmann [Sat, 24 Jul 2021 10:12:31 +0000 (12:12 +0200)]

bdd tests: do not query word table directly

The BDD tests cannot make assumptions about the structure of the
word table anymore because it depends on the tokenizer. Use more
abstract descriptions instead that ask for specific kinds of
tokens.

commit | commitdiff | tree

Sarah Hoffmann [Thu, 22 Jul 2021 15:24:43 +0000 (17:24 +0200)]

adapt unit test for new word table

Requires a second wrapper class for the word table with the new
layout. This class is interface-compatible, so that later when
the ICU tokenizer becomes the default, all tests that depend on
behaviour of the default tokenizer can be switched to the other
wrapper.

commit | commitdiff | tree

Sarah Hoffmann [Wed, 21 Jul 2021 09:37:14 +0000 (11:37 +0200)]

convert word info column to json before copying

commit | commitdiff | tree

Sarah Hoffmann [Wed, 21 Jul 2021 08:52:34 +0000 (10:52 +0200)]

adapt special terms lookup to new word table

commit | commitdiff | tree

Sarah Hoffmann [Wed, 21 Jul 2021 08:41:38 +0000 (10:41 +0200)]

switch word tokens to new word table layout

commit | commitdiff | tree

Sarah Hoffmann [Tue, 20 Jul 2021 19:11:01 +0000 (21:11 +0200)]

switch special phrases to new word table format

commit | commitdiff | tree

Sarah Hoffmann [Tue, 20 Jul 2021 10:11:12 +0000 (12:11 +0200)]

switch postcode tokens to new word table layout

commit | commitdiff | tree

Sarah Hoffmann [Tue, 20 Jul 2021 09:36:20 +0000 (11:36 +0200)]

switch housenumber tokens to new word table layout

commit | commitdiff | tree

Sarah Hoffmann [Tue, 20 Jul 2021 09:21:13 +0000 (11:21 +0200)]

switch country name tokens to new word table layout

commit | commitdiff | tree

Sarah Hoffmann [Tue, 20 Jul 2021 08:27:06 +0000 (10:27 +0200)]

new word table layout for icu tokenizer

The table now directly reflects the different token types.
Extra information is saved in a json structure that may be
dynamically extended in the future without affecting the
table layout.
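
The layout idea can be sketched as a row with a few fixed columns plus one extensible JSON field. The column names other than `info`, and the example type code, are assumptions made for this illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class WordRow:
    """Illustrative row of a word table with a JSON `info` column."""

    word_id: int
    word_token: str
    type: str  # assumed one-letter code for the token type
    info: Dict[str, Any] = field(default_factory=dict)

    def to_columns(self) -> Tuple[int, str, str, str]:
        # Extra attributes travel inside the JSON string, so adding a
        # new one does not change the number of columns.
        return (self.word_id, self.word_token, self.type,
                json.dumps(self.info, sort_keys=True))
```

Adding a further key to `info` later changes only the JSON payload, not the tuple of columns.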

commit | commitdiff | tree

Sarah Hoffmann [Wed, 28 Jul 2021 09:28:49 +0000 (11:28 +0200)]

fix typos in tokenizer docs

commit | commitdiff | tree

Sarah Hoffmann [Mon, 26 Jul 2021 10:38:56 +0000 (12:38 +0200)]

Merge pull request #2401 from lonvia/port-add-data-to-python

Port add-data functions from PHP to Python

commit | commitdiff | tree

Sarah Hoffmann [Sun, 25 Jul 2021 21:44:22 +0000 (23:44 +0200)]

adapt cli tests to Python port for add-data

commit | commitdiff | tree

Sarah Hoffmann [Sun, 25 Jul 2021 21:30:46 +0000 (23:30 +0200)]

remove unused update script

commit | commitdiff | tree

Sarah Hoffmann [Sun, 25 Jul 2021 21:29:15 +0000 (23:29 +0200)]

replace add-data function with native Python code

commit | commitdiff | tree

Sarah Hoffmann [Sun, 25 Jul 2021 16:14:12 +0000 (18:14 +0200)]

move add-data subcommand into a separate file

commit | commitdiff | tree

Sarah Hoffmann [Tue, 20 Jul 2021 08:08:31 +0000 (10:08 +0200)]

fix parameters for TokenWord creation

commit | commitdiff | tree

Sarah Hoffmann [Mon, 19 Jul 2021 12:28:02 +0000 (14:28 +0200)]

Merge pull request #2397 from lonvia/increase-minimum-required-versions

Increase minimum required PostgreSQL version to 9.5

commit | commitdiff | tree

Sarah Hoffmann [Mon, 19 Jul 2021 08:24:57 +0000 (10:24 +0200)]

remove special code for pre-9.5 PostgreSQL

9.5 is now the minimum requirement.

commit | commitdiff | tree

Sarah Hoffmann [Mon, 19 Jul 2021 08:15:32 +0000 (10:15 +0200)]

increase minimum version for PostgreSQL to 9.5

This is the minimum version we can test with the CI.
With 9.5 there is also complete support for jsonb available.

commit | commitdiff | tree

Sarah Hoffmann [Mon, 19 Jul 2021 08:14:14 +0000 (10:14 +0200)]

require Python 3.6 also in CMakeFile

This had been forgotten when increasing the minimum Python version.

commit | commitdiff | tree

Sarah Hoffmann [Mon, 19 Jul 2021 07:42:37 +0000 (09:42 +0200)]

Merge pull request #2396 from lonvia/partial-word-token

Reorganise code that builds the SearchDescription

commit | commitdiff | tree

Sarah Hoffmann [Sun, 18 Jul 2021 18:20:22 +0000 (20:20 +0200)]

make all Token members private

commit | commitdiff | tree

Sarah Hoffmann [Sun, 18 Jul 2021 14:52:37 +0000 (16:52 +0200)]

merge marking rare name with adding name token

Only name tokens can be rare, so this should be the same
function.

commit | commitdiff | tree

Sarah Hoffmann [Sun, 18 Jul 2021 14:10:42 +0000 (16:10 +0200)]

add documentation for public interface of SearchDescription

commit | commitdiff | tree

Sarah Hoffmann [Sat, 17 Jul 2021 20:01:35 +0000 (22:01 +0200)]

factor out check if a token fits current search

Saves allocating an empty array.

commit | commitdiff | tree

Sarah Hoffmann [Sat, 17 Jul 2021 18:24:33 +0000 (20:24 +0200)]

move SearchDescription building into tokens

Moving the logic for extending the SearchDescription into the
token classes splits up the code and makes it more readable.
More importantly: it allows tokenizers to define custom token
classes in the future.

commit | commitdiff | tree

Sarah Hoffmann [Thu, 15 Jul 2021 12:48:20 +0000 (14:48 +0200)]

remove Token from explicit input for SearchDescription extension

The token string is only required by the PartialToken type, so
it can simply save the token string internally. No need to pass
it to every type.

Also moves the check for multi-word partials to the token loader
code in the tokenizer. Multi-word partials can only happen with
the legacy tokenizer and when the database was loaded with an
older version of Nominatim. No need to keep the check for
everybody.

commit | commitdiff | tree

Sarah Hoffmann [Thu, 15 Jul 2021 12:12:59 +0000 (14:12 +0200)]

factor out query position

Moves token and phrase position and phrase type into a separate
class that is handed in when assembling the search description.
This drastically reduces the number of parameters for the function
to extend the search descriptions and gives us more flexibility
in the future for more complex positional analysis.
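
The refactoring can be pictured as bundling the positional arguments into one small value object. The sketch below is in Python rather than PHP, and every field and method name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryPosition:
    """Illustrative container for where a token sits in the query."""

    phrase_index: int   # index of the current phrase
    token_index: int    # index of the current token within the phrase
    num_phrases: int    # total number of phrases in the query
    num_tokens: int     # total number of tokens in the current phrase
    phrase_type: str = ""

    def is_first_token(self) -> bool:
        return self.token_index == 0

    def is_last_token(self) -> bool:
        return self.token_index == self.num_tokens - 1

    def is_first_phrase(self) -> bool:
        return self.phrase_index == 0

    def is_last_phrase(self) -> bool:
        return self.phrase_index == self.num_phrases - 1
```

A function that extends a search description then takes a single `QueryPosition` argument instead of several loose integers and a type string.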

commit | commitdiff | tree

Sarah Hoffmann [Wed, 14 Jul 2021 20:17:17 +0000 (22:17 +0200)]

remove special status of partial tokens

Full-word tokens are no longer marked by a space at the
beginning of the token. Use the new Partial token category
instead. This removes a couple of special cases we don't
really need.

The word table still has the space for compatibility reasons,
so the tokenizer code needs to get rid of it when loading the
tokens.
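
Stripping the compatibility space while loading could look roughly like this. The function name and the "full"/"partial" labels are assumptions, and the real loader is PHP code in the tokenizer.

```python
from typing import Tuple


def classify_word_token(raw: str) -> Tuple[str, str]:
    """Split a stored word token into (category, clean token).

    Assumes the convention described in the commit message: a leading
    space in the word table marks a full-word token, anything else is
    a partial token.
    """
    if raw.startswith(" "):
        return ("full", raw[1:])
    return ("partial", raw)
```

The rest of the search code then only sees the category and never the leading space.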

commit | commitdiff | tree

Sarah Hoffmann [Tue, 13 Jul 2021 14:54:51 +0000 (16:54 +0200)]

introduce a separate token type for partials

This means that the leading space can be removed as a partial
word indicator.

commit | commitdiff | tree

Sarah Hoffmann [Tue, 13 Jul 2021 14:46:12 +0000 (16:46 +0200)]

Merge pull request #2393 from lonvia/fix-flake8-issues

Fix flake8 issues

commit | commitdiff | tree

Sarah Hoffmann [Mon, 12 Jul 2021 20:05:22 +0000 (22:05 +0200)]

use psycopg's SQL quoting where possible

Use the SQL formatting supplied with psycopg whenever the
query needs to be put together from snippets.

commit | commitdiff | tree

Sarah Hoffmann [Mon, 12 Jul 2021 19:08:20 +0000 (21:08 +0200)]

add helper function for execute_values

Make psycopg2's convenience function accessible through
the cursor.