git.openstreetmap.org Git - nominatim.git/log
nominatim.git
3 years agoMerge pull request #2322 from mtmail/type-label-already-lowercased
Sarah Hoffmann [Wed, 12 May 2021 18:25:22 +0000 (20:25 +0200)]
Merge pull request #2322 from mtmail/type-label-already-lowercased

typelabel value is already lowercased

3 years agotypelabel value is already lowercased
marc tobias [Wed, 12 May 2021 17:16:51 +0000 (19:16 +0200)]
typelabel value is already lowercased

3 years agoMerge pull request #2314 from lonvia/fix-status-no-import-date
Sarah Hoffmann [Thu, 6 May 2021 15:41:53 +0000 (17:41 +0200)]
Merge pull request #2314 from lonvia/fix-status-no-import-date

Correctly catch the exception when import date is missing

3 years agoMerge pull request #2312 from lonvia/icu-tokenizer
Sarah Hoffmann [Thu, 6 May 2021 15:22:04 +0000 (17:22 +0200)]
Merge pull request #2312 from lonvia/icu-tokenizer

Add new tokenizer based on libICU

3 years agocorrectly catch the exception when import date is missing
Sarah Hoffmann [Thu, 6 May 2021 13:36:54 +0000 (15:36 +0200)]
correctly catch the exception when import date is missing

3 years agoadd missing transliterations
Sarah Hoffmann [Wed, 5 May 2021 19:16:55 +0000 (21:16 +0200)]
add missing transliterations

The ICU library only offers transliterations for a limited set of
scripts. Add transliterations for missing scripts from the PostgreSQL
module. This means that the same selection of scripts is supported
as with the old module.

3 years agofix name of transliterator
Sarah Hoffmann [Wed, 5 May 2021 15:09:38 +0000 (17:09 +0200)]
fix name of transliterator

Should be different from the normalisation rules.

3 years agoenable BDD tests for different tokenizers
Sarah Hoffmann [Wed, 5 May 2021 08:00:34 +0000 (10:00 +0200)]
enable BDD tests for different tokenizers

The tokenizer to be used can be chosen with -DTOKENIZER.

Adapt all tests, so that they work with legacy_icu tokenizer.
Move lookup in word table to a function in the tokenizer.
Special phrases are temporarily imported from the wiki until
we have an implementation that can import from file. TIGER
tests do not work yet.

3 years agoadd unit tests for legacy ICU tokenizer
Sarah Hoffmann [Tue, 4 May 2021 16:32:57 +0000 (18:32 +0200)]
add unit tests for legacy ICU tokenizer

3 years agocache transliteration results
Sarah Hoffmann [Sun, 2 May 2021 20:13:18 +0000 (22:13 +0200)]
cache transliteration results

3 years agoadd PHP part for new ICU-based tokenizer
Sarah Hoffmann [Sun, 2 May 2021 19:21:41 +0000 (21:21 +0200)]
add PHP part for new ICU-based tokenizer

3 years agoadd Python part for new ICU-based tokenizer
Sarah Hoffmann [Sun, 2 May 2021 15:52:45 +0000 (17:52 +0200)]
add Python part for new ICU-based tokenizer

3 years agoMerge pull request #2310 from RhinoDevel/master
Sarah Hoffmann [Tue, 4 May 2021 10:45:26 +0000 (12:45 +0200)]
Merge pull request #2310 from RhinoDevel/master

2nd try: Add hint about replication update & recheck intervals being in seconds.

3 years agoAdd hint about replication update & recheck intervals being in seconds.
Marc [Tue, 4 May 2021 09:47:15 +0000 (11:47 +0200)]
Add hint about replication update & recheck intervals being in seconds.

3 years agoMerge pull request #2305 from lonvia/tokenizer
Sarah Hoffmann [Mon, 3 May 2021 07:15:34 +0000 (09:15 +0200)]
Merge pull request #2305 from lonvia/tokenizer

Factor out normalization into a separate module

3 years agomock tokenizer factory for replication tests
Sarah Hoffmann [Sat, 1 May 2021 08:50:39 +0000 (10:50 +0200)]
mock tokenizer factory for replication tests

3 years agocommit between migrations
Sarah Hoffmann [Sat, 1 May 2021 08:28:49 +0000 (10:28 +0200)]
commit between migrations

Later migrations may require tables set up by older ones.
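
The pattern can be sketched as follows. This is a minimal illustration using SQLite from the Python standard library, not the project's migration code; `run_migrations`, the `properties` table and the two migration functions are hypothetical names chosen for the example.

```python
import sqlite3


def run_migrations(conn, migrations):
    """Apply each migration in order, committing after every step."""
    for migrate in migrations:
        migrate(conn)
        # Commit now so that the next migration (or any other connection)
        # sees the tables and rows created by this one.
        conn.commit()


def create_properties_table(conn):
    # Hypothetical first migration: sets up a table.
    conn.execute('CREATE TABLE properties (name TEXT, value TEXT)')


def record_tokenizer(conn):
    # Hypothetical second migration: relies on the table created above.
    conn.execute("INSERT INTO properties VALUES ('tokenizer', 'legacy')")
```

Calling `run_migrations(conn, [create_properties_table, record_tokenizer])` leaves no open transaction behind after either step.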

3 years agoincrease database version for tokenizer migration
Sarah Hoffmann [Sat, 1 May 2021 08:03:00 +0000 (10:03 +0200)]
increase database version for tokenizer migration

3 years agofix linting issues
Sarah Hoffmann [Fri, 30 Apr 2021 15:59:50 +0000 (17:59 +0200)]
fix linting issues

3 years agomove index creation for word table to tokenizer
Sarah Hoffmann [Fri, 30 Apr 2021 15:28:34 +0000 (17:28 +0200)]
move index creation for word table to tokenizer

This introduces a finalization routine for the tokenizer
where it can post-process the import if necessary.

3 years agoindexer: fetch extra place data asynchronously
Sarah Hoffmann [Fri, 30 Apr 2021 14:17:28 +0000 (16:17 +0200)]
indexer: fetch extra place data asynchronously

The indexer now fetches any extra data besides the place_id
asynchronously while processing the places from the last batch.
This also means that more places are now fetched at once.
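
The idea of overlapping the fetch of one batch with the processing of the previous one can be sketched as below. Note the substitution: this example uses a background thread from `concurrent.futures` instead of asynchronous database connections, and `process_in_batches`, `fetch_details` and `process` are hypothetical names, not functions from the indexer.

```python
from concurrent.futures import ThreadPoolExecutor


def process_in_batches(batches, fetch_details, process):
    """Process places batch by batch while prefetching the next batch.

    `fetch_details(ids)` returns the extra data for a list of ids and
    `process(place)` handles a single place; both are supplied by the caller.
    """
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for ids in batches:
            # Start fetching this batch in the background ...
            upcoming = pool.submit(fetch_details, ids)
            # ... and meanwhile work through the batch fetched before it.
            if pending is not None:
                results.extend(process(place) for place in pending.result())
            pending = upcoming
        # Handle the final batch once the loop has run out of ids.
        if pending is not None:
            results.extend(process(place) for place in pending.result())
    return results
```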

3 years agofetch place info asynchronously
Sarah Hoffmann [Thu, 29 Apr 2021 20:16:31 +0000 (22:16 +0200)]
fetch place info asynchronously

3 years agoindexer: fetch ids in batches
Sarah Hoffmann [Thu, 29 Apr 2021 19:57:43 +0000 (21:57 +0200)]
indexer: fetch ids in batches

3 years agomove database check for module to tokenizer
Sarah Hoffmann [Wed, 28 Apr 2021 19:15:18 +0000 (21:15 +0200)]
move database check for module to tokenizer

3 years agomove status test to tokenizer
Sarah Hoffmann [Wed, 28 Apr 2021 18:13:51 +0000 (20:13 +0200)]
move status test to tokenizer

The availability of the module is now tested by the tokenizer.

3 years agoadd more tests for legacy tokenizer
Sarah Hoffmann [Wed, 28 Apr 2021 15:39:03 +0000 (17:39 +0200)]
add more tests for legacy tokenizer

3 years agomove tokenization in query into tokenizer
Sarah Hoffmann [Wed, 28 Apr 2021 12:08:24 +0000 (14:08 +0200)]
move tokenization in query into tokenizer

3 years agoboilerplate for PHP code of tokenizer
Sarah Hoffmann [Wed, 28 Apr 2021 08:59:07 +0000 (10:59 +0200)]
boilerplate for PHP code of tokenizer

This adds an installation step for PHP code for the tokenizer. The
PHP code is split in two parts. The updateable code is found in
lib-php. The tokenizer installs an additional script in the
project directory which then includes the code from lib-php and
defines all settings that are static to the database. The website
code then always includes the PHP from the project directory.

3 years agotests for legacy tokenizer
Sarah Hoffmann [Wed, 28 Apr 2021 07:14:32 +0000 (09:14 +0200)]
tests for legacy tokenizer

3 years agomove amenity creation to tokenizer
Sarah Hoffmann [Tue, 27 Apr 2021 19:50:35 +0000 (21:50 +0200)]
move amenity creation to tokenizer

The BDD tests still use the old-style amenity creation scripts
because we don't have simple means to import a hand-crafted
test file of special phrases right now.

3 years agomove default country name creation to tokenizer
Sarah Hoffmann [Tue, 27 Apr 2021 09:37:18 +0000 (11:37 +0200)]
move default country name creation to tokenizer

The new function is also used when a country is updated. All SQL
functions related to country names have been removed.

3 years agocache all postcodes
Sarah Hoffmann [Mon, 26 Apr 2021 15:30:10 +0000 (17:30 +0200)]
cache all postcodes

3 years agoreorganise address iteration in tokenizer
Sarah Hoffmann [Mon, 26 Apr 2021 14:50:28 +0000 (16:50 +0200)]
reorganise address iteration in tokenizer

3 years agoremove debug code
Sarah Hoffmann [Sun, 25 Apr 2021 21:43:57 +0000 (23:43 +0200)]
remove debug code

3 years agouse address tokens in SQL
Sarah Hoffmann [Sun, 25 Apr 2021 21:42:56 +0000 (23:42 +0200)]
use address tokens in SQL

3 years agoextract address tokens in tokenizer
Sarah Hoffmann [Sun, 25 Apr 2021 20:04:07 +0000 (22:04 +0200)]
extract address tokens in tokenizer

3 years agomove postcode normalization into tokenizer
Sarah Hoffmann [Sun, 25 Apr 2021 16:26:36 +0000 (18:26 +0200)]
move postcode normalization into tokenizer

3 years agomove housenumber handling to tokenizer
Sarah Hoffmann [Sun, 25 Apr 2021 09:47:29 +0000 (11:47 +0200)]
move housenumber handling to tokenizer

Normalization and token computation are now done in the tokenizer.
The tokenizer keeps a cache of the hundred most used house numbers
to keep the number of calls to the database low.
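
A rough sketch of such a cache, under assumptions: the class name `HousenumberTokenCache`, the `fetch_token` callable standing in for the database call, and the choice of which numbers count as "most used" are all made up for illustration and are not taken from the tokenizer source.

```python
class HousenumberTokenCache:
    """Keep tokens for a fixed set of frequent house numbers in memory."""

    def __init__(self, fetch_token, frequent_numbers):
        # fetch_token is the expensive lookup (a database call in practice).
        self._fetch_token = fetch_token
        # Resolve the frequent house numbers once, up front.
        self._cache = {hnr: fetch_token(hnr) for hnr in frequent_numbers}

    def get(self, housenumber):
        """Return the token, consulting the database only on a cache miss."""
        if housenumber in self._cache:
            return self._cache[housenumber]
        return self._fetch_token(housenumber)
```

Seeding the cache with, say, the strings '1' to '100' costs one hundred lookups at start-up; every later request for one of those numbers is served from memory.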

3 years agomove name token creation into tokenizer
Sarah Hoffmann [Sun, 25 Apr 2021 08:38:29 +0000 (10:38 +0200)]
move name token creation into tokenizer

Name tokens are now handed in via token_info and used from there.

Also moves the generic search name insertion function back to
placex_triggers.sql.

3 years agointroduce name analyzer
Sarah Hoffmann [Sat, 24 Apr 2021 20:35:46 +0000 (22:35 +0200)]
introduce name analyzer

The name analyzer is the actual workhorse of the tokenizer. It
is instantiated on a per-thread basis and provides all functions for
analysing names and queries.

3 years agorequire tokenizer for indexer
Sarah Hoffmann [Sat, 24 Apr 2021 09:25:47 +0000 (11:25 +0200)]
require tokenizer for indexer

3 years agointroduce index for finding surrounding buildings
Sarah Hoffmann [Fri, 23 Apr 2021 15:02:47 +0000 (17:02 +0200)]
introduce index for finding surrounding buildings

3 years agoadd extra column for tokenizer
Sarah Hoffmann [Fri, 23 Apr 2021 14:15:00 +0000 (16:15 +0200)]
add extra column for tokenizer

Add a jsonb column to the placex and location_property_osmline tables
which can be used by the installed tokenizer as required. No other
part of the software will use or otherwise rely on this column.

3 years agointroduce external processing in indexer
Sarah Hoffmann [Fri, 23 Apr 2021 13:49:38 +0000 (15:49 +0200)]
introduce external processing in indexer

Indexing is now split into three parts: first a preparation step
that collects the necessary information from the database and
returns it to Python. In a second step the data is transformed
within Python as necessary and then returned to the database
through the usual UPDATE which now not only sets the indexed_status
but also other fields. The third step comprises the address
computation which is still done inside the update trigger in
the database.

The second processing step doesn't do anything useful yet.

3 years agomove word table and normalisation SQL into tokenizer
Sarah Hoffmann [Thu, 22 Apr 2021 20:47:34 +0000 (22:47 +0200)]
move word table and normalisation SQL into tokenizer

Creating and populating the word table is now the responsibility
of the tokenizer.

The get_maxwordfreq() function has been replaced with a
simple template parameter to the SQL during function installation.
The number is taken from the parameter list in the database to
ensure that it is not changed after installation.

3 years agoadd migration for configurable tokenizer
Sarah Hoffmann [Wed, 21 Apr 2021 13:38:52 +0000 (15:38 +0200)]
add migration for configurable tokenizer

Adds a migration that initialises a legacy tokenizer for
an existing database. The migration is not active yet as
it will need completion when more functionality is added
to the legacy tokenizer.

3 years agomove module installation to legacy tokenizer
Sarah Hoffmann [Wed, 21 Apr 2021 13:00:37 +0000 (15:00 +0200)]
move module installation to legacy tokenizer

3 years agointroduce tokenizer modules
Sarah Hoffmann [Wed, 21 Apr 2021 07:57:17 +0000 (09:57 +0200)]
introduce tokenizer modules

This adds the boilerplate for selecting configurable tokenizers.
A tokenizer can be chosen at import time and will then install
itself such that it is fixed for the given database import even
when the software itself is updated.

The legacy tokenizer implements Nominatim's traditional algorithms.
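
One way to picture "chosen at import time, then fixed for the database" is the sketch below. Everything in it is hypothetical: the `TOKENIZERS` registry, the two functions and the plain dict standing in for properties stored in the database are illustration only, not the project's actual factory.

```python
class LegacyTokenizer:
    """Placeholder for a tokenizer implementation."""
    name = 'legacy'


# Hypothetical registry of available tokenizers.
TOKENIZERS = {'legacy': LegacyTokenizer}


def create_tokenizer(configured_name, properties):
    """Called once at import: pick the tokenizer and remember the choice."""
    if configured_name not in TOKENIZERS:
        raise ValueError('Unknown tokenizer: {}'.format(configured_name))
    # Persist the choice with the database (a dict stands in for it here).
    properties['tokenizer'] = configured_name
    return TOKENIZERS[configured_name]()


def get_tokenizer_for_db(properties):
    """Called on later runs: use the stored choice, not the current config."""
    name = properties.get('tokenizer')
    if name is None:
        raise RuntimeError('No tokenizer has been set up for this database.')
    return TOKENIZERS[name]()
```

Because later runs read the stored name, changing the configuration or updating the software does not silently switch the tokenizer of an existing import.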

3 years agoMerge pull request #2303 from lonvia/remove-aux-support
Sarah Hoffmann [Fri, 30 Apr 2021 09:19:35 +0000 (11:19 +0200)]
Merge pull request #2303 from lonvia/remove-aux-support

Remove support for AUX housenumber tables

3 years agoremove support for AUX housenumber tables
Sarah Hoffmann [Fri, 30 Apr 2021 08:08:29 +0000 (10:08 +0200)]
remove support for AUX housenumber tables

These tables have never been actively maintained and the code is
completely untested. With the upcoming changes, it is unlikely
that the code remains usable.

This removes the aux tables and all code that references them.

3 years agoMerge pull request #2299 from lonvia/update-actions
Sarah Hoffmann [Tue, 27 Apr 2021 10:18:45 +0000 (12:18 +0200)]
Merge pull request #2299 from lonvia/update-actions

Fix database check for reverse-only

3 years agoMerge pull request #2291 from AntoJvlt/special-phrases-statistics
Sarah Hoffmann [Tue, 27 Apr 2021 09:57:05 +0000 (11:57 +0200)]
Merge pull request #2291 from AntoJvlt/special-phrases-statistics

Special phrases statistics

3 years agodo not check for extra housenumber index for reverse-only
Sarah Hoffmann [Tue, 27 Apr 2021 08:14:26 +0000 (10:14 +0200)]
do not check for extra housenumber index for reverse-only

Also adds a database check for reverse-only import to the CI.

3 years agoadd tests for different scripts
Sarah Hoffmann [Mon, 26 Apr 2021 21:01:06 +0000 (23:01 +0200)]
add tests for different scripts

3 years agoMerge pull request #2298 from lonvia/add-warming-to-ci
Sarah Hoffmann [Mon, 26 Apr 2021 09:21:44 +0000 (11:21 +0200)]
Merge pull request #2298 from lonvia/add-warming-to-ci

Add warming to CI import tests and fix more Python 3.5 compatibility issues

3 years agoavoid Path in subprocess parameters
Sarah Hoffmann [Mon, 26 Apr 2021 08:16:05 +0000 (10:16 +0200)]
avoid Path in subprocess parameters

Not supported by Python 3.5.
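
A minimal sketch of the workaround, assuming hypothetical helpers `as_cmdline` and `run` (not names from the code base): every argument is converted to `str` before it reaches `subprocess`, so `pathlib.Path` objects never appear in the parameter list.

```python
import subprocess
import sys
from pathlib import Path


def as_cmdline(args):
    """Convert all arguments (Path objects, numbers, ...) to plain strings."""
    return [str(arg) for arg in args]


def run(args):
    """Run a command whose arguments may include Path objects."""
    return subprocess.check_output(as_cmdline(args))
```

For example, `run([Path(sys.executable), '-c', 'print(1)'])` passes only strings to `subprocess.check_output()`.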

3 years agoadd warming to CI import test
Sarah Hoffmann [Mon, 26 Apr 2021 07:54:09 +0000 (09:54 +0200)]
add warming to CI import test

3 years agoSwitching to log info and only send warning for invalid phrases
AntoJvlt [Sun, 25 Apr 2021 15:56:12 +0000 (17:56 +0200)]
Switching to log info and only send warning for invalid phrases

3 years agoImplemented statistics for the import of special phrases through the SpecialPhrasesIm...
AntoJvlt [Thu, 22 Apr 2021 15:34:35 +0000 (17:34 +0200)]
Implemented statistics for the import of special phrases through the SpecialPhrasesImporterStatistics class

3 years agoreorganization of folder/file for the special phrases importer
AntoJvlt [Wed, 21 Apr 2021 15:11:57 +0000 (17:11 +0200)]
reorganization of folder/file for the special phrases importer

3 years agoMerge pull request #2297 from lonvia/update-deployment-docs
Sarah Hoffmann [Sat, 24 Apr 2021 13:35:00 +0000 (15:35 +0200)]
Merge pull request #2297 from lonvia/update-deployment-docs

docs: update deployment to use project directory

3 years agoMerge pull request #2296 from lonvia/disable-too-few-public-methods-check
Sarah Hoffmann [Sat, 24 Apr 2021 13:03:28 +0000 (15:03 +0200)]
Merge pull request #2296 from lonvia/disable-too-few-public-methods-check

pylint: disable too-few-public-methods check

3 years agodocs: update deployment to use project directory
Sarah Hoffmann [Sat, 24 Apr 2021 13:00:18 +0000 (15:00 +0200)]
docs: update deployment to use project directory

Fixes #2295.

3 years agofix pylint complaints
Sarah Hoffmann [Sat, 24 Apr 2021 09:44:36 +0000 (11:44 +0200)]
fix pylint complaints

3 years agopylint: disable check too-few-public-methods
Sarah Hoffmann [Sat, 24 Apr 2021 09:39:44 +0000 (11:39 +0200)]
pylint: disable check too-few-public-methods

3 years agoMerge pull request #2293 from darkshredder/update-manpage
Sarah Hoffmann [Sat, 24 Apr 2021 07:20:28 +0000 (09:20 +0200)]
Merge pull request #2293 from darkshredder/update-manpage

Updated manual page

3 years agoMerge pull request #2294 from lonvia/update-actions
Sarah Hoffmann [Fri, 23 Apr 2021 21:33:15 +0000 (23:33 +0200)]
Merge pull request #2294 from lonvia/update-actions

CI: add import test against Python 3.5 and fix discovered issues

3 years agoactions: add import on ubuntu 18.04
Sarah Hoffmann [Fri, 23 Apr 2021 13:45:54 +0000 (15:45 +0200)]
actions: add import on ubuntu 18.04

This uses the oldest possible dependencies wherever it can.

3 years agoindexes with includes are not available for postgresql < 11
Sarah Hoffmann [Fri, 23 Apr 2021 20:27:12 +0000 (22:27 +0200)]
indexes with includes are not available for postgresql < 11

3 years agouse group() for regex matches
Sarah Hoffmann [Fri, 23 Apr 2021 20:18:55 +0000 (22:18 +0200)]
use group() for regex matches

Needed for compatibility with Python 3.5.
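
For illustration: subscript access on match objects (`match[1]`) only exists from Python 3.6 on, while `match.group(1)` works on older versions too. The helper `extract_version` below is a made-up example, not code from the repository.

```python
import re


def extract_version(text):
    """Return (major, minor) from the first 'X.Y' in text, or None."""
    match = re.search(r'(\d+)\.(\d+)', text)
    if match is None:
        return None
    # match[1] would need Python >= 3.6; group() is available everywhere.
    return int(match.group(1)), int(match.group(2))
```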

3 years agouse pathlib version of open
Sarah Hoffmann [Fri, 23 Apr 2021 19:57:05 +0000 (21:57 +0200)]
use pathlib version of open

3 years agosubprocess needs string argument
Sarah Hoffmann [Fri, 23 Apr 2021 19:49:41 +0000 (21:49 +0200)]
subprocess needs string argument

Compatibility change for Python 3.5.

3 years agocheck for existence of custom .env before opening
Sarah Hoffmann [Fri, 23 Apr 2021 19:42:24 +0000 (21:42 +0200)]
check for existence of custom .env before opening

3 years agouse more generic ImportError to check for module
Sarah Hoffmann [Fri, 23 Apr 2021 19:10:19 +0000 (21:10 +0200)]
use more generic ImportError to check for module

ModuleNotFoundError was only introduced in Python 3.6.
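
A short sketch of the compatible check, using a hypothetical helper `module_available`: `ModuleNotFoundError` is a subclass of `ImportError`, so catching the base class behaves the same on new interpreters and still works where the subclass does not exist.

```python
import importlib


def module_available(name):
    """Return True if the module `name` can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        # Also catches ModuleNotFoundError on Python >= 3.6,
        # because it derives from ImportError.
        return False
    return True
```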

3 years agoreplace usages of fromisoformat() with strptime()
Sarah Hoffmann [Fri, 23 Apr 2021 18:53:00 +0000 (20:53 +0200)]
replace usages of fromisoformat() with strptime()

fromisoformat was only introduced with Python 3.7 while we
still support Python 3.5.

Fixes #2292.
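
As a sketch of the replacement, assuming timestamps of the form `YYYY-MM-DDTHH:MM:SS` (the exact format string used in the code base is not shown here, and `parse_iso_timestamp` is a hypothetical name):

```python
from datetime import datetime


def parse_iso_timestamp(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS' without datetime.fromisoformat()."""
    # strptime() has been available far longer than fromisoformat(),
    # which only arrived with Python 3.7.
    return datetime.strptime(text, '%Y-%m-%dT%H:%M:%S')
```

Unlike `fromisoformat()`, this accepts exactly one layout, so inputs with a different separator or with fractional seconds raise `ValueError`.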

3 years agoremove argparse dependency for vagrant scripts
Sarah Hoffmann [Fri, 23 Apr 2021 18:27:14 +0000 (20:27 +0200)]
remove argparse dependency for vagrant scripts

Users don't need to recreate the manpage.

3 years agoUpdated manual page
Darkshredder [Fri, 23 Apr 2021 20:12:38 +0000 (01:42 +0530)]
Updated manual page

3 years agobdd tests: fix place dependent ranking tests
Sarah Hoffmann [Thu, 22 Apr 2021 15:31:00 +0000 (17:31 +0200)]
bdd tests: fix place dependent ranking tests

The ranks of places may differ for some countries. Force the
place nodes in the test on null island which always uses the
default ranking.

3 years agoMerge pull request #2288 from RhinoDevel/patch-1
Sarah Hoffmann [Thu, 22 Apr 2021 15:12:25 +0000 (17:12 +0200)]
Merge pull request #2288 from RhinoDevel/patch-1

Replace "nominatim-update" with "nominatim".

3 years agoReplace "nominatim-update" with "nominatim".
RhinoDevel [Thu, 22 Apr 2021 13:40:22 +0000 (15:40 +0200)]
Replace "nominatim-update" with "nominatim".

If I am not mistaken, the correct command to index imported data via commandline is "nominatim index".

3 years agoindexer: reset query counter
Sarah Hoffmann [Wed, 21 Apr 2021 08:33:45 +0000 (10:33 +0200)]
indexer: reset query counter

Reset the counter for queries after the asynchronous connections
have been reopened.

3 years agoMerge pull request #2285 from lonvia/split-indexer-code
Sarah Hoffmann [Tue, 20 Apr 2021 13:34:14 +0000 (15:34 +0200)]
Merge pull request #2285 from lonvia/split-indexer-code

Rework indexer code

3 years agofactor out async connection handling into separate class
Sarah Hoffmann [Tue, 20 Apr 2021 09:16:12 +0000 (11:16 +0200)]
factor out async connection handling into separate class

Also adds a test for reconnecting regularly while indexing.

3 years agoindexer: make self.conn function-local
Sarah Hoffmann [Mon, 19 Apr 2021 16:15:09 +0000 (18:15 +0200)]
indexer: make self.conn function-local

Also switches to our internal connect function which gives us
a cursor with a scalar() function.

3 years agomake index() function private
Sarah Hoffmann [Mon, 19 Apr 2021 16:00:28 +0000 (18:00 +0200)]
make index() function private

3 years agomove analyse function into indexing function
Sarah Hoffmann [Mon, 19 Apr 2021 15:34:26 +0000 (17:34 +0200)]
move analyse function into indexing function

3 years agoindexer: move runner into separate file
Sarah Hoffmann [Mon, 19 Apr 2021 15:20:31 +0000 (17:20 +0200)]
indexer: move runner into separate file

3 years agoMerge pull request #2284 from lonvia/cleanup-word-frequency-computation
Sarah Hoffmann [Mon, 19 Apr 2021 16:28:04 +0000 (18:28 +0200)]
Merge pull request #2284 from lonvia/cleanup-word-frequency-computation

Rename and simplify function for word pre-computation

3 years agosimplify token precomputation
Sarah Hoffmann [Mon, 19 Apr 2021 14:54:22 +0000 (16:54 +0200)]
simplify token precomputation

Rename function to reflect that it is only used for precomputation.
The token IDs are not really needed, so don't bother to compute
the array of tokens.

3 years agoremove unused word recomputation script
Sarah Hoffmann [Mon, 19 Apr 2021 14:40:57 +0000 (16:40 +0200)]
remove unused word recomputation script

Has been replaced by a script recomputing counts from search_name.

3 years agoMerge pull request #2283 from darkshredder/tiger-data-test-fix
Sarah Hoffmann [Mon, 19 Apr 2021 11:56:36 +0000 (13:56 +0200)]
Merge pull request #2283 from darkshredder/tiger-data-test-fix

Fix: tiger-data tarfile test

3 years agoFix: tiger-data tarfile test
Darkshredder [Mon, 19 Apr 2021 10:23:01 +0000 (15:53 +0530)]
Fix: tiger-data tarfile test

3 years agoMerge pull request #2282 from lonvia/add-paths-to-config
Sarah Hoffmann [Mon, 19 Apr 2021 10:14:25 +0000 (12:14 +0200)]
Merge pull request #2282 from lonvia/add-paths-to-config

Include software paths in Python config object

3 years agosimplify sql and website creation functions
Sarah Hoffmann [Mon, 19 Apr 2021 08:01:17 +0000 (10:01 +0200)]
simplify sql and website creation functions

3 years agosimplify constructor for SQL preprocessor
Sarah Hoffmann [Mon, 19 Apr 2021 07:38:17 +0000 (09:38 +0200)]
simplify constructor for SQL preprocessor

Use sql path from config.

3 years agosimplify interface for adding tiger data
Sarah Hoffmann [Mon, 19 Apr 2021 07:23:37 +0000 (09:23 +0200)]
simplify interface for adding tiger data

Also simplifies tests using existing fixtures.

3 years agoadd library directories to config
Sarah Hoffmann [Mon, 19 Apr 2021 07:06:42 +0000 (09:06 +0200)]
add library directories to config

Allows reducing the number of parameters in functions that take
the config anyway.

3 years agoMerge pull request #2281 from changpingc/changping/fix-tiger-index
Sarah Hoffmann [Mon, 19 Apr 2021 06:42:59 +0000 (08:42 +0200)]
Merge pull request #2281 from changpingc/changping/fix-tiger-index

fix index on location_property_tiger (parent_place_id)

3 years agofix index on location_property_tiger (parent_place_id)
Channgping Chen [Mon, 19 Apr 2021 00:01:01 +0000 (00:01 +0000)]
fix index on location_property_tiger (parent_place_id)

Looks like 2af82975cd968ec09683ae5b16a9aa157a7f2176
accidentally renamed an index. Because of the added "if not
exists" clause, the index doesn't get created. This
significantly slows down reverse queries because they now
require full scans on location_property_tiger.

Without this fix, reverse queries can take 8s on a full
planet install on an r5.8xlarge instance in EC2.
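
One way an "if not exists" clause can hide a wrongly named index is shown below in miniature. The sketch uses SQLite from the Python standard library rather than PostgreSQL, and the table and index names are invented for the example; it demonstrates the general pitfall, not the exact statements from the commit referenced above.

```python
import sqlite3


def indexed_columns(conn, index_name):
    """Return the column names covered by the given index."""
    rows = conn.execute("PRAGMA index_info('{}')".format(index_name))
    return [row[2] for row in rows]


def create_indexes():
    """Create a table and two indexes that accidentally share one name."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE tiger (place_id INTEGER, parent_place_id INTEGER)')
    conn.execute('CREATE INDEX idx_tiger ON tiger (place_id)')
    # The name is already taken: the statement succeeds without an error,
    # but no index on parent_place_id is created.
    conn.execute('CREATE INDEX IF NOT EXISTS idx_tiger ON tiger (parent_place_id)')
    return conn
```

After `create_indexes()`, `idx_tiger` still covers only `place_id`, so lookups by `parent_place_id` fall back to a full table scan without any warning at installation time.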

3 years agoMerge pull request #2280 from AntoJvlt/Fix-special-phrases-import-and-tests-cleaning
Sarah Hoffmann [Sun, 18 Apr 2021 09:57:19 +0000 (11:57 +0200)]
Merge pull request #2280 from AntoJvlt/Fix-special-phrases-import-and-tests-cleaning

Fix regex and sanity check for the import of special phrases and tests cleaning.