Sarah Hoffmann [Fri, 23 Oct 2020 08:43:57 +0000 (10:43 +0200)]
minor fixes for geometry computation during boundary ranking
Go back to using centroid when determining if one admin level
is within another. There are cases where boundaries are slightly
misaligned due to mapping errors (not using the same ways in the
relations).
Only declare boundaries the same when they have the same wikidata
tag _and_ have exactly the same geometry. This works around tagging
errors with the wikidata tag, which happen because of automated
edits to the wikidata tag.
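The stricter sameness rule can be sketched as follows. This is only an
illustration in Python: the Boundary class and the is_same_boundary()
helper are hypothetical names, and a geometry is simplified to a tuple
of coordinates. The real comparison presumably happens in the database.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Coord = Tuple[float, float]


@dataclass(frozen=True)
class Boundary:
    """Hypothetical stand-in for a boundary relation."""
    wikidata: Optional[str]
    geometry: Tuple[Coord, ...]  # simplification: a plain coordinate ring


def is_same_boundary(a: Boundary, b: Boundary) -> bool:
    """Same only if the wikidata tags match AND the geometries are equal."""
    if a.wikidata is None or a.wikidata != b.wikidata:
        return False
    return a.geometry == b.geometry
```

A matching wikidata tag alone is no longer sufficient here, so a tag
copied to the wrong relation does not merge two distinct boundaries.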
Sarah Hoffmann [Tue, 20 Oct 2020 21:26:44 +0000 (23:26 +0200)]
detect and remove admin boundary duplicates
The Polish community maps admin boundaries that span multiple
levels by duplicating the boundary relations. Detect this situation
by looking out for matching wikidata tags. The higher ranked
duplicates are then thrown out from the address pool by setting
their address rank to 0.
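A rough Python sketch of this duplicate handling, under stated
assumptions: boundaries are plain dicts with the hypothetical keys
'osm_id', 'wikidata' and 'address_rank', and "higher ranked" is read
as "larger rank number". The function name is made up for this example.

```python
from collections import defaultdict


def drop_duplicate_boundaries(boundaries):
    """Return copies where higher-ranked wikidata duplicates get rank 0."""
    groups = defaultdict(list)
    for b in boundaries:
        if b.get("wikidata"):
            groups[b["wikidata"]].append(b)

    # Per wikidata id, remember the boundary with the lowest address rank.
    keep = {wd: min(grp, key=lambda b: b["address_rank"])["osm_id"]
            for wd, grp in groups.items()}

    result = []
    for b in boundaries:
        b = dict(b)  # do not modify the caller's data
        wd = b.get("wikidata")
        if wd and b["osm_id"] != keep[wd]:
            b["address_rank"] = 0  # removed from the address pool
        result.append(b)
    return result
```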
Sarah Hoffmann [Thu, 22 Oct 2020 08:20:16 +0000 (10:20 +0200)]
adjust secondary order when no addressimportance available
In cases of countries and remote places without an address
it is possible that 'addressimportance' comes back empty.
Adjust the 'foundorder' to the place's importance instead
in such cases.
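The fallback could look roughly like this in Python. The function name
and the dict layout are hypothetical, and negating the value so that
more important places sort first is an assumption made for this sketch.

```python
def secondary_order(place):
    """Sort key: use addressimportance, else the place's own importance."""
    value = place.get("addressimportance")
    if value is None:  # empty for countries and remote places
        value = place.get("importance", 0.0)
    return -value  # assumption: smaller key means earlier in the results
```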
Sarah Hoffmann [Tue, 20 Oct 2020 18:20:49 +0000 (20:20 +0200)]
reorganize ranks of high-level place types
Rank 25 is now available for places that should appear in addresses
but not when a street is present. Use this for some block-like
place types. Also document the particularity of rank 25.
Subdivisions and allotments are now at the same level as landuse,
with which they are frequently used together.
Sarah Hoffmann [Mon, 19 Oct 2020 08:39:01 +0000 (10:39 +0200)]
add explicit bbox contains check
Now that the containment check uses ST_Relate, we need to add
a separate bbox contains check to ensure that Postgis does the
efficient check first. Note that we still cannot get rid of the
overlap(&&) check because then Postgis will use the wrong indexes.
Sarah Hoffmann [Thu, 15 Oct 2020 15:30:52 +0000 (17:30 +0200)]
use computed centroid for location_area_large
The new address computation assumes that the centroid is inside
the area. Therefore we cannot use the centroid function, which may
return a point outside the area. Use the pre-computed centroid
instead, which has already been corrected to be inside the area.
Sarah Hoffmann [Wed, 14 Oct 2020 09:33:47 +0000 (11:33 +0200)]
demote admin boundaries for place areas
Also demote the address rank of an admin boundary when there
is a place area of higher rank that completely contains the
area. This catches the case where city boundaries do not exactly
align with administrative units (see for example Moscow).
Sarah Hoffmann [Tue, 13 Oct 2020 20:10:07 +0000 (22:10 +0200)]
overhaul address computation
This is a complete rewrite of the selection of address parts to
be inserted into the place_addressline table.
The new algorithm selects for each rank:
* the boundary overlapping with the addressee and contained
in the already selected boundaries of lower rank, or failing that
* the place node closest to the addressee that is contained in
the already selected boundaries and in the influence radius
of already selected place nodes of lower rank
Place nodes that are not contained in already selected boundaries
of lower rank are completely thrown away. All other candidates are
added as non-address parts.
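The selection described above can be sketched in Python as follows.
This is a simplified model and not the actual implementation: areas are
axis-aligned boxes, distances are Euclidean, and all class and function
names are hypothetical. "In the influence radius" is interpreted here
as the candidate lying within the radius of each selected place node.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # minx, miny, maxx, maxy


@dataclass
class Boundary:
    name: str
    rank: int
    box: Box  # simplification: areas are axis-aligned boxes

    def contains_point(self, p: Point) -> bool:
        return (self.box[0] <= p[0] <= self.box[2]
                and self.box[1] <= p[1] <= self.box[3])

    def contains_box(self, other: Box) -> bool:
        return (self.box[0] <= other[0] and self.box[1] <= other[1]
                and other[2] <= self.box[2] and other[3] <= self.box[3])


@dataclass
class PlaceNode:
    name: str
    rank: int
    pos: Point
    radius: float  # influence radius, same units as the coordinates


def select_address_parts(addressee: Point,
                         boundaries: List[Boundary],
                         nodes: List[PlaceNode]):
    """Return (address part names, non-address part names)."""
    sel_bounds, sel_nodes = [], []
    address, non_address = [], []
    for rank in sorted({c.rank for c in [*boundaries, *nodes]}):
        # Boundaries of this rank that overlap with the addressee.
        b_cands = [b for b in boundaries
                   if b.rank == rank and b.contains_point(addressee)]
        chosen = next((b for b in b_cands
                       if all(s.contains_box(b.box) for s in sel_bounds)),
                      None)
        # Place nodes outside the selected boundaries are thrown away.
        n_cands = [n for n in nodes if n.rank == rank
                   and all(s.contains_point(n.pos) for s in sel_bounds)]
        if chosen is None:
            in_reach = [n for n in n_cands
                        if all(math.dist(n.pos, s.pos) <= s.radius
                               for s in sel_nodes)]
            if in_reach:
                chosen = min(in_reach,
                             key=lambda n: math.dist(n.pos, addressee))
        if chosen is not None:
            address.append(chosen.name)
            if isinstance(chosen, Boundary):
                sel_bounds.append(chosen)
            else:
                sel_nodes.append(chosen)
        # Every remaining candidate becomes a non-address part.
        non_address += [c.name for c in [*b_cands, *n_cands]
                        if c is not chosen]
    return address, non_address
```

Ranks are processed from low to high so that each choice can only
narrow down the area in which later candidates are accepted.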
Sarah Hoffmann [Wed, 7 Oct 2020 15:33:52 +0000 (17:33 +0200)]
demote place nodes in admin areas
If a place node of city rank and above finds itself in an
administrative boundary of the same address rank, then
increase the address rank by 2. This catches the rather
frequent case where city suburbs are tagged for historical
reasons as towns or villages.
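As a minimal sketch in Python: the function below is hypothetical, the
value 16 for the city rank is an assumption, and "city rank and above"
is read as "rank number less than or equal to the city rank".

```python
CITY_RANK = 16  # assumption: address rank used for cities


def demoted_rank(node_rank, enclosing_admin_ranks, city_rank=CITY_RANK):
    """Push a place node down by 2 when an admin area has the same rank."""
    if node_rank <= city_rank and node_rank in enclosing_admin_ranks:
        return node_rank + 2
    return node_rank
```

For example, a node of rank 16 inside an administrative boundary of
rank 16 ends up at rank 18, below the boundary.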
Sarah Hoffmann [Tue, 6 Oct 2020 12:00:43 +0000 (14:00 +0200)]
restrict postcode searches to postcode in first token
In structured queries we should only assume that it is
a postcode search when only the postcode and optionally
the country is given. If any other term is present, it
is better to avoid the search for postcode as it yields
too many bad searches. Given that the terms in a structured
query are ordered, this means that the postcode must be
the first token just like in the unstructured query.
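The condition can be illustrated with a short Python sketch. The
function name is hypothetical; the keys follow the parameter names of
a structured query ('postalcode', 'country', 'city' and so on).

```python
def is_postcode_search(params):
    """True if only a postcode and optionally a country are given."""
    given = {key for key, value in params.items() if value}
    return "postalcode" in given and given <= {"postalcode", "country"}
```

Any additional term, for example a city, makes the check fail so that
no postcode search is attempted.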
Sarah Hoffmann [Mon, 5 Oct 2020 15:11:13 +0000 (17:11 +0200)]
update to latest osm2pgsql version
The latest version of osm2pgsql no longer creates indexes on
the members of planet_osm_rels. So we do that ourselves.
Given that we only need to access associated street relations,
the index can become quite a bit smaller.
Sarah Hoffmann [Tue, 22 Sep 2020 13:51:04 +0000 (15:51 +0200)]
remove ST_Covers check when also testing for ST_Intersects
Using both is slightly problematic because they have different
ways to use the index. Newer versions of Postgis exhibit a
query planner issue when both functions appear together.
As ST_Intersects includes ST_Covers, simply remove the latter.
Sarah Hoffmann [Wed, 23 Sep 2020 15:33:42 +0000 (17:33 +0200)]
use closest containing place for unlisted addr:place
We can't use getNearFeatures() to determine the parent of a
place with an unlisted addr:place because this function
returns place nodes that are potentially outside the area
of interest. Doing the complete address computation is too
expensive, so simply use the area with the largest rank that
contains the feature instead.
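A simplified Python sketch of this shortcut, assuming areas are plain
dicts with a rank and an axis-aligned bounding box instead of a real
geometry. Both function names are hypothetical.

```python
def contains(box, point):
    """Point-in-box test standing in for a real containment check."""
    minx, miny, maxx, maxy = box
    return minx <= point[0] <= maxx and miny <= point[1] <= maxy


def parent_for_unlisted_place(point, areas):
    """Pick the containing area with the largest rank, or None."""
    containing = [a for a in areas if contains(a["box"], point)]
    return max(containing, key=lambda a: a["rank"], default=None)
```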
Sarah Hoffmann [Wed, 23 Sep 2020 09:55:18 +0000 (11:55 +0200)]
add unknown addr:place to address output
When a POI has no addr:street but an addr:place that is not
contained in the name list of the parent place, then remember
this situation and merge the content of addr:place into the
address output.
We don't need to care about translations in this case because
it is obvious that no object with translations exists if the
parent isn't the object named in addr:place.
Sarah Hoffmann [Tue, 22 Sep 2020 11:27:05 +0000 (13:27 +0200)]
exclude unnamed highway areas
These are used to mark large paved areas. Sometimes they exist
together with named regular streets. In such cases the unnamed
area may overshadow the actual street when computing the address
parent. As unnamed highways are not very useful anyway, we
simply remove them from the database.
Sarah Hoffmann [Thu, 3 Sep 2020 19:42:00 +0000 (21:42 +0200)]
always bind addr:place to place instead of street
If an addr:place is given but no addr:street tag, then bind
the rank 30 object always to a <=25 object, even when there
is none found with the same name.
Sarah Hoffmann [Thu, 3 Sep 2020 08:38:33 +0000 (10:38 +0200)]
merge addr tags into search_name table
When a place of rank 30 has addr tags that are not covered by the
search terms of the parent, add a separate entry for the POI in
the search_name table that includes the addr tags. We can only
do that with named places. For POIs without a name the housenumber
is used as name. If that is not available either, searching still
won't work.
Sarah Hoffmann [Sat, 19 Sep 2020 15:23:40 +0000 (17:23 +0200)]
ignore postcodes with colons
Colons are used as a delimiter in tiger:left and tiger:right tags
when multiple postcodes are given. Ignore those. This was already
done in the postcode update script. This change just makes the
two places consistent where postcodes are added.
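The filter itself is tiny. A Python sketch with a hypothetical helper
name:

```python
def usable_postcode(value):
    """Return the postcode, or None if it is missing or contains a colon."""
    if value is None or ":" in value:
        return None  # colon-delimited lists of postcodes are ignored
    return value
```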
Sarah Hoffmann [Fri, 18 Sep 2020 13:09:35 +0000 (15:09 +0200)]
remove postcodes entirely from indexing
place=postcode places are artificial places that collect addr:postcode
points for aggregation. They should neither show up in the address nor
be searchable. That means that there is no need to index them at all.
Only let boundary=postal_code through which define correct areas for
postcodes.
Sarah Hoffmann [Fri, 18 Sep 2020 09:08:47 +0000 (11:08 +0200)]
postal boundary may be imported without name
Postal boundaries usually just have the postcode tag set and are
therefore officially 'nameless'. We want to have them as
boundary=postal_code anyway in order to distinguish them from postcode
points inherited from addr: tags.
Sarah Hoffmann [Thu, 17 Sep 2020 16:17:01 +0000 (18:17 +0200)]
use place type for result object in address parts
Boundaries should derive the address part type from the
linked place if possible. This was already implemented
for the address objects but not for the address information
from the address itself.
Sarah Hoffmann [Thu, 17 Sep 2020 15:11:22 +0000 (17:11 +0200)]
make sure that all postcodes have an entry in word
It may happen that two different postcodes normalize to exactly
the same token. In that case we still need two different entries
in the word table. Token lookup will then make sure that the correct
one is chosen.
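The idea can be sketched in Python as follows. The normalization used
here (drop whitespace, upper-case) is only an assumption to produce a
collision, and the word table is modelled as a list of (token, word)
pairs. All names are hypothetical.

```python
def normalize(postcode):
    """Hypothetical normalization: remove whitespace and upper-case."""
    return "".join(postcode.split()).upper()


def build_word_entries(postcodes):
    """One entry per distinct postcode, even when the tokens collide."""
    return sorted({(normalize(pc), pc) for pc in postcodes})


def lookup(entries, query):
    """Prefer the entry whose word matches the query exactly."""
    token = normalize(query)
    matches = [word for tok, word in entries if tok == token]
    if query in matches:
        return query
    return matches[0] if matches else None
```

Keying the entries on the pair instead of the token alone keeps both
postcodes available for lookup.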
Sarah Hoffmann [Thu, 17 Sep 2020 07:54:46 +0000 (09:54 +0200)]
restructure developer's manual
Add a section on setting up the development environment which now
also includes the former chapter on recreating the documentation.
Move the README from test/ into the manual as the new Testing
chapter.