From: Sarah Hoffmann Date: Wed, 20 Nov 2019 10:53:26 +0000 (+0100) Subject: Merge remote-tracking branch 'upstream/master' X-Git-Tag: deploy~271 X-Git-Url: https://git.openstreetmap.org./nominatim.git/commitdiff_plain/0c483063dd7f5b62c28e10279e22250ec861fb8f?hp=0daf2d5cbf2e7ec558b5bc4dfe511e0f8f2d76b2 Merge remote-tracking branch 'upstream/master' --- diff --git a/docs/admin/Faq.md b/docs/admin/Faq.md index 485ba25b..2e5a7c43 100644 --- a/docs/admin/Faq.md +++ b/docs/admin/Faq.md @@ -26,14 +26,16 @@ If the reported rank is 26 or higher, you can also safely add `--index-noanalyse PHP Warning: file_get_contents(): open_basedir restriction in effect. -You need to adjust the [open_basedir](https://www.php.net/manual/en/ini.core.php#ini.open-basedir) setting -in your PHP configuration (`php.ini file`). By default this setting may look like this: +You need to adjust the +[open_basedir](https://www.php.net/manual/en/ini.core.php#ini.open-basedir) +setting in your PHP configuration (`php.ini` file). By default this setting may +look like this: open_basedir = /srv/http/:/home/:/tmp/:/usr/share/pear/ -Either add reported directories to the list or disable this setting temporarily by -dding ";" at the beginning of the line. Don't forget to enable this setting again -once you are done with the PHP command line operations. +Either add reported directories to the list or disable this setting temporarily +by adding ";" at the beginning of the line. Don't forget to enable this setting +again once you are done with the PHP command line operations. ### PHP timzeone warnings @@ -107,10 +109,11 @@ to get the full error message. `could not connect to server: No such file or directory` -On CentOS v7 the PostgreSQL server is started with `systemd`. -Check if `/usr/lib/systemd/system/httpd.service` contains a line `PrivateTmp=true`. -If so then Apache cannot see the `/tmp/.s.PGSQL.5432` file. It's a good security feature, -so use the [preferred solution](../appendix/Install-on-Centos-7/#adding-selinux-security-settings). +On CentOS v7 the PostgreSQL server is started with `systemd`. Check if +`/usr/lib/systemd/system/httpd.service` contains a line `PrivateTmp=true`. If +so then Apache cannot see the `/tmp/.s.PGSQL.5432` file. It's a good security +feature, so use the +[preferred solution](../appendix/Install-on-Centos-7/#adding-selinux-security-settings). However, you can solve this the quick and dirty way by commenting out that line and then run @@ -118,14 +121,12 @@ However, you can solve this the quick and dirty way by commenting out that line sudo systemctl restart httpd -### "must be an array or an object that implements Countable" warning in /usr/share/pear/DB.php - -The warning started with PHP 7.2. Make sure you have at least [version 1.9.3 of PEAR DB](https://github.com/pear/DB/releases) -installed. - ### Website reports "DB Error: insufficient permissions" -The user the webserver, e.g. Apache, runs under needs to have access to the Nominatim database. You can find the user like [this](https://serverfault.com/questions/125865/finding-out-what-user-apache-is-running-as), for default Ubuntu operating system for example it's `www-data`. +The user the webserver, e.g. Apache, runs under needs to have access to the +Nominatim database. You can find the user like +[this](https://serverfault.com/questions/125865/finding-out-what-user-apache-is-running-as), +for default Ubuntu operating system for example it's `www-data`. 1. Repeat the `createuser` step of the installation instructions. @@ -164,18 +165,8 @@ When running SELinux, make sure that the ### Setup.php fails with "DB Error: extension not found" Make sure you have the PostgreSQL extensions "hstore" and "postgis" installed. -See the installation instruction for a full list of required packages. - - -### Setup.php reports "Cannot redeclare getDB()" - -`Cannot redeclare getDB() (previously declared in /your/path/Nominatim/lib/db.php:4)` - -The message is a bit misleading as PHP needs to load the file `DB.php` and -instead re-loads Nominatim's `db.php`. To solve this make sure you -have the [Pear module 'DB'](https://pear.php.net/package/DB/) installed. +See the installation instructions for a full list of required packages. - sudo pear install DB ### I forgot to delete the flatnodes file before starting an import. diff --git a/docs/admin/Import-and-Update.md b/docs/admin/Import-and-Update.md index 757dab69..246fcdda 100644 --- a/docs/admin/Import-and-Update.md +++ b/docs/admin/Import-and-Update.md @@ -33,7 +33,7 @@ the directory exists. There should be at least 40GB of free space. ## Downloading additional data -### Wikipedia rankings +### Wikipedia/Wikidata rankings Wikipedia can be used as an optional auxiliary data source to help indicate the importance of OSM features. Nominatim will work without this information @@ -41,15 +41,13 @@ but it will improve the quality of the results if this is installed. This data is available as a binary download: cd $NOMINATIM_SOURCE_DIR/data - wget https://www.nominatim.org/data/wikipedia_article.sql.bin - wget https://www.nominatim.org/data/wikipedia_redirect.sql.bin + wget https://www.nominatim.org/data/wikimedia-importance.sql.gz -Combined the 2 files are around 1.5GB and add around 30GB to the install -size of Nominatim. They also increase the install time by an hour or so. +The file is about 400MB and adds around 4GB to Nominatim database. -*NOTE:* you'll need to download the Wikipedia rankings before performing -the initial import of the data if you want the rankings applied to the -loaded data. +*NOTE:* if you forgot to download the wikipedia rankings, you can also add +them after the import by running `./utils/setup.php --import-wikipedia-articles` +and then `./utils/update.php --recompute-importance`. ### Great Britain, USA postcodes diff --git a/docs/admin/Migration.md b/docs/admin/Migration.md index f3668357..e6b6d102 100644 --- a/docs/admin/Migration.md +++ b/docs/admin/Migration.md @@ -6,6 +6,21 @@ to newer versions of Nominatim. SQL statements should be executed from the PostgreSQL commandline. Execute `psql nominatim` to enter command line mode. +## 3.4.0 -> master + +### New Wikipedia/Wikidata importance tables + +The `wikipedia_*` tables have a new format that also includes references to +Wikidata. You need to update the computation functions and the tables as +follows: + + * download the new Wikipedia tables as described in the import section + * reimport the tables: `./utils/setup.php --import-wikipedia-articles` + * update the functions: `./utils/setup.php --create-functions --enable-diff-updates` + * compute importance: `./utils/update.php --recompute-importance` + +The last step takes about 10 hours on the full planet. + ## 3.3.0 -> 3.4.0 ### Reorganisation of location_area_country table diff --git a/docs/api/Output.md b/docs/api/Output.md index df78ed7d..a0d85220 100644 --- a/docs/api/Output.md +++ b/docs/api/Output.md @@ -46,9 +46,9 @@ a single place (for reverse) of the following format: The possible fields are: - * `place_id` - reference to the Nominatim internal database ID (see notes below) + * `place_id` - reference to the Nominatim internal database ID ([see notes](#place_id-is-not-a-persistent-id)) * `osm_type`, `osm_id` - reference to the OSM object - * `boundingbox` - area of corner coordinates + * `boundingbox` - area of corner coordinates ([see notes](#boundingbox)) * `lat`, `lon` - latitude and longitude of the centroid of the object * `display_name` - full comma-separated address * `class`, `type` - key and value of the main OSM tag @@ -75,7 +75,7 @@ a bounding box (`bbox`). The feature list has the following fields: - * `place_id` - reference to the Nominatim internal database ID (see notes below) + * `place_id` - reference to the Nominatim internal database ID ([see notes](#place_id-is-not-a-persistent-id)) * `osm_type`, `osm_id` - reference to the OSM object * `category`, `type` - key and value of the main OSM tag * `display_name` - full comma-separated address @@ -148,11 +148,11 @@ attribution to OSM and the original querystring. The place information can be found in the `result` element. The attributes of that element contain: - * `place_id` - reference to the Nominatim internal database ID (see notes below) + * `place_id` - reference to the Nominatim internal database ID ([see notes](#place_id-is-not-a-persistent-id)) * `osm_type`, `osm_id` - reference to the OSM object * `ref` - content of `ref` tag if it exists * `lat`, `lon` - latitude and longitude of the centroid of the object - * `boundingbox` - comma-separated list of corner coordinates + * `boundingbox` - comma-separated list of corner coordinates ([see notes](#boundingbox)) The full address of the result can be found in the content of the `result` element as a comma-separated list. @@ -203,11 +203,11 @@ generic information about the query: The place information can be found in the `place` elements, of which there may be more than one. The attributes of that element contain: - * `place_id` - reference to the Nominatim internal database ID (see notes below) + * `place_id` - reference to the Nominatim internal database ID ([see notes](#place_id-is-not-a-persistent-id)) * `osm_type`, `osm_id` - reference to the OSM object * `ref` - content of `ref` tag if it exists * `lat`, `lon` - latitude and longitude of the centroid of the object - * `boundingbox` - comma-separated list of corner coordinates + * `boundingbox` - comma-separated list of corner coordinates ([see notes](#boundingbox)) * `place_rank` - class search rank * `display_name` - full comma-separated address * `class`, `type` - key and value of the main OSM tag @@ -244,3 +244,10 @@ relation) so `osm_type`+`osm_id`+`class_name` would be more unique. Comma separated list of min latitude, max latitude, min longitude, max longitude. The whole planet would be `-90,90,-180,180`. + +Can we used to pan and center the map on the result, for example with leafletjs +mapping library +`map.fitBounds([[bbox[0],bbox[2]],[bbox[1],bbox[3]]], {padding: [20, 20], maxzoom: 16});` + +Bounds crossing the antimeridian have a min latitude -180 and max latitude 180, +essentially covering the planet (See [issue 184](https://github.com/openstreetmap/Nominatim/issues/184)). diff --git a/docs/develop/Documentation.md b/docs/develop/Documentation.md index 6e792c25..df8d2b1a 100644 --- a/docs/develop/Documentation.md +++ b/docs/develop/Documentation.md @@ -1,36 +1,36 @@ # Documentation Pages -The [Nominatim documentation](https://nominatim.org/release-docs/develop/) is built using the [MkDocs](https://www.mkdocs.org/) static site generation framework. The master branch is automatically deployed every night on under [https://nominatim.org/release-docs/develop/]() +The [Nominatim documentation](https://nominatim.org/release-docs/develop/) is built using the [MkDocs](https://www.mkdocs.org/) static site generation framework. The master branch is automatically deployed every night on under [https://nominatim.org/release-docs/develop/](https://nominatim.org/release-docs/develop/) -To preview local changes: +To preview local changes, first install MkDocs -1. Install MkDocs +``` +pip3 install --user mkdocs +``` - ``` - pip3 install --user mkdocs - ``` +Then go to the build directory and run -2. In build directory run +``` +make doc +INFO - Cleaning site directory +INFO - Building documentation to directory: /home/vagrant/build/site-html +``` - ``` - make doc - INFO - Cleaning site directory - INFO - Building documentation to directory: /home/vagrant/build/site-html - ``` +This runs `mkdocs build` plus extra transformation of some files and adds +symlinks (see `CMakeLists.txt` for the exact steps). - This runs `mkdocs build` plus extra transformion of some files and adds symlinks (see `CMakeLists.txt` for the exact steps). +Now you can start webserver for local testing +``` +build> mkdocs serve +[server:296] Serving on http://127.0.0.1:8000 +[handlers:62] Start watching changes +``` -3. Start webserver for local testing +If you develop inside a Vagrant virtual machine: - ``` - mkdocs serve - [server:296] Serving on http://127.0.0.1:8000 - [handlers:62] Start watching changes - ``` - - If you develop inside a Vagrant virtual machine: - * add port forwarding to your Vagrantfile, e.g. `config.vm.network "forwarded_port", guest: 8000, host: 8000` - * use `mkdocs serve --dev-addr 0.0.0.0:8000` because the default localhost - IP does not get forwarded. + * add port forwarding to your Vagrantfile, + e.g. `config.vm.network "forwarded_port", guest: 8000, host: 8000` + * use `mkdocs serve --dev-addr 0.0.0.0:8000` because the default localhost + IP does not get forwarded. diff --git a/docs/develop/Postcodes.md b/docs/develop/Postcodes.md new file mode 100644 index 00000000..ff36b0dd --- /dev/null +++ b/docs/develop/Postcodes.md @@ -0,0 +1,45 @@ +# Postcodes in Nominatim + +The blog post +[Nominatim and Postcodes](https://www.openstreetmap.org/user/lonvia/diary/43143) +describes the handling implemented since Nominatim 3.1. + +Postcode centroids (aka 'calculated postcodes') are generated by looking at all +postcodes of a country, grouping them and calculating the geometric centroid. +There is currently no logic to deal with extreme outliers (typos or other +mistakes in OSM data). There is also no check if a postcodes adheres to a +country's format, e.g. if Swiss postcodes are 4 digits. + + +## Regular updating calculated postcodes + +The script to rerun the calculation is +`build/utils/update.php --calculate-postcodes` +and runs once per night on nominatim.openstreetmap.org. + + +## Finding places that share a specific postcode + +In the Nominatim database run + +```sql +SELECT address->'postcode' as pc, + osm_type, osm_id, class, type, + st_x(centroid) as lon, st_y(centroid) as lat +FROM placex +WHERE country_code='fr' + AND upper(trim (both ' ' from address->'postcode')) = '33210'; +``` + +Alternatively on [Overpass](https://overpass-turbo.eu/) run the following query + +``` +[out:json][timeout:250]; +area["name"="France"]->.boundaryarea; +( +nwr(area.boundaryarea)["addr:postcode"="33210"]; +); +out body; +>; +out skel qt; +``` diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml index cb00e2e8..5cf63f83 100644 --- a/docs/mkdocs.yml +++ b/docs/mkdocs.yml @@ -22,6 +22,7 @@ pages: - 'Overview' : 'develop/overview.md' - 'OSM Data Import' : 'develop/Import.md' - 'Place Ranking' : 'develop/Ranking.md' + - 'Postcodes' : 'develop/Postcodes.md' - 'Documentation' : 'develop/Documentation.md' - 'External Data Sources': - 'Overview' : 'data-sources/overview.md' diff --git a/lib/DB.php b/lib/DB.php index fe2529b2..e4aa4349 100644 --- a/lib/DB.php +++ b/lib/DB.php @@ -284,7 +284,7 @@ class DB { // https://secure.php.net/manual/en/ref.pdo-pgsql.connection.php $aInfo = array(); - if (preg_match('/^pgsql:(.+)/', $sDSN, $aMatches)) { + if (preg_match('/^pgsql:(.+)$/', $sDSN, $aMatches)) { foreach (explode(';', $aMatches[1]) as $sKeyVal) { list($sKey, $sVal) = explode('=', $sKeyVal, 2); if ($sKey == 'host') $sKey = 'hostspec'; diff --git a/lib/cmd.php b/lib/cmd.php index 32fdc857..77878c15 100644 --- a/lib/cmd.php +++ b/lib/cmd.php @@ -148,12 +148,14 @@ function runSQLScript($sScript, $bfatal = true, $bVerbose = false, $bIgnoreError // Convert database DSN to psql parameters $aDSNInfo = \Nominatim\DB::parseDSN(CONST_Database_DSN); if (!isset($aDSNInfo['port']) || !$aDSNInfo['port']) $aDSNInfo['port'] = 5432; - $sCMD = 'psql -p '.$aDSNInfo['port'].' -d '.$aDSNInfo['database']; + $sCMD = 'psql' + .' -p '.escapeshellarg($aDSNInfo['port']) + .' -d '.escapeshellarg($aDSNInfo['database']); if (isset($aDSNInfo['hostspec']) && $aDSNInfo['hostspec']) { - $sCMD .= ' -h ' . $aDSNInfo['hostspec']; + $sCMD .= ' -h ' . escapeshellarg($aDSNInfo['hostspec']); } if (isset($aDSNInfo['username']) && $aDSNInfo['username']) { - $sCMD .= ' -U ' . $aDSNInfo['username']; + $sCMD .= ' -U ' . escapeshellarg($aDSNInfo['username']); } $aProcEnv = null; if (isset($aDSNInfo['password']) && $aDSNInfo['password']) { diff --git a/lib/setup/SetupClass.php b/lib/setup/SetupClass.php index a26b7dae..2fdb3926 100755 --- a/lib/setup/SetupClass.php +++ b/lib/setup/SetupClass.php @@ -80,13 +80,15 @@ class SetupFunctions fail('database already exists ('.CONST_Database_DSN.')'); } - $sCreateDBCmd = 'createdb -E UTF-8 -p '.$this->aDSNInfo['port'].' '.$this->aDSNInfo['database']; + $sCreateDBCmd = 'createdb -E UTF-8' + .' -p '.escapeshellarg($this->aDSNInfo['port']) + .' '.escapeshellarg($this->aDSNInfo['database']); if (isset($this->aDSNInfo['username'])) { - $sCreateDBCmd .= ' -U '.$this->aDSNInfo['username']; + $sCreateDBCmd .= ' -U '.escapeshellarg($this->aDSNInfo['username']); } if (isset($this->aDSNInfo['hostspec'])) { - $sCreateDBCmd .= ' -h '.$this->aDSNInfo['hostspec']; + $sCreateDBCmd .= ' -h '.escapeshellarg($this->aDSNInfo['hostspec']); } $result = $this->runWithPgEnv($sCreateDBCmd); @@ -158,13 +160,6 @@ class SetupFunctions if ($this->bNoPartitions) { $this->pgsqlRunScript('update country_name set partition = 0'); } - - // the following will be needed by createFunctions later but - // is only defined in the subsequently called createTables - // Create dummies here that will be overwritten by the proper - // versions in create-tables. - $this->pgsqlRunScript('CREATE TABLE IF NOT EXISTS place_boundingbox ()'); - $this->pgsqlRunScript('CREATE TYPE wikipedia_article_match AS ()', false); } public function importData($sOSMFile) @@ -178,30 +173,30 @@ class SetupFunctions fail("osm2pgsql not found in '$osm2pgsql'"); } - $osm2pgsql .= ' -S '.CONST_Import_Style; + $osm2pgsql .= ' -S '.escapeshellarg(CONST_Import_Style); if (!is_null(CONST_Osm2pgsql_Flatnode_File) && CONST_Osm2pgsql_Flatnode_File) { - $osm2pgsql .= ' --flat-nodes '.CONST_Osm2pgsql_Flatnode_File; + $osm2pgsql .= ' --flat-nodes '.escapeshellarg(CONST_Osm2pgsql_Flatnode_File); } if (CONST_Tablespace_Osm2pgsql_Data) - $osm2pgsql .= ' --tablespace-slim-data '.CONST_Tablespace_Osm2pgsql_Data; + $osm2pgsql .= ' --tablespace-slim-data '.escapeshellarg(CONST_Tablespace_Osm2pgsql_Data); if (CONST_Tablespace_Osm2pgsql_Index) - $osm2pgsql .= ' --tablespace-slim-index '.CONST_Tablespace_Osm2pgsql_Index; + $osm2pgsql .= ' --tablespace-slim-index '.escapeshellarg(CONST_Tablespace_Osm2pgsql_Index); if (CONST_Tablespace_Place_Data) - $osm2pgsql .= ' --tablespace-main-data '.CONST_Tablespace_Place_Data; + $osm2pgsql .= ' --tablespace-main-data '.escapeshellarg(CONST_Tablespace_Place_Data); if (CONST_Tablespace_Place_Index) - $osm2pgsql .= ' --tablespace-main-index '.CONST_Tablespace_Place_Index; + $osm2pgsql .= ' --tablespace-main-index '.escapeshellarg(CONST_Tablespace_Place_Index); $osm2pgsql .= ' -lsc -O gazetteer --hstore --number-processes 1'; - $osm2pgsql .= ' -C '.$this->iCacheMemory; - $osm2pgsql .= ' -P '.$this->aDSNInfo['port']; + $osm2pgsql .= ' -C '.escapeshellarg($this->iCacheMemory); + $osm2pgsql .= ' -P '.escapeshellarg($this->aDSNInfo['port']); if (isset($this->aDSNInfo['username'])) { - $osm2pgsql .= ' -U '.$this->aDSNInfo['username']; + $osm2pgsql .= ' -U '.escapeshellarg($this->aDSNInfo['username']); } if (isset($this->aDSNInfo['hostspec'])) { - $osm2pgsql .= ' -H '.$this->aDSNInfo['hostspec']; + $osm2pgsql .= ' -H '.escapeshellarg($this->aDSNInfo['hostspec']); } - $osm2pgsql .= ' -d '.$this->aDSNInfo['database'].' '.$sOSMFile; + $osm2pgsql .= ' -d '.escapeshellarg($this->aDSNInfo['database']).' '.escapeshellarg($sOSMFile); $this->runWithPgEnv($osm2pgsql); @@ -321,19 +316,14 @@ class SetupFunctions public function importWikipediaArticles() { - $sWikiArticlesFile = CONST_Wikipedia_Data_Path.'/wikipedia_article.sql.bin'; - $sWikiRedirectsFile = CONST_Wikipedia_Data_Path.'/wikipedia_redirect.sql.bin'; + $sWikiArticlesFile = CONST_Wikipedia_Data_Path.'/wikimedia-importance.sql.gz'; if (file_exists($sWikiArticlesFile)) { - info('Importing wikipedia articles'); - $this->pgsqlRunDropAndRestore($sWikiArticlesFile); - } else { - warn('wikipedia article dump file not found - places will have default importance'); - } - if (file_exists($sWikiRedirectsFile)) { - info('Importing wikipedia redirects'); - $this->pgsqlRunDropAndRestore($sWikiRedirectsFile); + info('Importing wikipedia articles and redirects'); + $this->pgExec('DROP TABLE IF EXISTS wikipedia_article'); + $this->pgExec('DROP TABLE IF EXISTS wikipedia_redirect'); + $this->pgsqlRunScriptFile($sWikiArticlesFile); } else { - warn('wikipedia redirect dump file not found - some place importance values may be missing'); + warn('wikipedia importance dump file not found - places will have default importance'); } } @@ -349,8 +339,6 @@ class SetupFunctions echo '.'; $this->pgExec('TRUNCATE place_addressline'); echo '.'; - $this->pgExec('TRUNCATE place_boundingbox'); - echo '.'; $this->pgExec('TRUNCATE location_area'); echo '.'; if (!$this->dbReverseOnly()) { @@ -599,13 +587,15 @@ class SetupFunctions public function index($bIndexNoanalyse) { $sOutputFile = ''; - $sBaseCmd = CONST_InstallPath.'/nominatim/nominatim -i -d '.$this->aDSNInfo['database'].' -P ' - .$this->aDSNInfo['port'].' -t '.$this->iInstances.$sOutputFile; + $sBaseCmd = CONST_InstallPath.'/nominatim/nominatim -i' + .' -d '.escapeshellarg($this->aDSNInfo['database']) + .' -P '.escapeshellarg($this->aDSNInfo['port']) + .' -t '.escapeshellarg($this->iInstances.$sOutputFile); if (isset($this->aDSNInfo['hostspec'])) { - $sBaseCmd .= ' -H '.$this->aDSNInfo['hostspec']; + $sBaseCmd .= ' -H '.escapeshellarg($this->aDSNInfo['hostspec']); } if (isset($this->aDSNInfo['username'])) { - $sBaseCmd .= ' -U '.$this->aDSNInfo['username']; + $sBaseCmd .= ' -U '.escapeshellarg($this->aDSNInfo['username']); } info('Index ranks 0 - 4'); @@ -740,22 +730,6 @@ class SetupFunctions } } - private function pgsqlRunDropAndRestore($sDumpFile) - { - $sCMD = 'pg_restore -p '.$this->aDSNInfo['port'].' -d '.$this->aDSNInfo['database'].' --no-owner -Fc --clean '.$sDumpFile; - if ($this->oDB->getPostgresVersion() >= 9.04) { - $sCMD .= ' --if-exists'; - } - if (isset($this->aDSNInfo['hostspec'])) { - $sCMD .= ' -h '.$this->aDSNInfo['hostspec']; - } - if (isset($this->aDSNInfo['username'])) { - $sCMD .= ' -U '.$this->aDSNInfo['username']; - } - - $this->runWithPgEnv($sCMD); - } - private function pgsqlRunScript($sScript, $bfatal = true) { runSQLScript( @@ -814,15 +788,17 @@ class SetupFunctions { if (!file_exists($sFilename)) fail('unable to find '.$sFilename); - $sCMD = 'psql -p '.$this->aDSNInfo['port'].' -d '.$this->aDSNInfo['database']; + $sCMD = 'psql' + .' -p '.escapeshellarg($this->aDSNInfo['port']) + .' -d '.escapeshellarg($this->aDSNInfo['database']); if (!$this->bVerbose) { $sCMD .= ' -q'; } if (isset($this->aDSNInfo['hostspec'])) { - $sCMD .= ' -h '.$this->aDSNInfo['hostspec']; + $sCMD .= ' -h '.escapeshellarg($this->aDSNInfo['hostspec']); } if (isset($this->aDSNInfo['username'])) { - $sCMD .= ' -U '.$this->aDSNInfo['username']; + $sCMD .= ' -U '.escapeshellarg($this->aDSNInfo['username']); } $aProcEnv = null; if (isset($this->aDSNInfo['password'])) { @@ -835,12 +811,12 @@ class SetupFunctions 1 => array('pipe', 'w'), 2 => array('file', '/dev/null', 'a') ); - $hGzipProcess = proc_open('zcat '.$sFilename, $aDescriptors, $ahGzipPipes); + $hGzipProcess = proc_open('zcat '.escapeshellarg($sFilename), $aDescriptors, $ahGzipPipes); if (!is_resource($hGzipProcess)) fail('unable to start zcat'); $aReadPipe = $ahGzipPipes[1]; fclose($ahGzipPipes[0]); } else { - $sCMD .= ' -f '.$sFilename; + $sCMD .= ' -f '.escapeshellarg($sFilename); $aReadPipe = array('pipe', 'r'); } $aDescriptors = array( diff --git a/lib/template/includes/html-header.php b/lib/template/includes/html-header.php index 942af095..9a914bf5 100644 --- a/lib/template/includes/html-header.php +++ b/lib/template/includes/html-header.php @@ -6,7 +6,6 @@ - diff --git a/sql/functions.sql b/sql/functions.sql index c7d96cdc..5d2b7a22 100644 --- a/sql/functions.sql +++ b/sql/functions.sql @@ -1358,10 +1358,9 @@ BEGIN END LOOP; NEW.importance := null; - select language||':'||title,importance from get_wikipedia_match(NEW.extratags, NEW.country_code) INTO NEW.wikipedia,NEW.importance; - IF NEW.importance IS NULL THEN - select language||':'||title,importance from wikipedia_article where osm_type = NEW.osm_type and osm_id = NEW.osm_id order by importance desc limit 1 INTO NEW.wikipedia,NEW.importance; - END IF; + SELECT wikipedia, importance + FROM compute_importance(NEW.extratags, NEW.country_code, NEW.osm_type, NEW.osm_id) + INTO NEW.wikipedia,NEW.importance; --DEBUG: RAISE WARNING 'Importance computed from wikipedia: %', NEW.importance; @@ -1600,9 +1599,10 @@ BEGIN -- mark the linked place (excludes from search results) UPDATE placex set linked_place_id = NEW.place_id where place_id = linkedPlacex.place_id; - -- keep a note of the node id in case we need it for wikipedia in a bit - linked_node_id := linkedPlacex.osm_id; - select language||':'||title,importance from get_wikipedia_match(linkedPlacex.extratags, NEW.country_code) INTO linked_wikipedia,linked_importance; + select wikipedia, importance + FROM compute_importance(linkedPlacex.extratags, NEW.country_code, + 'N', linkedPlacex.osm_id) + INTO linked_wikipedia,linked_importance; --DEBUG: RAISE WARNING 'Linked label member'; END LOOP; @@ -1639,9 +1639,10 @@ BEGIN -- mark the linked place (excludes from search results) UPDATE placex set linked_place_id = NEW.place_id where place_id = linkedPlacex.place_id; - -- keep a note of the node id in case we need it for wikipedia in a bit - linked_node_id := linkedPlacex.osm_id; - select language||':'||title,importance from get_wikipedia_match(linkedPlacex.extratags, NEW.country_code) INTO linked_wikipedia,linked_importance; + select wikipedia, importance + FROM compute_importance(linkedPlacex.extratags, NEW.country_code, + 'N', linkedPlacex.osm_id) + INTO linked_wikipedia,linked_importance; --DEBUG: RAISE WARNING 'Linked admin_center'; END IF; @@ -1684,9 +1685,10 @@ BEGIN -- mark the linked place (excludes from search results) UPDATE placex set linked_place_id = NEW.place_id where place_id = linkedPlacex.place_id; - -- keep a note of the node id in case we need it for wikipedia in a bit - linked_node_id := linkedPlacex.osm_id; - select language||':'||title,importance from get_wikipedia_match(linkedPlacex.extratags, NEW.country_code) INTO linked_wikipedia,linked_importance; + select wikipedia, importance + FROM compute_importance(linkedPlacex.extratags, NEW.country_code, + 'N', linkedPlacex.osm_id) + INTO linked_wikipedia,linked_importance; --DEBUG: RAISE WARNING 'Linked named place'; END LOOP; END IF; @@ -1714,13 +1716,6 @@ BEGIN (NEW.importance is null or NEW.importance < linked_importance) THEN NEW.importance = linked_importance; END IF; - - -- Still null? how about looking it up by the node id - IF NEW.importance IS NULL THEN - --DEBUG: RAISE WARNING 'Looking up importance by linked node id'; - select language||':'||title,importance from wikipedia_article where osm_type = 'N'::char(1) and osm_id = linked_node_id order by importance desc limit 1 INTO NEW.wikipedia,NEW.importance; - END IF; - END IF; -- make sure all names are in the word table @@ -2627,7 +2622,7 @@ END; $$ LANGUAGE plpgsql IMMUTABLE; -DROP TYPE wikipedia_article_match CASCADE; +DROP TYPE IF EXISTS wikipedia_article_match CASCADE; create type wikipedia_article_match as ( language TEXT, title TEXT, @@ -2684,6 +2679,42 @@ END; $$ LANGUAGE plpgsql; +DROP TYPE IF EXISTS place_importance CASCADE; +create type place_importance as ( + importance FLOAT, + wikipedia TEXT +); + +CREATE OR REPLACE FUNCTION compute_importance(extratags HSTORE, country_code varchar(2), osm_type varchar(1), osm_id BIGINT) + RETURNS place_importance + AS $$ +DECLARE + match RECORD; + result place_importance; +BEGIN + FOR match IN SELECT * FROM get_wikipedia_match(extratags, country_code) + WHERE language is not NULL + LOOP + result.importance := match.importance; + result.wikipedia := match.language || ':' || match.title; + RETURN result; + END LOOP; + + IF extratags ? 'wikidata' THEN + FOR match IN SELECT * FROM wikipedia_article + WHERE wd_page_title = extratags->'wikidata' + ORDER BY language = 'en' DESC, langcount DESC LIMIT 1 LOOP + result.importance := match.importance; + result.wikipedia := match.language || ':' || match.title; + RETURN result; + END LOOP; + END IF; + + RETURN null; +END; +$$ +LANGUAGE plpgsql; + CREATE OR REPLACE FUNCTION quad_split_geometry(geometry GEOMETRY, maxarea FLOAT, maxdepth INTEGER) RETURNS SETOF GEOMETRY AS $$ diff --git a/sql/tables.sql b/sql/tables.sql index 0559abd4..0245e3c3 100644 --- a/sql/tables.sql +++ b/sql/tables.sql @@ -268,7 +268,9 @@ CREATE TABLE wikipedia_article ( lon double precision, importance double precision, osm_type character(1), - osm_id bigint + osm_id bigint, + wd_page_title text, + instance_of text ); ALTER TABLE ONLY wikipedia_article ADD CONSTRAINT wikipedia_article_pkey PRIMARY KEY (language, title); CREATE INDEX idx_wikipedia_article_osm_id ON wikipedia_article USING btree (osm_type, osm_id); diff --git a/test/bdd/api/search/params.feature b/test/bdd/api/search/params.feature index 23a86705..cdea3f69 100644 --- a/test/bdd/api/search/params.feature +++ b/test/bdd/api/search/params.feature @@ -119,13 +119,13 @@ Feature: Search queries | en | Then result addresses contain | ID | state | - | 0 | Salto | + | 0 | Florida | When sending json search query "25 de Mayo" with address | accept-language | viewbox | - | en | -56.35879,-34.18330,-56.31618,-34.20815 | + | en | -57.95468,-31.39261,-57.94741,-31.39490 | Then result addresses contain | ID | state | - | 0 | Florida | + | 0 | Salto | Scenario: viewboxes cannot be points When sending json search query "foo" diff --git a/test/testdb/wikimedia-importance.sql.gz b/test/testdb/wikimedia-importance.sql.gz new file mode 100644 index 00000000..1024f725 Binary files /dev/null and b/test/testdb/wikimedia-importance.sql.gz differ diff --git a/test/testdb/wikipedia_article.sql.bin b/test/testdb/wikipedia_article.sql.bin deleted file mode 100644 index 628e2af4..00000000 Binary files a/test/testdb/wikipedia_article.sql.bin and /dev/null differ diff --git a/test/testdb/wikipedia_redirect.sql.bin b/test/testdb/wikipedia_redirect.sql.bin deleted file mode 100644 index 9c4b513d..00000000 Binary files a/test/testdb/wikipedia_redirect.sql.bin and /dev/null differ diff --git a/utils/update.php b/utils/update.php index 2809a198..f71c4f43 100644 --- a/utils/update.php +++ b/utils/update.php @@ -42,6 +42,7 @@ $aCMDOptions array('deduplicate', '', 0, 1, 0, 0, 'bool', 'Deduplicate tokens'), array('recompute-word-counts', '', 0, 1, 0, 0, 'bool', 'Compute frequency of full-word search terms'), array('update-address-levels', '', 0, 1, 0, 0, 'bool', 'Reimport address level configuration (EXPERT)'), + array('recompute-importance', '', 0, 1, 0, 0, 'bool', 'Recompute place importances'), array('no-npi', '', 0, 1, 0, 0, 'bool', '(obsolete)'), ); @@ -321,6 +322,23 @@ if ($aResult['update-address-levels']) { $oAlParser->createTable($oDB, 'address_levels'); } +if ($aResult['recompute-importance']) { + echo "Updating importance values for database.\n"; + $oDB = new Nominatim\DB(); + $oDB->connect(); + + $sSQL = 'ALTER TABLE placex DISABLE TRIGGER ALL;'; + $sSQL .= 'UPDATE placex SET (wikipedia, importance) ='; + $sSQL .= ' (SELECT wikipedia, importance'; + $sSQL .= ' FROM compute_importance(extratags, country_code, osm_type, osm_id));'; + $sSQL .= 'UPDATE placex s SET wikipedia = d.wikipedia, importance = d.importance'; + $sSQL .= ' FROM placex d'; + $sSQL .= ' WHERE s.place_id = d.linked_place_id and d.wikipedia is not null'; + $sSQL .= ' and (s.wikipedia is null or s.importance < d.importance);'; + $sSQL .= 'ALTER TABLE placex ENABLE TRIGGER ALL;'; + $oDB->exec($sSQL); +} + if ($aResult['import-osmosis'] || $aResult['import-osmosis-all']) { // if (strpos(CONST_Replication_Url, 'download.geofabrik.de') !== false && CONST_Replication_Update_Interval < 86400) { diff --git a/vagrant/Install-on-Centos-7.sh b/vagrant/Install-on-Centos-7.sh index 19b7ff84..b456fb74 100755 --- a/vagrant/Install-on-Centos-7.sh +++ b/vagrant/Install-on-Centos-7.sh @@ -17,12 +17,20 @@ sudo yum install -y epel-release +# More repositories for postgresql 11 (CentOS default 'postgresql' is 9.2), postgis +# and llvm-toolset (https://github.com/theory/pg-semver/issues/35) + + sudo yum install -y https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm + sudo yum install -y centos-release-scl-rh + # Now you can install all packages needed for Nominatim: #DOCS: :::sh - sudo yum install -y postgresql-server postgresql-contrib postgresql-devel \ - postgis postgis-utils \ + + sudo yum install -y postgresql11-server postgresql11-contrib postgresql11-devel \ + postgis25_11 postgis25_11-utils \ wget git cmake make gcc gcc-c++ libtool policycoreutils-python \ + devtoolset-7 llvm-toolset-7 \ php-pgsql php php-intl libpqxx-devel \ proj-epsg bzip2-devel proj-devel libxml2-devel boost-devel \ expat-devel zlib-devel @@ -77,8 +85,11 @@ sudo chown vagrant /srv/nominatim #DOCS: # CentOS does not automatically create a database cluster. Therefore, start # with initializing the database, then enable the server to start at boot: - sudo postgresql-setup initdb - sudo systemctl enable postgresql + echo 'PATH=$PATH:/usr/pgsql-11/bin' > .bash_profile + source .bash_profile + + sudo /usr/pgsql-11/bin/postgresql-11-setup initdb + sudo systemctl enable postgresql-11 # # Next tune the postgresql configuration, which is located in @@ -88,7 +99,7 @@ sudo chown vagrant /srv/nominatim #DOCS: # # Now start the postgresql service after updating this config file. - sudo systemctl restart postgresql + sudo systemctl restart postgresql-11 # # Finally, we need to add two postgres users: one for the user that does @@ -150,7 +161,7 @@ fi #DOCS: # download the country grid: if [ ! -f data/country_osm_grid.sql.gz ]; then #DOCS: :::sh - wget -O data/country_osm_grid.sql.gz https://www.nominatim.org/data/country_grid.sql.gz + wget --no-verbose -O data/country_osm_grid.sql.gz https://www.nominatim.org/data/country_grid.sql.gz fi #DOCS: # The code must be built in a separate directory. Create this directory,