X-Git-Url: https://git.openstreetmap.org./nominatim.git/blobdiff_plain/e7b738fe35a2ec13efdc22c9f068c477182da6c3..1c7da4318298a790e63c5648de57b3b804426731:/docs/admin/Import-and-Update.md
diff --git a/docs/admin/Import-and-Update.md b/docs/admin/Import-and-Update.md
index 731ff8fa..554633ae 100644
--- a/docs/admin/Import-and-Update.md
+++ b/docs/admin/Import-and-Update.md
@@ -29,56 +29,178 @@ Add to your `settings/local.php`:
     @define('CONST_Osm2pgsql_Flatnode_File', '/path/to/flatnode.file');

 Replace the second part with a suitable path on your system and make sure
-the directory exists. There should be at least 40GB of free space.
+the directory exists. There should be at least 64GB of free space.

 ## Downloading additional data

-### Wikipedia rankings
+### Wikipedia/Wikidata rankings

 Wikipedia can be used as an optional auxiliary data source to help indicate
-the importance of osm features. Nominatim will work without this information
+the importance of OSM features. Nominatim will work without this information
 but it will improve the quality of the results if this is installed.
 This data is available as a binary download:

     cd $NOMINATIM_SOURCE_DIR/data
-    wget https://www.nominatim.org/data/wikipedia_article.sql.bin
-    wget https://www.nominatim.org/data/wikipedia_redirect.sql.bin
+    wget https://www.nominatim.org/data/wikimedia-importance.sql.gz

-Combined the 2 files are around 1.5GB and add around 30GB to the install
-size of nominatim. They also increase the install time by an hour or so.
+The file is about 400MB and adds around 4GB to Nominatim database.

-*NOTE:* you'll need to download the Wikipedia rankings before performing
-the initial import of the data if you want the rankings applied to the
-loaded data.
+!!! tip
+    If you forgot to download the wikipedia rankings, you can also add
+    importances after the import. Download the files, then run
+    `./utils/setup.php --import-wikipedia-articles`
+    and `./utils/update.php --recompute-importance`.

-### UK postcodes
+### Great Britain, USA postcodes

-Nominatim can use postcodes from an external source to improve searches that involve a UK postcode. This data can be optionally downloaded:
+Nominatim can use postcodes from an external source to improve searches that
+involve a GB or US postcode. This data can be optionally downloaded:

     cd $NOMINATIM_SOURCE_DIR/data
     wget https://www.nominatim.org/data/gb_postcode_data.sql.gz
+    wget https://www.nominatim.org/data/us_postcode_data.sql.gz

+## Choosing the Data to Import
+
+In its default setup Nominatim is configured to import the full OSM data
+set for the entire planet. Such a setup requires a powerful machine with
+at least 64GB of RAM and around 800GB of SSD hard disks. Depending on your
+use case there are various ways to reduce the amount of data imported. This
+section discusses these methods. They can also be combined.
+
+### Using an extract
+
+If you only need geocoding for a smaller region, then precomputed extracts
+are a good way to reduce the database size and import time.
+[Geofabrik](https://download.geofabrik.de) offers extracts for most countries.
+They even have daily updates which can be used with the update process described
+below. There are also
+[other providers for extracts](https://wiki.openstreetmap.org/wiki/Planet.osm#Downloading).
+
+Please be aware that some extracts are not cut exactly along the country
+boundaries. As a result some parts of the boundary may be missing which means
+that Nominatim cannot compute the areas for some administrative areas.
+
+### Dropping Data Required for Dynamic Updates
+
+About half of the data in Nominatim's database is not really used for serving
+the API. It is only there to allow the data to be updated from the latest
+changes from OSM. For many uses these dynamic updates are not really required.
+If you don't plan to apply updates, the dynamic part of the database can be
+safely dropped using the following command:
+
+```
+./utils/setup.php --drop
+```
+
+Note that you still need to provide for sufficient disk space for the initial
+import. So this option is particularly interesting if you plan to transfer the
+database or reuse the space later.
+
+### Reverse-only Imports
+
+If you only want to use the Nominatim database for reverse lookups or
+if you plan to use the installation only for exports to a
+[photon](https://photon.komoot.de/) database, then you can set up a database
+without search indexes. Add `--reverse-only` to your setup command above.
+
+This saves about 5% of disk space.
+
+### Filtering Imported Data
+
+Nominatim normally sets up a full search database containing administrative
+boundaries, places, streets, addresses and POI data. There are also other
+import styles available which only read selected data:
+
+* **settings/import-admin.style**
+  Only import administrative boundaries and places.
+* **settings/import-street.style**
+  Like the admin style but also adds streets.
+* **settings/import-address.style**
+  Import all data necessary to compute addresses down to house number level.
+* **settings/import-full.style**
+  Default style that also includes points of interest.
+* **settings/import-extratags.style**
+  Like the full style but also adds most of the OSM tags into the extratags
+  column.
+
+The style can be changed with the configuration `CONST_Import_Style`.
+
+To give you an idea of the impact of using the different styles, the table
+below gives rough estimates of the final database size after import of a
+2018 planet and after using the `--drop` option. It also shows the time
+needed for the import on a machine with 64GB RAM, 4 CPUS and SSDs. Note that
+the given sizes are just an estimate meant for comparison of style requirements.
+Your planet import is likely to be larger as the OSM data grows with time.
+
+style     | Import time  |  DB size   |  after drop
+----------|--------------|------------|------------
+admin     |    5h        |  190 GB    |   20 GB
+street    |   42h        |  400 GB    |  180 GB
+address   |   59h        |  500 GB    |  260 GB
+full      |   80h        |  575 GB    |  300 GB
+extratags |   80h        |  585 GB    |  310 GB
+
+You can also customize the styles further. For a description of the
+style format see [the development section](../develop/Import.md).

 ## Initial import of the data

-**Important:** first try the import with a small excerpt, for example from
-[Geofabrik](https://download.geofabrik.de).
+!!! danger "Important"
+    First try the import with a small extract, for example from
+    [Geofabrik](https://download.geofabrik.de).

-Download the data to import and load the data with the following command:
+Download the data to import and load the data with the following command
+from the build directory:

 ```sh
-./utils/setup.php --osm-file --all [--osm2pgsql-cache 28000] 2>&1 | tee setup.log
+./utils/setup.php --osm-file --all 2>&1 | tee setup.log
 ```

-The `--osm2pgsql-cache` parameter is optional but strongly recommended for
-planet imports. It sets the node cache size for the osm2pgsql import part
-(see `-C` parameter in osm2pgsql help). As a rule of thumb, this should be
-about the same size as the file you are importing but never more than
-2/3 of RAM available. If your machine starts swapping reduce the size.
+***Note for full planet imports:*** Even on a perfectly configured machine
+the import of a full planet takes at least 2 days. Once you see messages
+with `Rank .. ETA` appear, the indexing process has started. This part takes
+the most time. There are 30 ranks to process. Rank 26 and 30 are the most complex.
+They take each about a third of the total import time. If you have not reached
+rank 26 after two days of import, it is worth revisiting your system
+configuration as it may not be optimal for the import.
+
+### Notes on memory usage
+
+In the first step of the import Nominatim uses osm2pgsql to load the OSM data
+into the PostgreSQL database. This step is very demanding in terms of RAM usage.
+osm2pgsql and PostgreSQL are running in parallel at this point. PostgreSQL
+blocks at least the part of RAM that has been configured with the
+`shared_buffers` parameter during [PostgreSQL tuning](Installation#postgresql-tuning)
+and needs some memory on top of that. osm2pgsql needs at least 2GB of RAM for
+its internal data structures, potentially more when it has to process very large
+relations. In addition it needs to maintain a cache for node locations. The size
+of this cache can be configured with the parameter `--osm2pgsql-cache`.

-Computing word frequency for search terms can improve the performance of
-forward geocoding in particular under high load as it helps Postgres' query
-planner to make the right decisions. To recompute word counts run:
+When importing with a flatnode file, it is best to disable the node cache
+completely and leave the memory for the flatnode file. Nominatim will do this
+by default, so you do not need to configure anything in this case.
+
+For imports without a flatnode file, set `--osm2pgsql-cache` approximately to
+the size of the OSM pbf file (in MB) you are importing. Make sure you leave
+enough RAM for PostgreSQL and osm2pgsql as mentioned above. If the system starts
+swapping or you are getting out-of-memory errors, reduce the cache size or
+even consider using a flatnode file.
+
+### Verify import finished
+
+Run this script to verify all required tables and indices got created successfully.
+
+```sh
+./utils/check_import_finished.php
+```
+
+
+## Tuning the database
+
+Accurate word frequency information for search terms helps PostgreSQL's query
+planner to make the right decisions. Recomputing them can improve the performance
+of forward geocoding in particular under high load. To recompute word counts run:

 ```sh
 ./utils/update.php --recompute-word-counts
@@ -96,74 +218,61 @@ you also need to enable these key phrases like this:
     ./utils/specialphrases.php --wiki-import > specialphrases.sql
     psql -d nominatim -f specialphrases.sql

-Note that this command downloads the phrases from the wiki link above.
+Note that this command downloads the phrases from the wiki link above. You
+need internet access for the step.


 ## Installing Tiger housenumber data for the US

-Nominatim is able to use the official TIGER address set to complement the
-OSM house number data in the US. You can add TIGER data to your own Nominatim
-instance by following these steps:
-
-  1. Install the GDAL library and python bindings and the unzip tool
+Nominatim is able to use the official [TIGER](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html)
+address set to complement the OSM house number data in the US. You can add
+TIGER data to your own Nominatim instance by following these steps. The
+entire US adds about 10GB to your database.

-     * Ubuntu: `sudo apt-get install python-gdal unzip`
-     * CentOS: `sudo yum install gdal-python unzip`
-
-  2. Get preprocessed TIGER 2017 data and unpack it into the
+  1. Get preprocessed TIGER 2019 data and unpack it into the
      data directory in your Nominatim sources:

          cd Nominatim/data
-         wget https://nominatim.org/data/tiger2017-nominatim-preprocessed.tar.gz
-         tar xf tiger2017-nominatim-preprocessed.tar.gz
+         wget https://nominatim.org/data/tiger2019-nominatim-preprocessed.tar.gz
+         tar xf tiger2019-nominatim-preprocessed.tar.gz
+
+    `data-source/us-tiger/README.md` explains how the data got preprocessed.

-  3. Import the data into your Nominatim database:
+  2. Import the data into your Nominatim database:

          ./utils/setup.php --import-tiger-data

-  4. Enable use of the Tiger data in your `settings/local.php` by adding:
+  3. Enable use of the Tiger data in your `settings/local.php` by adding:

          @define('CONST_Use_US_Tiger_Data', true);

-  5. Apply the new settings:
+  4. Apply the new settings:

 ```sh
 ./utils/setup.php --create-functions --enable-diff-updates --create-partition-functions
 ```

-The entire US adds about 10GB to your database.
-
-You can also process the data from the original TIGER data to create the
-SQL files, Nominatim needs for the import:
-
-  1. Get the TIGER 2017 data. You will need the EDGES files
-     (3,234 zip files, 11GB total).
-
-         wget -r ftp://ftp2.census.gov/geo/tiger/TIGER2017/EDGES/
-
-  2. Convert the data into SQL statements:
-
-         ./utils/imports.php --parse-tiger
-
-Be warned that this can take quite a long time. After this process is finished,
-the same preprocessed files as above are available in `data/tiger`.

 ## Updates

-There are many different possibilities to update your Nominatim database.
+There are many different ways to update your Nominatim database.
 The following section describes how to keep it up-to-date with Pyosmium.
 For a list of other methods see the output of `./utils/update.php --help`.

+!!! warning
+    If you have configured a flatnode file for the import, then you
+    need to keep this flatnode file around for updates as well.
+
 #### Installing the newest version of Pyosmium

-It is recommended to install Pyosmium via pip. Run (as the same user who
-will later run the updates):
+It is recommended to install Pyosmium via pip. Make sure to use python3.
+Run (as the same user who will later run the updates):

 ```sh
-pip install --user osmium
+pip3 install --user osmium
 ```

-Nominatim needs a tool called `pyosmium-get-updates`, which comes with
+Nominatim needs a tool called `pyosmium-get-updates` which comes with
 Pyosmium. You need to tell Nominatim where to find it. Add the following
 line to your `settings/local.php`:

@@ -179,7 +288,7 @@ to update using the global minutely diffs.

 If you want a different update source you will need to add some settings
 to `settings/local.php`. For example, to use the daily country extracts
-diffs for Ireland from geofabrik add the following:
+diffs for Ireland from Geofabrik add the following:

     // base URL of the replication service
     @define('CONST_Replication_Url', 'https://download.geofabrik.de/europe/ireland-and-northern-ireland-updates');
@@ -195,7 +304,7 @@ To set up the update process now run the following command:
 It outputs the date where updates will start. Recheck that this date is
 what you expect.

-The --init-updates command needs to be rerun whenever the replication service
+The `--init-updates` command needs to be rerun whenever the replication service
 is changed.

 #### Updating Nominatim
@@ -204,7 +313,9 @@ The following command will keep your database constantly up to date:

     ./utils/update.php --import-osmosis-all

-(Note that even though the old name "import-osmosis-all" has been kept for compatibility reasons, Osmosis is not required to run this - it uses pyosmium behind the scenes.)
+(Note that even though the old name "import-osmosis-all" has been kept for
+compatibility reasons, Osmosis is not required to run this - it uses pyosmium
+behind the scenes.)

 If you have imported multiple country extracts and want to keep them
 up-to-date, have a look at the script in