Simplistic spatial/administrative referential.
For documentation about French administrative levels, see the LISEZMOI.md file.
This project is a set of tools to produce a shared spatial/administrative referential based on open datasets.
It is meant to be embedded in applications for autocompletion. It aims for neither universality (levels are not comparable between countries) nor precision (most source datasets have a precision of 100 m).
These tools work on and export WGS84 spatial data.
This project uses MongoDB 2.6+ and GDAL as main tooling. Build tools are written in Python 3 and make use of:
- click
- PyMongo
- Fiona
- Shapely
The web interface requires Flask.
Translations require Babel and the Transifex client.
There are many ways to get a development environment started.
Assuming you have Virtualenv and MongoDB installed and configured on your computer:
$ git clone https://github.com/etalab/geozones.git
$ cd geozones
$ virtualenv -p /bin/python3 .
$ source bin/activate
$ pip install -e .
$ geozones -h
There is a docker-compose.yml file providing a MongoDB instance. You can also run the entire tool in Docker. See Using docker for more details.
There are two main models:
- level hierarchies
- zone/territories
GeoZones uses MongoDB as working storage.
Level hierarchies define the relationships between levels and their names. They are not stored in the database, but they are exported with the following properties:
Property | Description |
---|---|
id | A string identifier for the level (e.g. country, fr:commune ...) |
label | The human-readable string representation in English (e.g. World). * |
admin_level | An administrative scale index (0 is the biggest and 100 the smallest level) |
parents | The list of known parent levels identifier |
*: Labels are optionally translatable
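As an illustration, the exported level properties can be modeled with a small Python structure. This is a minimal sketch, not the project's own code: the `Level` class and the `missing_parents` helper are hypothetical names, the `admin_level` values are taken from the tables of supported levels in this document, and the labels and parent links are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """Hypothetical in-memory view of an exported level."""
    id: str
    label: str
    admin_level: int  # 0 is the biggest scale, 100 the smallest
    parents: tuple = ()


# admin_level values come from this document's level tables;
# labels and parent links are assumptions made for the example.
LEVELS = [
    Level("fr:commune", "French commune", 80, ("fr:region",)),
    Level("country", "Country", 20, ("country-group",)),
    Level("fr:region", "French region", 40, ("country",)),
    Level("country-group", "Country group", 10),
]


def missing_parents(levels):
    """Return (level id, parent id) pairs whose parent level is not defined."""
    known = {level.id for level in levels}
    return [
        (level.id, parent_id)
        for level in levels
        for parent_id in level.parents
        if parent_id not in known
    ]


# Order levels from the biggest scale to the smallest one.
ordered = sorted(LEVELS, key=lambda level: level.admin_level)
print([level.id for level in ordered])
print(missing_parents(LEVELS))
```

Sorting on `admin_level` gives a rough big-to-small ordering, which can be handy for ranking autocompletion results; it is not a substitute for the explicit `parents` links.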
You can contribute your country-specific levels. Currently, geozones supports the following levels:
identifier | administrative level | description |
---|---|---|
country-group | 10 | Groups of countries (World, UE ...) |
country | 20 | A country |
country-subset | 30 | An administrative subset of a country |
identifier | administrative level | description |
---|---|---|
fr:region | 40 | Regions of France |
fr:epci | 68 | Intercommunalities of France |
fr:departement | 60 | Departements of France |
fr:collectivite | 60 | French overseas collectivities |
fr:arrondissement | 70 | Arrondissements of France |
fr:commune | 80 | Communes of France |
fr:canton | 98 | Cantons of France |
fr:iris | 98 | Iris of France |
identifier | administrative level | description |
---|---|---|
lu:district | 40 | Districts of Luxembourg |
lu:canton | 60 | Cantons of Luxembourg |
lu:commune | 80 | Communes of Luxembourg |
A zone is a spatial polygon for a given level. It has at least one unique code (unique within its level) and a name. It can have many known keys, which are not necessarily unique (e.g. a postal code can be shared by many towns).
Labels are optionally translatable.
Some zones are defined as an aggregation of other zones. They are called aggregations in geozones and are built after all data is loaded.
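Because keys are not necessarily unique, a lookup by key has to return a list of zones rather than a single one. The sketch below shows one way to build such an index; the `index_by_key` helper and the `postal` key name are assumptions, and the sample records are made up.

```python
from collections import defaultdict

# Illustrative records only: codes, names and postal codes are made up.
ZONES = [
    {"id": "fr:commune:00001", "name": "Town A",
     "keys": {"postal": ["99000"]}},
    {"id": "fr:commune:00002", "name": "Town B",
     "keys": {"postal": ["99000", "99010"]}},
]


def index_by_key(zones, key):
    """Map each value of `key` to the ids of every zone that carries it."""
    index = defaultdict(list)
    for zone in zones:
        values = zone.get("keys", {}).get(key, [])
        if isinstance(values, str):  # accept a single value as well as a list
            values = [values]
        for value in values:
            index[value].append(zone["id"])
    return dict(index)


postal_index = index_by_key(ZONES, "postal")
print(postal_index["99000"])  # both towns share this postal code
```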
The following properties are exported in the GeoJSON output:
Property | Description |
---|---|
id | A unique identifier defined by <level>:<code>[@creation] |
code | The zone unique identifier in this level |
level | The level identifier |
name | The zone display name (can be translatable) |
population | Estimated/approximate population (optional) |
area | Estimated/approximate area in km² (optional) |
wikidata | A Wikidata node identifier (optional) |
wikipedia | A Wikipedia reference (optional) |
dbpedia | A DBPedia reference (optional) |
flag | A DBPedia reference to a flag (optional) |
blazon | A DBPedia reference to a blazon (optional) |
keys | A dictionary of known keys/code for this zone |
parents | A list of every known parent zone identifier |
ancestors | A list of ancestors (optional) |
successors | A list of successors (optional) |
validity | A date range validity (start /end ) (optional) |
Note that the keys option lets you choose which properties to export during the distribution (dist) step.
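The `<level>:<code>[@creation]` identifier format can be split back into its parts on the consumer side. The following is a minimal sketch under stated assumptions: `ZoneId` and `parse_zone_id` are hypothetical names, the code is assumed to contain neither `:` nor `@`, and the sample identifiers (including the date format of the creation part) are made up.

```python
from typing import NamedTuple, Optional


class ZoneId(NamedTuple):
    level: str
    code: str
    creation: Optional[str]


def parse_zone_id(zone_id: str) -> ZoneId:
    """Split `<level>:<code>[@creation]` into its parts.

    Assumes the code contains neither ':' nor '@', so the last ':' before
    the optional '@' separates the level from the code.
    """
    head, sep, creation = zone_id.partition("@")
    level, colon, code = head.rpartition(":")
    if not colon or not level or not code:
        raise ValueError(f"not a zone identifier: {zone_id!r}")
    return ZoneId(level, code, creation if sep else None)


# Made-up identifiers, with and without the optional creation part.
print(parse_zone_id("fr:commune:00001@1970-01-01"))
print(parse_zone_id("country:fr"))
```

Splitting on the last colon matters because level identifiers such as `fr:commune` contain a colon themselves.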
Level names and some territories are translatable. They are provided as gettext files. Translations are handled on Transifex.
Here’s the workflow:
# Ensure you have the optional tools to process translations
$ pip install -e .[i18n]
# Extract translatable labels
$ pybabel extract -F babel.cfg -o geozones/translations/geozones.pot .
# Push updated translations template to Transifex
$ tx push -s
# Fetch last translations from Transifex
$ tx pull
# Compile translations for packaging/distribution
$
To add an extra language:
$ pybabel init -D geozones -i geozones/translations/geozones.pot -d geozones/translations -l <language code>
$ tx push -t -l <language code>
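On the consumer side, compiled gettext catalogs can be read with Python's standard `gettext` module. This is a hedged sketch rather than the project's own loader: the `load_translations` helper is a hypothetical name, the domain and directory simply mirror the `-D geozones` and `-d geozones/translations` options used in the pybabel commands, and treating the English label as the message id is an assumption.

```python
import gettext


def load_translations(language, localedir="geozones/translations"):
    """Return a translations object for `language`.

    With fallback=True, a missing compiled catalog yields a
    NullTranslations object that returns messages unchanged.
    """
    return gettext.translation(
        "geozones", localedir=localedir, languages=[language], fallback=True
    )


translations = load_translations("fr")
# Assumption: the English label is used as the gettext message id.
label = translations.gettext("World")
print(label)
```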
A set of commands is provided for the build process. You can list them all with:
$ geozones --help
Download the required datasets. Datasets will be stored in a downloads subdirectory.
Load and process datasets into database.
Perform zones aggregations for zones defined as aggregation of others.
Perform some non-geospatial processing (e.g. set the postal codes, attach the parents…).
The --exclude and --only options make it possible to run only a subset of the postprocess functions.
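The selection logic behind such options can be sketched as follows. This is not the geozones implementation: `select_processors` and the registry are hypothetical, and the way the two options combine (restrict with `only` first, then drop `exclude`) is an assumption.

```python
def select_processors(processors, only=(), exclude=()):
    """Pick the postprocess functions to run, keeping registration order.

    `only` restricts the run to the named functions; `exclude` then
    removes names from whatever is left. Unknown names raise an error.
    """
    unknown = (set(only) | set(exclude)) - set(processors)
    if unknown:
        raise ValueError(f"unknown postprocessors: {sorted(unknown)}")
    names = [name for name in processors if not only or name in only]
    return [name for name in names if name not in exclude]


# Hypothetical registry: names and bodies are placeholders.
PROCESSORS = {
    "postal_codes": lambda: "postal codes set",
    "parents": lambda: "parents attached",
    "population": lambda: "population computed",
}

for name in select_processors(PROCESSORS, exclude=["population"]):
    print(name, "->", PROCESSORS[name]())
```

Rejecting unknown names early avoids silently running nothing because of a typo in an option value.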
Dump the produced dataset as GeoJSON files for distribution. Files are dumped in a build subdirectory.
All in one task equivalent to:
# Perform all tasks from download to distribution
$ geozones download preload load aggregate postprocess dist
Serve a web interface to explore the generated data.
Display some useful information and statistics.
Commands are chainable so you can write:
# Perform all tasks from download to distribution
$ geozones download load -d aggregate postprocess dist dist -s status
Generate a list of dataset download URLs for external use.
This allows using an external download manager, for example.
Ex: using 10 parallel curl processes:
mkdir download && cd download && geozones sourceslist | xargs -P 10 -n 1 curl -O
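The same idea can be sketched in Python with a thread pool from the standard library. It assumes, as the `xargs -n 1` usage above implies, that the list contains one URL per line; `read_urls`, `fetch` and `download_all` are hypothetical helper names.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def read_urls(lines):
    """Keep non-empty lines, assuming one URL per line."""
    return [line.strip() for line in lines if line.strip()]


def fetch(url, directory="."):
    """Download `url` into `directory`, named after the last URL segment."""
    filename = os.path.basename(urlsplit(url).path) or "index.html"
    target = os.path.join(directory, filename)
    urllib.request.urlretrieve(url, target)
    return target


def download_all(urls, workers=10, fetcher=fetch):
    """Fetch every URL with a pool of `workers` threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetcher, urls))
```

In a script, calling `download_all(read_urls(sys.stdin))` with the URL list piped to standard input would play the role of the `xargs`/`curl` pipeline. The `fetcher` parameter exists so the download step can be swapped out, for instance in tests.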
Fetch zones logos/flags/blazons from Wikipedia when available.
You can export data in (Geo)JSON or msgpack formats.
The msgpack format costs more CPU time on deserialization but avoids using several gigabytes of RAM, because the data can be iterated over without loading the whole file.
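Since the exported zones are meant to feed autocompletion, here is a small sketch of a case-insensitive prefix lookup built from a GeoJSON export. It assumes the export is a FeatureCollection whose features carry the `id`, `level` and `name` properties described earlier; the sample data is made up, and `build_index` and `complete` are hypothetical helpers.

```python
import bisect
import json

# Illustrative export: a GeoJSON FeatureCollection with made-up zones.
# Geometries are null to keep the sample short.
SAMPLE = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": null, "properties":
    {"id": "fr:commune:00001", "level": "fr:commune", "name": "Sample Town"}},
  {"type": "Feature", "geometry": null, "properties":
    {"id": "fr:region:01", "level": "fr:region", "name": "Sample Region"}},
  {"type": "Feature", "geometry": null, "properties":
    {"id": "country:xx", "level": "country", "name": "Otherland"}}
]}
"""


def build_index(collection):
    """Return (case-folded name, zone id) pairs sorted for prefix search."""
    return sorted(
        (feature["properties"]["name"].casefold(), feature["properties"]["id"])
        for feature in collection["features"]
    )


def complete(index, prefix, limit=10):
    """Return ids of zones whose name starts with `prefix`, ignoring case."""
    prefix = prefix.casefold()
    # Binary search for the first entry that sorts at or after the prefix.
    start = bisect.bisect_left(index, (prefix,))
    matches = []
    for name, zone_id in index[start:start + limit]:
        if not name.startswith(prefix):
            break
        matches.append(zone_id)
    return matches


index = build_index(json.loads(SAMPLE))
print(complete(index, "sam"))
```

This loads the whole file in memory, which is the trade-off the JSON format implies; a streaming format would need an index built incrementally instead.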
- NaturalEarth administrative boundaries
- Thematic Mapping country boundaries
- OpenStreetMap French regions boundaries
- OpenStreetMap French counties boundaries
- OpenStreetMap French EPCIs boundaries
- OpenStreetMap French districts boundaries
- OpenStreetMap French towns boundaries
- OpenStreetMap French cantons boundaries
- IGN/ISEE IRIS aggregated version
- French postal codes database
If you only want a MongoDB instance in Docker and want to keep using a native Python environment, just use the provided docker-compose.yml as it is:
docker-compose up -d
Your MongoDB instance will be available on localhost:27017.
If you want to run the entire application within Docker, you can use a docker-compose.override.yml to add an extra Docker instance for geozones.
A sample docker-compose.override.yml is provided in docker-compose.geozones.yml.
cp docker-compose.{geozones,override}.yml
docker-compose up -d
Your MongoDB instance will be available on localhost:27017 and the explore interface on localhost:5000.
Then you can run any geozones command with docker-compose run geozones <command>.
Ex:
docker-compose run geozones status
- Incremental downloads, maybe with checksum check
- Global post-processor
- Post-processor dependencies
- Audit trail
- Distribute GeoZones as a standalone Python executable
- Some quality check tools
- Global weight = f(population, area, level)
- Different precision output
- Localized JSON outputs (outputs are English-only right now)
- Translations as distributable JSON (as an alternative to the current PO/MO format)
- Translations as Python package
- Model versioning
- Statistics/coverages in levels
- Querying
- Only fetch zones for viewport (less intensive for lower layers)
- A full web-service as a separate project