Add fiboa improve command #79 #21 #114

Merged
merged 9 commits into from
Nov 15, 2024
21 changes: 20 additions & 1 deletion CHANGELOG.md
@@ -9,10 +9,29 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

### Added

- Command `fiboa improve` with helpers to
  - change the CRS
  - change the GeoParquet version and compression
  - fill missing perimeter/area values
  - fix invalid geometries
  - rename columns
- Converter for Lithuania (EuroCrops)
- Converter for Switzerland
- Converter for Slovenia
- Converter for Slovakia
- `fiboa convert`: New parameter `--original-geometries` / `-og` to keep the original geometries

### Changed

- `fiboa convert`:
  - Writes custom schemas to collection metadata
  - Geometries are made valid using GeoPandas' `make_valid` method by default
  - MultiPolygons are converted to Polygons by default
- `fiboa validate` uses custom schemas for validation
- `fiboa merge` keeps custom schemas when needed

### Removed
- `fiboa convert`: Removed the explicit parameter `explode_multipolygon` from the converter

### Fixed

30 changes: 26 additions & 4 deletions README.md
@@ -8,19 +8,20 @@ A command-line interface (CLI) for working with fiboa.

## Getting Started

In order to make working with fiboa easier we have developed command-line interface (CLI) tools such as
inspection, validation and file format conversions.

### Installation

You will need to have **Python 3.9** or any later version installed.

Run `pip install fiboa-cli` in the CLI to install the validator.

**Optional:** To install additional dependencies for specific [converters](#converter-for-existing-datasets),
you can for example run: `pip install fiboa-cli[xyz]` with xyz being the converter name.

**Note on versions:**

- fiboa CLI >= 0.3.0 works with fiboa version > 0.2.0
- fiboa CLI < 0.3.0 works with fiboa version = 0.1.0

@@ -44,6 +45,7 @@ fiboa CLI supports various commands to work with the files:
- [Merge fiboa GeoParquet files](#merge-fiboa-geoparquet-files)
- [Create JSON Schema from fiboa Schema](#create-json-schema-from-fiboa-schema)
- [Validate a fiboa Schema](#validate-a-fiboa-schema)
- [Improve a fiboa Parquet file](#improve-a-fiboa-parquet-file)
- [Update an extension template with new names](#update-an-extension-template-with-new-names)
- [Converter for existing datasets](#converter-for-existing-datasets)
- [Development](#development)
@@ -121,19 +123,38 @@ To validate a fiboa Schema YAML file, you can for example run:

Check `fiboa validate-schema --help` for more details.

### Improve a fiboa Parquet file

Various "improvements" can be applied to a fiboa GeoParquet file.
The command allows you to:

- change the CRS (`--crs`)
- change the GeoParquet version (`-gp1`) and compression (`-pc`)
- add/fill missing perimeter/area values (`-sz`)
- fix invalid geometries (`-g`)
- rename columns (`-r`)

Example:

- `fiboa improve file.parquet -o file2.parquet -g -sz -r old=new -pc zstd`

Check `fiboa improve --help` for more details.
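
The `-r old=new` option can be given multiple times, and each value pairs an existing column name with its replacement. As a rough illustration only, the sketch below shows how such pairs could be turned into a mapping. The function name `parse_rename_pairs` is hypothetical and is not the CLI's own `parse_map` helper, whose implementation is not shown here.

```python
def parse_rename_pairs(values):
    """Turn ["old=new", ...] into {"old": "new", ...}.

    Hypothetical helper for illustration; not part of fiboa-cli.
    """
    mapping = {}
    for value in values:
        # Split on the first "=" only, so the new name may itself contain "=".
        old, sep, new = value.partition("=")
        old, new = old.strip(), new.strip()
        if not sep or not old or not new:
            raise ValueError(f"Expected 'old=new', got '{value}'")
        mapping[old] = new
    return mapping


# Example: two -r options as they might arrive from the command line.
renames = parse_rename_pairs(["old=new", "crop_code=crop:code"])
print(renames)
```

A dictionary in this shape could then be passed to something like pandas' `DataFrame.rename(columns=...)`. Whether the CLI does exactly that is an assumption.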

### Update an extension template with new names

Once you've created and git cloned a new extension, you can use the CLI
to update all template placeholders with proper names.

For example, if your extension is meant to have

- the title "Timestamps Extension",
- the prefix `ts` (e.g. field `ts:created` or `ts:updated`),
- is hosted at `https://github.io/fiboa/timestamps-extension`
(organization: `fiboa`, repository `timestamps-extension`),
- and you run fiboa in the folder of the extension.

Then the following command could be used:

- `fiboa rename-extension . -t Timestamps -p ts -s timestamps-extension -o fiboa`

Check `fiboa rename-extension --help` for more details.
@@ -143,13 +164,14 @@ Check `fiboa rename-extension --help` for more details.
The CLI ships various converters for existing datasets.

To get a list of available converters/datasets with title, license, etc. run:

- `fiboa converters`

Use any of the IDs from the list to convert an existing dataset to fiboa:

- `fiboa convert de_nrw`

See [Implement a converter](#implement-a-converter) for details about how to

## Development

98 changes: 89 additions & 9 deletions fiboa_cli/__init__.py
@@ -6,16 +6,18 @@
import click
import pandas as pd

from .const import COMPRESSION_METHODS, CORE_COLUMNS
from .convert import convert as convert_
from .convert import list_all_converter_ids, list_all_converters
from .create_geojson import create_geojson as create_geojson_
from .create_geoparquet import create_geoparquet as create_geoparquet_
from .describe import describe as describe_
from .merge import merge as merge_, DEFAULT_COLUMNS, DEFAULT_CRS
from .improve import improve as improve_
from .merge import merge as merge_, DEFAULT_CRS
from .jsonschema import jsonschema as jsonschema_
from .rename_extension import rename_extension as rename_extension_
from .util import (check_ext_schema_for_cli, log, parse_converter_input_files,
valid_file_for_cli, valid_file_for_cli_with_ext,
parse_map, valid_file_for_cli, valid_file_for_cli_with_ext,
valid_files_folders_for_cli, valid_folder_for_cli)
from .validate import validate as validate_
from .validate_schema import validate_schema as validate_schema_
@@ -376,7 +378,7 @@ def jsonschema(schema, out, fiboa_version, id_):
)
@click.option(
'--compression', '-pc',
type=click.Choice(["brotli", "gzip", "lz4", "snappy", "zstd", "none"]),
type=click.Choice(COMPRESSION_METHODS),
help='Compression method for the Parquet file.',
show_default=True,
default="brotli"
@@ -385,7 +387,7 @@ def jsonschema(schema, out, fiboa_version, id_):
'--geoparquet1', '-gp1',
is_flag=True,
type=click.BOOL,
help='Enforces generating a GeoParquet 1.0 file bounding box. Defaults to GeoParquet 1.1 with bounding box.',
help='Enforces generating a GeoParquet 1.0 file. Defaults to GeoParquet 1.1 with bounding box.',
default=False
)
@click.option(
@@ -394,13 +396,20 @@ def jsonschema(schema, out, fiboa_version, id_):
help='Url of mapping file. Some converters use additional sources with mapping data.',
default=None
)
def convert(dataset, out, input, cache, source_coop, collection, compression, geoparquet1, mapping_file):
@click.option(
    '--original-geometries', '-og',
    is_flag=True,
    type=click.BOOL,
    help='Keep the source geometries as provided, i.e. this option disables making geometries valid and converting them to Polygons.',
    default=False
)
def convert(dataset, out, input, cache, source_coop, collection, compression, geoparquet1, mapping_file, original_geometries):
"""
Converts existing field boundary datasets to fiboa.
"""
log(f"fiboa CLI {__version__} - Convert '{dataset}'\n", "success")
try:
convert_(dataset, out, input, cache, source_coop, collection, compression, geoparquet1, mapping_file)
convert_(dataset, out, input, cache, source_coop, collection, compression, geoparquet1, mapping_file, original_geometries)
except Exception as e:
log(e, "error")
sys.exit(1)
@@ -518,7 +527,7 @@ def rename_extension(folder, title, slug, org = "fiboa", prefix = None):
multiple=True,
help='Additional column names to include.',
show_default=True,
default=DEFAULT_COLUMNS,
default=CORE_COLUMNS,
)
@click.option(
'--exclude', '-e',
@@ -536,7 +545,7 @@ def rename_extension(folder, title, slug, org = "fiboa", prefix = None):
)
@click.option(
'--compression', '-pc',
type=click.Choice(["brotli", "gzip", "lz4", "snappy", "zstd", "none"]),
type=click.Choice(COMPRESSION_METHODS),
help='Compression method for the Parquet file.',
show_default=True,
default="brotli"
@@ -545,7 +554,7 @@ def rename_extension(folder, title, slug, org = "fiboa", prefix = None):
'--geoparquet1', '-gp1',
is_flag=True,
type=click.BOOL,
help='Enforces generating a GeoParquet 1.0 file bounding box. Defaults to GeoParquet 1.1 with bounding box.',
help='Enforces generating a GeoParquet 1.0 file. Defaults to GeoParquet 1.1 with bounding box.',
default=False
)
def merge(datasets, out, crs, include, exclude, extension, compression, geoparquet1):
@@ -564,6 +573,76 @@ def merge(datasets, out, crs, include, exclude, extension, compression, geoparqu
sys.exit(1)


## IMPROVE (add area, perimeter, and fix geometries)
@click.command()
@click.argument('input', nargs=1, type=click.Path(exists=True))
@click.option(
    '--out', '-o',
    type=click.Path(exists=False),
    help='Path to write the GeoParquet file to. If not given, overwrites the input file.',
    default=None
)
@click.option(
    '--rename-column', '-r',
    type=click.STRING,
    callback=lambda ctx, param, value: parse_map(value),
    multiple=True,
    help='Renaming of columns. Provide the old name and the new name separated by an equal sign. Can be used multiple times.'
)
@click.option(
    '--add-sizes', '-sz',
    is_flag=True,
    type=click.BOOL,
    help='Computes missing sizes (area, perimeter)',
    default=False
)
@click.option(
    '--fix-geometries', '-g',
    is_flag=True,
    type=click.BOOL,
    help='Tries to fix invalid geometries that are reported by the validator (uses GeoPandas\' make_valid method internally)',
    default=False
)
@click.option(
    '--explode-geometries', '-e',
    is_flag=True,
    type=click.BOOL,
    help='Converts MultiPolygons to Polygons',
    default=False
)
@click.option(
    '--crs',
    type=click.STRING,
    help='Coordinate Reference System (CRS) to use for the GeoParquet file.',
    show_default=True,
    default=None,
)
@click.option(
    '--compression', '-pc',
    type=click.Choice(COMPRESSION_METHODS),
    help='Compression method for the Parquet file.',
    show_default=True,
    default="brotli"
)
@click.option(
    '--geoparquet1', '-gp1',
    is_flag=True,
    type=click.BOOL,
    help='Enforces generating a GeoParquet 1.0 file. Defaults to GeoParquet 1.1 with bounding box.',
    default=False
)
def improve(input, out, rename_column, add_sizes, fix_geometries, explode_geometries, crs, compression, geoparquet1):
    """
    "Improves" a fiboa GeoParquet file according to the given parameters.
    """
    log(f"fiboa CLI {__version__} - Improve datasets\n", "success")
    try:
        improve_(input, out, rename_column, add_sizes, fix_geometries, explode_geometries, crs, compression, geoparquet1)
    except Exception as e:
        log(e, "error")
        sys.exit(1)


cli.add_command(describe)
cli.add_command(validate)
cli.add_command(validate_schema)
@@ -574,6 +653,7 @@ def merge(datasets, out, crs, include, exclude, extension, compression, geoparqu
cli.add_command(converters)
cli.add_command(rename_extension)
cli.add_command(merge)
cli.add_command(improve)

if __name__ == '__main__':
cli()
11 changes: 11 additions & 0 deletions fiboa_cli/const.py
@@ -10,3 +10,14 @@
STAC_COLLECTION_SCHEMA = "http://schemas.stacspec.org/v{version}/collection-spec/json-schema/collection.json"
GEOPARQUET_SCHEMA = "https://geoparquet.org/releases/v{version}/schema.json"
STAC_TABLE_EXTENSION = "https://stac-extensions.github.io/table/v1.2.0/schema.json"

COMPRESSION_METHODS = ["brotli", "gzip", "lz4", "snappy", "zstd", "none"]

CORE_COLUMNS = [
    "id",
    "geometry",
    "area",
    "perimeter",
    "determination_datetime",
    "determination_method",
]
4 changes: 3 additions & 1 deletion fiboa_cli/convert.py
@@ -13,7 +13,8 @@ def convert(
collection = False,
compression = None,
geoparquet1 = False,
mapping_file=None,
mapping_file = None,
original_geometries = False,
):
if dataset in IGNORED_DATASET_FILES:
raise Exception(f"'{dataset}' is not a converter")
@@ -37,6 +38,7 @@ def convert(
compression = compression,
geoparquet1 = geoparquet1,
mapping_file = mapping_file,
original_geometries = original_geometries,
)

def list_all_converter_ids():
9 changes: 5 additions & 4 deletions fiboa_cli/convert_utils.py
@@ -40,7 +40,7 @@ def convert(
license = None,
compression = None,
geoparquet1 = False,
explode_multipolygon = False,
original_geometries = False,
index_as_id = False,
**kwargs):
"""
@@ -160,11 +160,12 @@ def convert(
else:
log(f"Column '{key}' not found in dataset, skipping migration", "warning")

# 4b. For geometry column, convert multipolygon type to polygon
if explode_multipolygon:
# 4b. For geometry column, fix geometries
if not original_geometries:
gdf.geometry = gdf.geometry.make_valid()
gdf = gdf.explode()

if has_migration or has_col_migrations or has_col_filters or has_col_additions or explode_multipolygon:
if has_migration or has_col_migrations or has_col_filters or has_col_additions:
log("GeoDataFrame after migrations and filters:")
print(gdf.head())

1 change: 0 additions & 1 deletion fiboa_cli/datasets/be_wa.py
@@ -91,7 +91,6 @@ def file_migration(data, path, uri, layer):
license=LICENSE,
layer_filter=lambda layer, uri: layer == LAYER,
file_migration=file_migration,
explode_multipolygon=True,
index_as_id=True,
**kwargs
)
1 change: 0 additions & 1 deletion fiboa_cli/datasets/ch.py
@@ -69,7 +69,6 @@ def convert(output_file, cache = None, **kwargs):
column_migrations=COLUMN_MIGRATIONS,
column_filters=COLUMN_FILTERS,
providers=PROVIDERS,
explode_multipolygon=True,
index_as_id=True,
fid_as_index=True,
**kwargs
1 change: 0 additions & 1 deletion fiboa_cli/datasets/ec_fr.py
@@ -25,6 +25,5 @@ def convert(output_file, cache = None, **kwargs):
column_filters=base.COLUMN_FILTERS,
attribution=base.ATTRIBUTION,
license=LICENSE,
explode_multipolygon=True,
**kwargs
)
2 changes: 0 additions & 2 deletions fiboa_cli/datasets/es_cat.py
@@ -35,7 +35,6 @@

COLUMN_MIGRATIONS = {
"campanya": lambda col: pd.to_datetime(col, format='%Y'),
"geometry": lambda col: col.make_valid(),
}

MISSING_SCHEMAS = {
@@ -62,6 +61,5 @@ def convert(output_file, cache = None, **kwargs):
license=LICENSE,
layer="CULTIUS_DUN2023",
index_as_id=True,
explode_multipolygon=True,
**kwargs
)
2 changes: 0 additions & 2 deletions fiboa_cli/datasets/fi.py
@@ -33,8 +33,6 @@
COLUMN_MIGRATIONS = {
# Make year (1st january) from column "VUOSI"
"VUOSI": lambda col: pd.to_datetime(col, format='%Y'),
# Todo: generate a generic solution for making geometries valid
"geometry": lambda col: col.make_valid()
}

def migrate(gdf):