
Pipelines data validator integration #1635

Open
fmendezh opened this issue Aug 24, 2021 · 1 comment

@fmendezh
Contributor

Integrating the IPT with the Data Validator can help publishers improve their data before publishing it to GBIF. The Data Validator exposes an API consistent with the running data ingestion platform, providing the services needed to validate Occurrence, Checklist, and metadata-only datasets.

Basic functionality

  1. Once a dataset/resource contains the desired metadata and its data has been uploaded or mapped, the user may want to validate it before publishing it to GBIF.
  2. The IPT generates a DwC-A in a staging location accessible as an external URL and, through the Data Validator API, requests that it be validated (see the sketch after this list).
    • This can also be accomplished by using the Validator API to upload a file.
    • The authentication method must follow the procedures already implemented in the IPT.
  3. The Data Validator starts the validation process and returns the validation key for the requested archive, which is used to track its progress.
  4. Upon successful validation, the IPT should allow the user to publish the resource to GBIF.
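
A minimal sketch of how the IPT side of steps 2–3 could look, using Java's built-in HTTP client. The base URL, the `/url` endpoint, the `fileUrl` parameter, and the shape of the JSON response are assumptions for illustration only; the actual Data Validator API contract would need to be confirmed against the running ingestion platform.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: the "/url" endpoint, the "fileUrl" parameter and the
// response format are assumptions about the Data Validator API, not its documented contract.
public class ValidationRequestSketch {

  private static final String VALIDATOR_API = "https://api.gbif.org/v1/validation"; // assumed base URL

  private final HttpClient http = HttpClient.newHttpClient();

  /**
   * Asks the validator to fetch and validate a DwC-A the IPT has staged at an externally
   * reachable URL, reusing the credentials the IPT already holds for GBIF services.
   * Returns the raw JSON response, expected to contain the validation key.
   */
  public String requestValidation(String stagedArchiveUrl, String authorizationHeader) throws Exception {
    String encodedUrl = URLEncoder.encode(stagedArchiveUrl, StandardCharsets.UTF_8);
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(VALIDATOR_API + "/url?fileUrl=" + encodedUrl)) // hypothetical endpoint
        .header("Authorization", authorizationHeader)
        .POST(HttpRequest.BodyPublishers.noBody())
        .build();

    HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() / 100 != 2) {
      throw new IllegalStateException("Validation request failed: HTTP " + response.statusCode());
    }
    return response.body(); // in practice, parse out the validation key with a JSON library
  }
}
```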

Additional considerations

  1. The IPT must provide a way to track the validation progress of an individual resource (see the sketch after this list).
  2. Multiple validation requests for the same resource must be prevented by allowing only one running validation at a time per resource. The Data Validator already imposes a suggested maximum number of validations a single user can run in parallel.
  3. Once validation has finished, the IPT must delete all temporary files and elements created.
  4. The IPT shouldn't need to store any information other than the identifiers of the validations executed for each resource; a dedicated endpoint for IPT validations could also be considered to relieve the IPT of storing additional data.
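
A minimal sketch of how the IPT could track progress while enforcing a single running validation per resource, keeping only the validation key per resource as suggested in point 4. The status endpoint and the FINISHED/FAILED status values are assumptions, not the documented Data Validator API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: the GET /validation/{key} endpoint and the FINISHED/FAILED
// status values are assumptions about the Data Validator API.
public class ValidationTrackerSketch {

  private static final String VALIDATOR_API = "https://api.gbif.org/v1/validation"; // assumed base URL

  private final HttpClient http = HttpClient.newHttpClient();

  // The only state kept per resource: the key of its currently running validation.
  private final Map<String, String> runningValidations = new ConcurrentHashMap<>();

  /** Registers a new validation, rejecting it if one is already running for the resource. */
  public void register(String resourceShortname, String validationKey) {
    String previous = runningValidations.putIfAbsent(resourceShortname, validationKey);
    if (previous != null) {
      throw new IllegalStateException("A validation is already running for " + resourceShortname);
    }
  }

  /** Polls the validator for the current status of a resource's running validation. */
  public String pollStatus(String resourceShortname) throws Exception {
    String key = runningValidations.get(resourceShortname);
    if (key == null) {
      return "NO_VALIDATION_RUNNING";
    }
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(VALIDATOR_API + "/" + key)) // hypothetical status endpoint
        .GET()
        .build();
    String body = http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    // Assumed terminal states: once reached, temporary files can be cleaned up and,
    // on success, publishing to GBIF can be enabled for the resource.
    if (body.contains("FINISHED") || body.contains("FAILED")) {
      runningValidations.remove(resourceShortname);
    }
    return body;
  }
}
```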
@spalp
Contributor

spalp commented Aug 30, 2024

Wow, thanks to @ckotwn, I just became aware of this incredibly useful feature. Cannot wait to see it in production.
Meanwhile, I added a step for publishers in the documentation suggesting that they manually check their data using the IPT. Here's the commit: master...spalp:ipt:patch-2. I hope it makes sense.

@mike-podolskiy90 mike-podolskiy90 added this to the 3.1.x milestone Sep 3, 2024
@mike-podolskiy90 mike-podolskiy90 modified the milestones: 3.1.x, 3.2 Oct 11, 2024