TG2 - understanding the status and process of developing tests and assertions #192

ymgan opened this issue Aug 27, 2021 · 4 comments

@ymgan (Collaborator) commented Aug 27, 2021

Hey @tucotuco

As mentioned, these are the questions from the OBIS data quality task team:

  • What is the process the BDQ TG2 went through to develop that spreadsheet of tests and assertions?
  • As far as I understood, the tests and assertions are finalized - does that mean they will no longer be updated?
  • If we encounter a test that was missed should we inform TG2?
  • Can you talk to us about if/how GBIF has integrated these tests into their processes?
  • What is the current status of the BDQ IG? Are there active tasks groups we could join?

Thank you so much!

@Tasilee (Collaborator) commented Aug 30, 2021

@tucotuco is likely far busier than me, and I reckon I can answer these questions.

  • The process that was followed was that I trawled the web looking for existing tests that were being used by agencies such as GBIF, the ALA, CRIA, iDigBio etc. These were compiled into a spreadsheet, and then we classified the tests in multiple ways, such as whether they dealt with NAME (of species etc.), SPACE (e.g., lat/long in the correct country), TIME (e.g., viable ISO dates/times) or OTHER (e.g., a valid license for use). We then refined the tests, filtering out those that we didn't consider CORE (=basic) and those that were hard to implement (as we wanted wide implementation), and continued to add to the classification that you now see on GitHub (e.g., Expected response). We filled the discovered gaps with new tests, and occasionally refined existing tests. A LOT of work has gone into those that remain. Once the tests were finalized, we started on test data. All the details can be found in the paper https://biss.pensoft.net/article/50889/. We have completed the test data for most of the 'tests' and now just need to finish a subset of the amendment 'tests'. (A rough illustrative sketch of what one of these validations looks like in code follows this list.)
  • The tests are finalized as far as TG2 is concerned, as we have been refining them for nearly 5 years now. That is not to say that the team (Arthur, John, Paul, Paula or I) won't find something we need to discuss, but it is very unlikely. Once we finalize the test data, the tests will be submitted as a TDWG standard. Note again, these are the CORE tests. We understand others may be added for domains such as marine. Whether they would become a new section of the 'standard' I can't say, but it would seem a good idea to have some QA/QC, plus the usual benefits of standards.
  • If you find a CORE test that is missing, then please inform me. It may be that we considered it and rejected it, for reasons which will (hopefully) be documented. If it is a genuine GOTCHA, then we would always be open to additions prior to submission.
  • GBIF integration: No, I can't answer that one myself, but @timrobertson100 may be able to tell you what, if anything, is happening. I am aware that the ALA had given a commitment to test integration, and seeing that the 'back-ends' of the main databases are now 'aligned', this would seem an easier task.
  • DQ IG: Probably need to have @ArthurChapman or @saraiva-usp fill you in on this, but the IG will always be open for new members and either of those two could let you know the status of the TGs. I would certainly expect Paula's TG4 is always open for help. We in TG2 would always welcome help on finalizing the test data. It would be fair to say that after years of effort, Arthur, John, Paul, Paula and I are a tad 'burnt out'. New blood would be nice.
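
To make the shape of these 'tests' concrete, here is a rough, non-normative sketch of a TIME-category validation in Python. It is only an illustration of the idea: the eventDate term, the tiny subset of ISO 8601 accepted, and the response values (COMPLIANT, NOT_COMPLIANT, INTERNAL_PREREQUISITES_NOT_MET) are all simplifications of what the TG2 test descriptions actually specify.

```python
# Illustrative sketch only: a TIME-category check in the spirit of the TG2
# validations, not the normative wording or structure of any test in the suite.
# The returned strings loosely mirror the response vocabulary used in the
# test descriptions; the real Expected Response is defined by TG2, not here.
from datetime import datetime


def validate_eventdate_standard(record: dict) -> str:
    """Report whether dwc:eventDate parses as a simple ISO 8601 date.

    Only a small subset of ISO 8601 is accepted here; the real test
    specification covers far more cases (ranges, times, etc.).
    """
    value = (record.get("eventDate") or "").strip()
    if not value:
        return "INTERNAL_PREREQUISITES_NOT_MET"  # nothing to evaluate
    for fmt in ("%Y-%m-%d", "%Y-%m", "%Y"):
        try:
            datetime.strptime(value, fmt)
            return "COMPLIANT"
        except ValueError:
            continue
    return "NOT_COMPLIANT"


if __name__ == "__main__":
    print(validate_eventdate_standard({"eventDate": "2021-08-27"}))  # COMPLIANT
    print(validate_eventdate_standard({"eventDate": "27/08/2021"}))  # NOT_COMPLIANT
    print(validate_eventdate_standard({"eventDate": ""}))            # INTERNAL_PREREQUISITES_NOT_MET
```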

@tucotuco, @ArthurChapman, @chicoreus and @pzermoglio can comment further.

@timrobertson100 (Member) commented Aug 30, 2021

Can you talk to us about if/how GBIF has integrated these tests into their processes?

Thanks, @ymgan. I think the GBIF processing covers what's behind these tests for the most part, flagging records accordingly, but it isn't a strict implementation. There may be some slight differences in the rules, likely arising from the fact that GBIF deals with data in a variety of formats and from the need for long-term API stability. The GBIF validations and enrichments are done in the gbif/pipelines project, which powers GBIF and ALA ingestion, and in the GBIF validator, which will shortly be integrated into the GBIF IPT.
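
For anyone who wants to see those flags in practice, below is a hedged sketch of pulling a few GBIF-interpreted records that carry a particular interpretation flag through the public occurrence search API. The `issue` parameter and the COUNTRY_COORDINATE_MISMATCH flag name reflect my understanding of the current API rather than anything stated above, so check the live documentation before relying on them.

```python
# Hedged sketch: fetch a few GBIF-interpreted occurrence records that carry a
# particular interpretation flag via the public occurrence search API.
# The `issue` filter and the COUNTRY_COORDINATE_MISMATCH flag name are my
# understanding of the current API; verify against the live documentation.
import requests

resp = requests.get(
    "https://api.gbif.org/v1/occurrence/search",
    params={"issue": "COUNTRY_COORDINATE_MISMATCH", "limit": 5},
    timeout=30,
)
resp.raise_for_status()
for occ in resp.json().get("results", []):
    # Print the record key alongside all interpretation flags it carries.
    print(occ.get("key"), occ.get("issues"))
```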

@tucotuco (Member) commented Aug 30, 2021 via email

@ymgan (Collaborator, Author) commented Oct 12, 2021

Thank you so much Lee, Tim and John!! I really appreciate it!

@pieterprovoost - This is the issue that I mentioned in our previous data QC task team meeting. Let's see if we can make use of the existing flags developed by the task group and GBIF.

I believe GBIF's flags for the data validator are here.
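
As a starting point for that discussion, here is a small sketch of counting flagged records in a GBIF occurrence download with pandas. The "issue" column name, the semicolon-separated flag format and the local file name are assumptions about the simple download format, not something confirmed in this thread.

```python
# Sketch under assumptions: a GBIF occurrence download (simple format) keeps
# the interpretation flags in an "issue" column as a semicolon-separated
# string. Both the column name and the file name below are assumptions to be
# checked against the actual download.
import pandas as pd

df = pd.read_csv("occurrence.txt", sep="\t", dtype=str)  # hypothetical local file
flag = "RECORDED_DATE_INVALID"  # one of GBIF's interpretation flags
flagged = df[df["issue"].fillna("").str.contains(flag, regex=False)]
print(f"{len(flagged)} of {len(df)} records carry {flag}")
```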
