DwC field scientificNameID is not used at all #217
Comments
The link above points to UAT (user acceptance testing). The result might be the same, but the test and production environments are likely to differ much of the time. I suggest using the production site https://www.gbif.org instead. |
(UAT and the production environment are usually very similar for record interpretation.) @mdoering can help explain what's happening here. I can see we have the name from the WoRMS checklist: https://www.gbif.org/species/105760798 but it isn't linked to our taxonomic backbone. This one from a different checklist is, but it loses the author. We don't yet match using the scientificNameID; we should look to support this in the ingestion pipeline rewrite (in progress this year). Adding a kingdom will make this match to the correct name. |
Yes, scientificNameID is pretty much ignored in both occurrence and checklist processing. |
Most of the links on this issue are deprecated. |
This is not about dead links, but about the field dwc:scientificNameID being ignored by GBIF. |
Agree this is an important issue, especially for OBIS node contributions. This really is a missed opportunity for GBIF as OBIS nodes take great care in assigning an appropriate scientificNameID to each occurrence. Would hate to see any records from the OBIS-USA node end up as terrestrial species when we've taken the time to provide the marine representation. |
In that case, should it be an issue for the CoL+? https://github.com/Sp2000/colplus |
I am wondering about a few things here:
Looking at one of the Oligochaeta Koch examples I see the taxonomic dwc occurrence information is very sparse: https://www.gbif.org/occurrence/1324564024 http://lsid.info/urn:lsid:marinespecies.org:taxname:2036
|
The point of (dwc) archives is that it is NOT linked data. But if we had a (WoRMS) checklist that defined those IDs we could cross reference them so the taxonomic information would not have to be repeated in the occurrences. |
To some degree yes, but it is primarily an Occurrence interpretation issue |
To answer your questions @mdoering
You have a WoRMS checklist that defines those: https://www.gbif.org/dataset/2d59e5db-57ad-41ff-97d6-11f5fb264527 |
I think referring to a known checklist like WoRMS and reusing their taxonIDs makes a lot of sense and GBIF should support that in the long run. @timrobertson100 maybe the pipelines project can be a good way to include such a taxonID lookup. Still there are many detail questions, I have a few popping up immediately:
|
Thanks @bart-v @albenson-usgs. I will now move this issue into the GBIF pipelines project, where we'll implement it, working through the issues @mdoering raises. All effort right now is on making the new ingestion pipeline live. |
For current links, Edited to add: There are a few obscure records where this doesn't hold true, but they are rare. |
@mdoering about finding out what checklist (version) has been used, everything is solved by using a proper and persistent GUID (like LSID): it tells you what authority has been used, on a per record basis. I don't understand this question
If it's a GUID, there is only one checklist that has assigned/generated this GUID, so there is nothing to choose from? |
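To illustrate the point about an LSID naming its authority on a per-record basis, here is a minimal sketch that splits a scientificNameID into its parts. It assumes the general LSID layout `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`; the `Lsid` type and `parse_lsid` function are hypothetical names for this example, not part of any GBIF or WoRMS API.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str           # e.g. "marinespecies.org"
    namespace: str           # e.g. "taxname"
    object_id: str           # e.g. "2036"
    revision: Optional[str]  # optional version part, often absent


def parse_lsid(value: str) -> Optional[Lsid]:
    """Return the parts of an LSID, or None if the value is not an LSID."""
    parts = value.strip().split(":")
    # Expect "urn", "lsid", authority, namespace, object and an optional revision.
    if len(parts) not in (5, 6):
        return None
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        return None
    if not all(parts[2:5]):
        return None
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    # Assumption: the authority is lowercased here so it can be compared case-insensitively.
    return Lsid(parts[2].lower(), parts[3], parts[4], revision)


lsid = parse_lsid("urn:lsid:marinespecies.org:taxname:2036")
if lsid is not None:
    print(lsid.authority, lsid.namespace, lsid.object_id)
```

A plain name such as "Oligochaeta" is not an LSID and yields `None`, so a caller can tell the two cases apart before deciding how to match the record.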
Thanks @timrobertson100 |
@bart-v a properly versioned LSID would tell you what it was when resolving it. But I doubt a DwC WoRMS archive contains all historical versions of a name or deleted names. My point about a non-unique GUID is that there might be various datasets, e.g. molluscabase, WoRMS, Catalogue of Life, that all use the same GUID. Knowing which is the authoritative one seems trivial by looking at the domain, but I would expect we had better have some metadata about that on the dataset level. I am sure GUIDs will not appear once only. |
WoRMS could do versions but that is usually overkill. We hardly ever change names, but create new ones and point them to each other. We do keep track of deletions. I agree that some metadata on dataset level is needed, indeed. |
There can't be a "non unique GUID". It's in the name: "Globally unique..." I don't think it matters which name list is authoritative! Only that the user can see which was used. As they can, when the urn:lsid: format is used. [Note: To be fair, our The most distressing thing about this issue is that I can see the simple solution to my #934 is to remove In any case, it's wrong for GBIF to make assumptions about my data. |
Hi folks. To try and address some of the challenges, I think we could make a good step forward with a fairly simple solution. What do people think about the following, please? Taking this record as an example, it comes with:
In the processing we could do the following:
This approach would use the identifier mapping to find things in the GBIF backbone which is a more robust mapping than the names-based lookup service. There will always be some inconsistency due to the publishing cycle (e.g. occurrence records with names not in the latest WoRMS dataset) but it would at least 1) improve the homonym cases, and 2) improve the cases where only IDs are provided. To get a sense of which prefixes would be suitable to map against a checklist please see this:
What do you think? Thanks |
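A minimal sketch of the general idea in the comment above: try the identifier mapping first and fall back to the names-based lookup only when the ID is missing or unknown. This is not the processing that was proposed or implemented. `match_taxon`, `id_index` and `match_by_name` are hypothetical names, and the key `1` is a made-up placeholder rather than a real backbone key.

```python
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical signature for a names-based lookup: (scientificName, kingdom) -> taxon key.
NameMatcher = Callable[[str, Optional[str]], Optional[int]]


def match_taxon(
    record: Mapping[str, str],
    id_index: Mapping[str, int],
    match_by_name: NameMatcher,
) -> Tuple[Optional[int], str]:
    """Return (taxon key, match type), preferring the scientificNameID."""
    name_id = (record.get("scientificNameID") or "").strip()
    # Assumption: id_index maps checklist identifiers to taxon keys.
    if name_id and name_id in id_index:
        return id_index[name_id], "ID"
    # Fall back to the name, which cannot tell homonyms apart without more context.
    key = match_by_name(record.get("scientificName") or "", record.get("kingdom"))
    return key, ("NAME" if key is not None else "NONE")


# Placeholder index: the key 1 is invented for illustration only.
id_index = {"urn:lsid:marinespecies.org:taxname:2036": 1}


def by_name(name: str, kingdom: Optional[str]) -> Optional[int]:
    # Stand-in for a name lookup that resolves the homonym to the plant key from the issue.
    return 5401803 if name == "Oligochaeta" else None


record = {
    "scientificName": "Oligochaeta",
    "scientificNameID": "urn:lsid:marinespecies.org:taxname:2036",
}
print(match_taxon(record, id_index, by_name))
```

With the ID present, the record is matched through the index and never reaches the name lookup, which is how the homonym case would be avoided. Records whose ID is absent from the index still get the existing name-based behaviour.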
That looks good to me. That last row returned by your query is probably all from datasets submitted to EurOBIS!
I had to find the ID in WoRMS before I published the dataset. The only time that could happen is if the record was deleted, but that would be exceedingly rare (generally, invalid taxa are flagged as invalid but ironically an invalid ID is just as valid for the purpose!). I would expect other authorities to do the same. |
Perfect @timrobertson100 ! |
This issue was moved from portal-feedback to pipelines
Example
https://www.gbif-uat.org/occurrence/search?dataset_key=740cf4e0-37ca-4389-ba8f-4e1bc5177893&taxon_key=5401803
Lists the records as Oligochaeta and appends the authority "K.Koch" just like that.
That makes these marine occurrences terrestrial plants...
Yet a scientificNameID, urn:lsid:marinespecies.org:taxname:2036, is provided, and it resolves to the animal class Oligochaeta.
This is a missed chance to fix homonyms in an easy way...