IPT error: mistaken occurrenceID duplicates or missing IDs #2654

Open
dbloom opened this issue Feb 15, 2025 · 6 comments

dbloom commented Feb 15, 2025

@mike-podolskiy90 et al,

I have been attempting to re-publish an existing resource, https://www.gbif.org/dataset/a267b6a7-91f9-457c-889a-481e7aa920b6, with an updated dataset. The result in every attempt is a publication error (below, with associated log).

The IPT is telling me there are 1156 missing occurrenceIDs and 49 records with duplicate IDs. I know this to be false. I have run the dataset through the GBIF data validator and there are no issues. I have run the set through OpenRefine and there are no blank fields or duplicate IDs. I have scrubbed the file of invisible characters and deleted trailing rows. I've attempted to upload as xls and csv, both compressed and uncompressed.

I could use some recommendations. Also, if this sort of inquiry should go to the Helpdesk I can direct it there. Just let me know to stop bothering you.

(PS: the full set of admin logs has revealed to me that there is a raft of similar issues with every Arctos resource, BUT (1) the resource described above is NOT an Arctos resource and (2) it appears that Arctos has made some mapping changes, which I am pursuing separately because that problem originates in Arctos and not the IPT. Thus, if you review the admin logs on this IPT they are U G L Y, but I can solve those other issues. You can search the full Admin Logs for "sio_benthicinverts". All of my attempts have been made on or after 10-Feb-2025.)

Error

"Publishing version #1.5 of resource sio_benthicinverts failed: Archive generation for resource sio_benthicinverts failed: Can't validate DwC-A for resource sio_benthicinverts. Each line must have a occurrenceID, and each occurrenceID must be unique (please note comparisons are case insensitive)"

Log:

00:35:32 1156 line(s) missing occurrenceID
00:35:32 49 line(s) having a duplicate occurrenceID (please note comparisons are case insensitive)
00:35:32 Archive validation failed, because not every line has a unique occurrenceID (please note comparisons are case insensitive)
00:35:33 Restored version #1.4 of resource sio_benthicinverts after publishing failure
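
The rule quoted in the error message (every line needs an occurrenceID, and IDs must be unique when compared case-insensitively) can be reproduced locally on a source file. The following is a minimal sketch, not the IPT's actual implementation: the function name `check_occurrence_ids`, the way duplicates are counted (each line whose ID repeats an earlier one), and the sample data are all assumptions made for illustration.

```python
import csv
import io


def check_occurrence_ids(rows, id_field="occurrenceID"):
    """Count rows with a missing ID and rows whose ID repeats an earlier one.

    IDs are compared case-insensitively, following the wording of the
    error message. How the IPT itself counts duplicates is an assumption.
    """
    seen = set()
    missing = 0
    duplicates = 0
    for row in rows:
        # DictReader yields None for fields absent from a short row.
        value = (row.get(id_field) or "").strip()
        if not value:
            missing += 1
            continue
        key = value.casefold()
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return missing, duplicates


# Made-up sample data, not taken from the resource in this issue.
sample = io.StringIO(
    "occurrenceID,scientificName\n"
    "A1,Asterias\n"
    "a1,Asterias\n"
    ",Pisaster\n"
    "A2,Pisaster\n"
)
missing, duplicates = check_occurrence_ids(csv.DictReader(sample))
print(f"{missing} line(s) missing occurrenceID, {duplicates} duplicate line(s)")
```

Running a check like this against the source file rather than the last generated archive matters, because the two can differ.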

@mike-podolskiy90
Contributor

Thank you for reporting the issue.
This looks strange indeed; the occurrenceID uniqueness check is usually reliable. I will have a look first and let you know.

mike-podolskiy90 self-assigned this Feb 15, 2025

dbloom commented Feb 15, 2025

Thanks, @mike-podolskiy90. Please take your weekend first.


mike-podolskiy90 commented Feb 15, 2025

No worries! Just had a look. I guess you validated the last DwC-A, which is fine. I checked the source file instead and can tell that something is wrong with it. Starting with the line A10007, I see the data is shifted. Please make sure that gets fixed.
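
A shift of this kind in a delimited source file can often be located by looking for rows whose field count differs from the header's, or whose ID column ends up empty. The sketch below assumes a comma-delimited file with a header row that contains an `occurrenceID` column; the function name `find_shifted_rows` and the sample rows are hypothetical and are not taken from the file discussed here.

```python
import csv
import io


def find_shifted_rows(lines, id_field="occurrenceID"):
    """Return (line_number, reason) for rows that do not line up with the header."""
    reader = csv.reader(lines)
    header = next(reader)
    id_index = header.index(id_field)
    problems = []
    for row in reader:
        # line_num is the physical line on which the current record ended.
        line_no = reader.line_num
        if len(row) != len(header):
            problems.append((line_no, f"{len(row)} fields, expected {len(header)}"))
        elif not row[id_index].strip():
            problems.append((line_no, f"empty {id_field}"))
    return problems


# Made-up sample: the last two rows are shifted one column to the right.
sample = io.StringIO(
    "occurrenceID,scientificName,locality\n"
    "X1,Asterias,La Jolla\n"
    ",X2,Asterias,La Jolla\n"
    ",X3,Pisaster\n"
)
for line_no, reason in find_shifted_rows(sample):
    print(f"line {line_no}: {reason}")
```

For a real file, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`; the encoding is an assumption and may need adjusting.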


dbloom commented Feb 15, 2025

Ah, perfect (well, not perfect). I will get on that. Thank you much @mike-podolskiy90

@mike-podolskiy90
Contributor

Let me know if we can close it. Thanks!


dbloom commented Feb 19, 2025

Go ahead. Still awaiting the return of the file with corrections from the publisher, but if I have the same error again I'll revive this issue.
