Name duplication #69

Open
Teinostoma opened this issue May 30, 2024 · 0 comments
BHL is listing several names two or three times. For example, on https://www.biodiversitylibrary.org/item/98172#page/97/mode/1up, Scapharca appears three times (as Scapharca; Scapharca Gray, 1847; and Scapharca J. E. Gray, 1847), and Anadara appears twice (by itself and as Anadara J. E. Gray, 1847).
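
To illustrate the kind of collapsing that would avoid the duplicates, here is a minimal sketch in Python. It is not how BHL or its name finder actually works; `canonical_form` and `collapse_duplicates` are hypothetical helpers, and the regular expression assumes a capitalized genus optionally followed by lowercase epithets, with everything after that treated as authorship.

```python
import re

# Assumption: a name is a capitalized genus, optionally followed by lowercase
# epithets; whatever follows is treated as the authorship string.
NAME_RE = re.compile(r"^([A-Z][a-z]+(?: [a-z]+)*)\s*(.*)$")


def canonical_form(name):
    """Return the name without its authorship (hypothetical helper)."""
    match = NAME_RE.match(name.strip())
    return match.group(1) if match else name.strip()


def collapse_duplicates(names):
    """Keep one entry per canonical name, preferring the longest variant."""
    best = {}
    for name in names:
        key = canonical_form(name)
        if key not in best or len(name) > len(best[key]):
            best[key] = name
    return list(best.values())


names = [
    "Scapharca",
    "Scapharca Gray, 1847",
    "Scapharca J. E. Gray, 1847",
    "Anadara",
    "Anadara J. E. Gray, 1847",
]
print(collapse_duplicates(names))
```

Preferring the longest variant is only a heuristic: it would wrongly merge true homonyms with different authors, and a lowercase author particle such as "de" would be mistaken for an epithet, so a real fix would need to compare the authorship strings as well.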

There are also the usual issues: badly inadequate OCR; difficulty distinguishing ordinary words that are homonyms of taxonomic names from actual scientific names (e.g., Florida and Alligator are geographic terms in the text, not taxa); failure to recognize most of the species names that actually are on the page; and claims that a species is present when it does not appear in the text in any form. The last of these seems to reflect an OCR error in which a word in the text is misread as a common specific epithet and the program somehow picks a genus to go with it. It might help somewhat to tell the program not to consider any taxon described later than the publication date of the item.
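
The date suggestion could look something like the following Python sketch. It assumes the year of description can be read from the authorship string and that the item's publication year is available from its metadata; `description_year` and `plausible_for` are hypothetical names, and the publication year and the second name in the usage example are placeholders, not real data.

```python
import re

# Assumption: the year of description is the last four-digit year
# (1700-2099) appearing in the name string.
YEAR_RE = re.compile(r"\b(1[7-9]\d{2}|20\d{2})\b")


def description_year(name):
    """Return the year in the authorship string, or None if there is none."""
    years = YEAR_RE.findall(name)
    return int(years[-1]) if years else None


def plausible_for(names, publication_year):
    """Drop names described after the item was published.

    Names without a year are kept, since nothing can be concluded about them.
    """
    kept = []
    for name in names:
        year = description_year(name)
        if year is None or year <= publication_year:
            kept.append(name)
    return kept


candidates = [
    "Scapharca J. E. Gray, 1847",
    "Examplegenus Author, 1950",  # placeholder name, not a real taxon
    "Anadara",
]
# 1900 is a placeholder publication year, not the actual date of the BHL item.
print(plausible_for(candidates, publication_year=1900))
```

A filter like this would only catch spurious matches whose names carry a year later than the publication date; it would not help with homonyms such as Florida or Alligator.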
