catalogNumber vs materialEntityID #211

Open
weevil-see opened this issue Jan 29, 2025 · 3 comments
Labels: new; other types of data - extensions; term - MaterialEntity (pertaining to a term organized in the MaterialEntity class); term - MaterialSample; term - Occurrence (pertaining to a term organized in the Occurrence class)

Comments

@weevil-see

I have a question related to the digitization of entomological collections. A specimen could have several numbers or unique identifiers, and there are several Darwin Core Terms which could be used to describe them.
Historical catalog numbers are usually not globally unique, and sometimes multiple specimens can have the same number in a historical catalog if they came from the same locality. I would put such a historical number into dwc:otherCatalogNumbers. However, which of the following would be used for the number which is currently being used, most likely a globally unique identifier for the specimen?
Here are my options as far as I know:

I also have some general confusion: A preserved specimen seems to fit the definition of several classes: dwc:Occurrence, dwc:MaterialEntity, dwc:MaterialSample, and dwc:PreservedSpecimen. Are classes the same as extensions? In Simple Darwin Core, I would not use class terms as fields, but instead use dwc:basisOfRecord with one of the class terms as value.
But how do I use classes in general? In DwC-A, would I use the class term as a field in the core file, with a unique database key to link to an extension file? In the documentation, I did not find a good explanation for the use of classes.

@debpaul
Contributor

debpaul commented Jan 29, 2025

@weevil-see great questions! @tucotuco @pzermoglio will likely jump in.

First, to add to this, I would say you can also use the Resource Relationship extension to be able to share multiple identifiers for a given object. You can imagine specimens having (for example): an identifier given by the collector (dwc:recordNumber), an identifier given by the collection holding the specimen (dwc:catalogNumber), and then often a dwc:occurrenceID (globally unique, hopefully). All of these numbers provide valuable provenance breadcrumbs. The dwc:recordNumber is probably in your field notes. The dwc:catalogNumber is in the collection database and probably on the specimen itself (or in the jar, or on the tag, etc). And of course the dwc:occurrenceID is needed for ensuring the database record being submitted/shared/published is unique, since the other terms mentioned here won't necessarily manage that uniqueness hurdle. And, any of those might be found / cited in an online publication.

Second, I think at some point in the near(ish) future, there will be changes regarding dwc:basisOfRecord. At that point, your confusion about dwc:Occurrence/MaterialEntity/MaterialSample will hopefully be clarified. In my understanding, Occurrence is broad on purpose so that it can encompass multiple types of records, such as observation records, machine records, and collecting records that point to a physical voucher. Others can tell you more about MaterialEntity and MaterialSample than I can. From reading the definitions, it looks like MaterialSample is subsumed under MaterialEntity.

Third, the classes are a convenience. They are not "extensions". You can see some of the current dwc extensions (groups or bags of terms added to meet the needs of particular groups who want to share biodiversity data). (Look for extensions that have "tdwg" in their namespace). Examples

Extensions also make it possible to share structured one-to-many data (as in multiple images for the same specimen, or multiple identifiers for a given object, etc.).
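To make the one-to-many idea concrete, here is a minimal sketch of a core file and an extension file joined on the core record's identifier. It assumes the extension rows point back to the core row through a `coreid` column; the extension column names (`identifier`, `identifierType`) and all values are made up for illustration and are not taken from any published extension.

```python
import csv
import io

# Hypothetical core file: one row per specimen record (values are invented).
CORE_CSV = """occurrenceID,catalogNumber,basisOfRecord
urn:example:occ-1,ENT-0001,PreservedSpecimen
urn:example:occ-2,ENT-0002,PreservedSpecimen
"""

# Hypothetical extension file: several rows may point at the same core row.
# The column names "identifier" and "identifierType" are assumptions.
EXT_CSV = """coreid,identifier,identifierType
urn:example:occ-1,1887-42,historical catalog number
urn:example:occ-1,JS-103,collector number
urn:example:occ-2,1887-42,historical catalog number
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_extension(core_rows, ext_rows, core_key="occurrenceID", ext_key="coreid"):
    """Group extension rows by the core record they reference."""
    by_core = {}
    for row in ext_rows:
        by_core.setdefault(row[ext_key], []).append(row)
    return [
        {**core, "identifiers": by_core.get(core[core_key], [])}
        for core in core_rows
    ]


records = attach_extension(read_rows(CORE_CSV), read_rows(EXT_CSV))
for rec in records:
    print(rec["catalogNumber"], [i["identifier"] for i in rec["identifiers"]])
```

Each core record stays a single row, while any number of extension rows can hang off it.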

Fourth, what is the format / structure of a DwC-A? I'd start with the Darwin Core Archives – How-to Guide and scroll to the bottom where you can see exemplar datasets.

Fifth, I and others can point you to other examples too, if you want them. I think you'll then see how the data are structured.

I realize I didn't answer all your questions. I hope I got you started and others will step in.

debpaul self-assigned this Jan 29, 2025
debpaul added labels Jan 29, 2025: new; term - Occurrence; other types of data - extensions; term - MaterialSample; term - MaterialEntity
@ben-norton
Member

ben-norton commented Jan 29, 2025

Great question @weevil-see
Adding to @debpaul's comments:
A class is a category of terms. The concept is based on the class-property structure of RDF Schema. In simpler terms, classes contain terms and terms contain values. In a relational database, classes are generally equivalent to tables and terms are columns. You populate columns with values, not tables.
dwc:catalogNumber is the identifier assigned to a specimen within your institution. This is normally called the catalog number. The term is associated with the original collection ledgers or 'catalogs' from which the data is derived.
With regard to your initial question, I would take a functional approach. What's the purpose of "the number which is currently being used"? Although not explicitly stated, occurrenceIDs must be GUIDs. Catalog numbers need not be. They are often just numeric or combinations of alphabetic and numeric characters. There is no expectation that a catalog number is unique (even within the bounds of a single collection). occurrenceIDs must be globally unique and must be persistent, meaning you can't change them once assigned. Here, the format of the catalog number is not nearly as important as its function or intended use.
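As a rough illustration of that functional difference, the sketch below lists values that repeat within a dataset. It can only check uniqueness inside the rows it is given (global uniqueness cannot be verified locally), and the function name and sample records are hypothetical.

```python
from collections import Counter


def duplicated_values(rows, field):
    """Return the values of `field` that occur in more than one row."""
    counts = Counter(row[field] for row in rows if row.get(field))
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up records for illustration only.
rows = [
    {"occurrenceID": "urn:example:occ-1", "catalogNumber": "1887-42"},
    {"occurrenceID": "urn:example:occ-2", "catalogNumber": "1887-42"},
    {"occurrenceID": "urn:example:occ-3", "catalogNumber": "1901-07"},
]

# Repeated catalog numbers may be acceptable; repeated occurrenceIDs are not.
print(duplicated_values(rows, "catalogNumber"))
print(duplicated_values(rows, "occurrenceID"))
```

A non-empty result for occurrenceID would signal a problem to fix before publishing, whereas a non-empty result for catalogNumber may simply reflect how the collection was cataloged.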

The difference between catalogNumber and materialEntityID is a matter of scope. In general, the scope of catalogNumber is much tighter than that of materialEntityID. The use of the former is generally limited to digitized collections. The latter can apply to any entity that can be identified, exists for some period of time, and consists in whole or in part of physical matter while it exists.

In your case, I would recommend using catalogNumber. Your catalog number happens to be a UUID, but that's coincidental.

Also, entomology collection specimens are often cataloged in taxonomic lots where the terms individualCount and organismQuantity are especially applicable.

@tucotuco
Member

In case your questions aren't fully answered yet...

> I have a question related to the digitization of entomological collections. A specimen could have several numbers or unique identifiers, and there are several Darwin Core Terms which could be used to describe them. Historical catalog numbers are usually not globally unique, and sometimes multiple specimens can have the same number in a historical catalog if they came from the same locality. I would put such a historical number into dwc:otherCatalogNumbers. However, which of the following would be used for the number which is currently being used, most likely a globally unique identifier for the specimen? Here are my options as far as I know:

dwc:catalogNumber for sure. If it is truly a globally unique identifier, it could also be the dwc:materialEntityID. It definitely should not be the dwc:occurrenceID. If it is current, I would not put it in dwc:otherCatalogNumbers.
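One way to picture that mapping is the small sketch below. The helper name, its parameters, and the sample values are hypothetical; whether the current number really is globally unique is something only the collection can decide, so it is passed in as a flag. Historical numbers are joined with " | ", the separator Darwin Core recommends for list values in dwc:otherCatalogNumbers.

```python
def build_identifier_fields(current_number, historical_numbers, globally_unique=False):
    """Map a specimen's numbers onto Darwin Core identifier terms.

    `globally_unique` is an assumption supplied by the caller; this sketch
    does not try to verify it.
    """
    fields = {
        "catalogNumber": current_number,
        # Historical numbers go here, never the current one.
        "otherCatalogNumbers": " | ".join(historical_numbers),
    }
    if globally_unique:
        # Only reuse the current number as materialEntityID if it is truly global.
        fields["materialEntityID"] = current_number
    return fields


# Invented values for illustration only.
print(build_identifier_fields("ENT-0001", ["1887-42", "B-17"]))
print(build_identifier_fields("urn:example:specimen-1", ["1887-42"], globally_unique=True))
```

Note that dwc:occurrenceID is deliberately left out of this mapping, in line with the advice above.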

> I also have some general confusion: A preserved specimen seems to fit the definition of several classes: dwc:Occurrence, dwc:MaterialEntity, dwc:MaterialSample, and dwc:PreservedSpecimen. Are classes the same as extensions? In Simple Darwin Core, I would not use class terms as fields, but instead use dwc:basisOfRecord with one of the class terms as value. But how do I use classes in general? In DwC-A, would I use the class term as a field in the core file, with a unique database key to link to an extension file? In the documentation, I did not find a good explanation for the use of classes.

No, classes are not the same as extensions. Classes are concepts in which property terms are organized (for example, terms about an Event would be organized in the Event class). Extensions are a way to add extra information to records, and might consist of terms from multiple classes, but extensions are not classes.

Not just in Simple Darwin Core, you would never use class terms as fields. You would only use property terms as fields. As you pointed out, you can use class names or labels as values for properties (for example, "PreservedSpecimen" as a value for dwc:basisOfRecord).

In general you don't really use classes except to define the concepts the properties refer to.
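The distinction between property terms as fields and class names as values could be checked with something like the sketch below. The set of class names is only the handful mentioned in this thread (not the full Darwin Core list), and the function names and sample header are hypothetical.

```python
# Illustrative subset of class names mentioned in this thread; not exhaustive.
CLASS_NAMES = {"Occurrence", "MaterialEntity", "MaterialSample", "PreservedSpecimen", "Event"}


def local_name(term):
    """Strip an optional prefix such as 'dwc:' from a term name."""
    return term.split(":", 1)[-1]


def class_terms_in_header(header):
    """Return header columns that are class names rather than property terms."""
    return [col for col in header if local_name(col) in CLASS_NAMES]


# A made-up header with one mistake: a class used as a field.
header = ["dwc:occurrenceID", "dwc:catalogNumber", "dwc:basisOfRecord", "dwc:MaterialEntity"]
print(class_terms_in_header(header))

# A class name is fine as a *value* of a property such as basisOfRecord.
row = {"dwc:basisOfRecord": "PreservedSpecimen"}
print(row["dwc:basisOfRecord"] in CLASS_NAMES)
```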

Please let us know if you still have questions.
