Taxon Concepts as a data model #1852

Jegelewicz · 2018-12-13T22:45:49Z

A thread for exploring the idea.

Related issues/comments include
#1136
#1809
#1817
#735 (comment)
#912 (comment)
#1803 (comment)
#1805 (comment)
#983 (comment)
#1609 (comment)

Jegelewicz · 2019-01-11T18:58:26Z

@dustymc Could you give us your ideas about how this type of model would look in Arctos?

dustymc · 2019-01-12T02:51:45Z

The model I envision is pretty simple - identification_taxonomy (links identifications to names/taxa, which are refined to a source by a collection's preferences) becomes identification_classification - identifications would link to specific classifications/concepts rather than names.

We'd have to preserve classification_id and maybe limit how things can change and such, but that's details.

Using that with something that can talk to Arctos is pretty simple, if perhaps a lot more labor intensive. (Think #1136, but with maybe hundreds of options for every name, in addition to the "which name should we use?" question that issue is focused on.) Right now, you type "Echidna" into a pick, get one match (and maybe some species and stuff that you don't have enough information to care about, related names, etc.), select it, and the details work themselves out from the collection's source preference. In a concept model, you'd type "Echidna" and get (along with all of the species-and-junk, including those in Bitis and Tachyglossus and whatever other synonyms might exist) something that looks very much like http://arctos.database.museum/name/Echidna - I think the full classification in the context of all other classifications is the minimal amount of information you'd need to pick one. Scroll around, find the thing you want, click "use this one," voilà.

There are ~14 "concepts" that include Muraenidae on there at the moment, so working out which ones to pick under which situations would be left to y'all. I'd expect that to grow rapidly (if anyone decides to really embrace this idea, anyway) - any source might include the original publication, then 231 years of publications refining circumscriptions, and publications rejecting publications that tried to refine circumscriptions, and groups of those, and groups excluding certain publications, and field guides, and all the other normal noise. (And for homonyms like Echidna, perhaps the same sort of information for viruses and moths and monotremes and such.) I think that's easily hundreds of "concepts," and some - or most - of them may be different ways of saying the same thing (https://academic.oup.com/sysbio/article/65/4/561/1753624).

Accessing that level of complexity with something that cannot talk to Arctos is a great mystery to me. Most specimen data come from spreadsheets and such - things that cannot talk to Arctos. Much/most/all of that is entered by people who don't even KIND OF have the resources to figure out what definition of eel whoever entered the data a few decades ago might have had in mind. That's "just" a usability problem and there's certainly a way around it. I suspect finding that pathway will involve having the right people in the same room for a few days.

Maybe it's as simple as having some sort of default concept (eg, the original description), although I'm not sure how the details of that could work. I think the vast majority of the time we don't think in taxon concepts, so some sort of "just use the name" default might be necessary anyway. ("It's a moose, obviously" - that's all we know or really care about most of the time.)

Note that this completely avoids the issue of defining taxon concepts. If you can stuff it into an optionally-ordered key-value array, or stuff some sort of summary into that structure and link to the "real" concept (what we do with WoRMS), you could use it in identifications.
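A minimal sketch of the "optionally-ordered key-value array" idea, in Python. The field names (`term_type`, `term`, `position`), the helper functions, and the example.org link are assumptions made for illustration, not the actual Arctos structure; the point is only that ranked terms carry an order while metadata (including a link out to the "real" concept) does not.

```python
from typing import NamedTuple, Optional


class Term(NamedTuple):
    term_type: str           # the key, e.g. "family" or "concept_link"
    term: str                # the value
    position: Optional[int]  # order within the hierarchy; None for unordered metadata


def hierarchy(terms):
    """Return the ordered (ranked) terms, highest rank first."""
    ranked = [t for t in terms if t.position is not None]
    return [t.term for t in sorted(ranked, key=lambda t: t.position)]


def metadata(terms):
    """Return the unordered key-value pairs as a dict."""
    return {t.term_type: t.term for t in terms if t.position is None}


# Hypothetical concept: a local summary plus a pointer to the "real" concept.
echidna_eel = [
    Term("genus", "Echidna", 2),
    Term("family", "Muraenidae", 1),
    Term("display_name", "Echidna Forster, 1788", None),
    Term("concept_link", "https://example.org/concept/123", None),  # made-up URL
]

print(hierarchy(echidna_eel))
print(metadata(echidna_eel)["concept_link"])
```

Anything that can be flattened into this shape could, under these assumptions, be the target of an identification without Arctos having to define what a concept is.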

DerekSikes · 2019-01-14T20:32:24Z

Maybe it's as simple as having some sort of default concept (eg, the original description), although I'm not sure how the details of that could work.

Although more complicated, I would think it would be better to have such a default concept be somehow closer in time to the date of the ID. If the ID was made in 2010 and the latest publication to use that name in a taxonomic way (in a way that qualifies it to be considered a taxon concept) is from 2005, then that's the concept that should be the default - the most recent. Original descriptions, esp. old ones, are rarely used or even seen by those doing the IDs today. If one had to choose a way to be the least wrong most of the time, it would be to assume the concept used was the most recently published concept.
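The "most recent concept as of the ID date" default can be sketched as below. This assumes every concept carries a publication year, which is itself an open question in this thread; the `Concept` type, the labels, and the years other than the 2010/2005 example are made up.

```python
from typing import NamedTuple, Optional


class Concept(NamedTuple):
    label: str
    year: int  # assumed publication year


def default_concept(concepts, id_year) -> Optional[Concept]:
    """Pick the latest concept published in or before the year of the ID."""
    eligible = [c for c in concepts if c.year <= id_year]
    return max(eligible, key=lambda c: c.year) if eligible else None


# Hypothetical concepts for one name.
concepts = [
    Concept("original description", 1850),
    Concept("revision A", 2005),
    Concept("revision B", 2015),
]

print(default_concept(concepts, 2010))
```

With an ID made in 2010 this selects the 2005 concept and ignores the 2015 one; it returns `None` when nothing was published by the ID date.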

dustymc · 2019-01-14T20:47:02Z

@DerekSikes I agree, but that involves tracking down ~3 million "most recently published concepts."

Maybe "it's just a name, we're not asserting anything" is a better default - although that might require a NULL classification or something equally weird.

I think this comes down to what y'all are willing to do. I'm operating on the assumption that most data entry is going to involve a label/spreadsheet/whatever that just says "Somegenus somespecies" and Curators who are OK with that level of information (eg because they don't have the resources to do anything else). I'd love to require something more specific, I just don't see how we can pull it off.

campmlc · 2019-01-14T21:27:11Z

This is interesting. I like the concept if it could be implemented. I assume that this way we could have taxon concepts that would be preferred by different institutions - e.g. Myodes gapperi in a classification that prefers this version of the genus and the family Cricetidae subfamily Arvicolinae could be preferred by MSB, and Clethrionomys gapperi in a classification that prefers the genus Clethrionomys in the family Muridae would be preferred by MVZ, and Myodes as a beetle could be preferred by an insect collection. If we could display the taxon concept preferences for each institution when there is more than one possible, then we could filter these so students doing data entry choose the right one. And I disagree that collections don't think in taxon concepts already - they do, just at a broader scale. Perhaps you have "Somegenus somespecies", but in the context of a mammal or bird or insect collection. We should at least know phylum and class.

dustymc · 2019-01-14T21:53:04Z

Implemented is easy. Used - maybe not so much...

There is no "preferred" in the model. Maybe we can figure out some default or something as above, but the "core" would be explicitly picking a concept (unless someone comes up with something clever...).

I think it would be generally finer-grained than you've described. You'd have M. gapperi (limited to WHATEVER because SOME PUBLICATION or something - maybe something about range or morphology or DNA or karyotype or song or parasites or ...), and M. gapperi (potentially with exactly the same hierarchy) limited to SOMETHING ELSE because SOME OTHER PUBLICATION (and maybe hundreds of other concepts, potentially all with exactly the same hierarchy). Knowing phylum or class won't help very much - that'll get rid of Myodes-the-bug, but still (probably) won't get you to the single classification/concept you need to create an ID.

You're probably right that we do think in "fuzzy taxon concepts," and Derek is probably right in that those aren't really THAT fuzzy, it's just that we don't record anything useful, so someone in the future has to guess whose idea of M. rutilus was used, and how closely that concept was adhered to, when the ID was applied.

Jegelewicz · 2019-01-16T22:10:37Z

tracking down ~3 million "most recently published concepts."

Wouldn't each concept have a publication date associated with it? Even just a year? Then you just pick the date closest to today's date?

Jegelewicz · 2019-01-16T22:18:34Z

It sounds like we agree that this is the way to go, but we need to get together in a room and work out the details. I would suggest we do this at SPNHC, but I think I would rather make this a meeting about just one thing, without the distractions of other presentations and ideas. Thoughts? Who really wants to be included in the in-person meeting?

dustymc · 2019-01-16T22:25:37Z

Wouldn't each concept have a publication date associated with it?

No, that would take us into defining taxon concepts. And the "closest to today" 'concept' might be a list of publications all refuting your publications or something....

DerekSikes · 2019-01-16T22:50:40Z

To make this work we would need a massive publications database that holds all the concepts we'd want to use. If we can't import that, we'd have to have users create these on the fly as they need them.

For the majority of name uses in Arctos it will not be clear what the concept was. I could see 2 options for these cases:

1) taxonname with concept = 'unknown'
2) taxonname with concept = 'most recent publication that defines that name', with a qualifier field stating the confidence of the association, in this case = 'likely'

Those who want to use taxon concepts and know the concept they're using would opt for:

3) taxonname with concept = 'whatever publication the concept came from', with a qualifier field stating the confidence of the association, in this case = 'certain'

I suppose for option 1 above the qualifier could be set to 'certain', since this would be true - the user is certain they don't know what concept was applied.

Thoughts? What outstanding questions remain on this?

dustymc · 2019-01-17T00:07:23Z

need a massive publications database

Only if you want to base your concepts off of publications. "Whatever WoRMS does with aphiaid=12345" is a perfectly valid (if mostly useless...) "concept" in the model I'm proposing, for example.

concept = 'unknown'

It wouldn't necessarily be completely unknown - the default would (hopefully) be something like we do now, eg http://arctos.database.museum/name/Sorex%20vagrans#Arctos. We're definitely not talking about Sorex vagrans (the jellyfish) because we have class and such in there, but there's not much detail either.

most recent publication

Or group of publications, or most recent not including THAT publication, or ...

confidence of the association

That's identification - you're not qualifying the taxon in any way, just how the specimen fits in to it.

user is certain they don't know what concept was applied

I don't think that's quite accurate. They're certain (or not) that it's a moose, they're just applying some waffly definition/circumscription/whatever - concept - of "moose."

outstanding questions

I think just how we use it, if nobody can find holes in the idea of replacing...

UAM@ARCTOS> desc identification_taxonomy
 Name                  Null?    Type
 --------------------- -------- --------
 IDENTIFICATION_ID     NOT NULL NUMBER
 TAXON_NAME_ID         NOT NULL NUMBER
 VARIABLE              NOT NULL CHAR(1)

taxon_name_id (foreign key-->taxon_name.taxon_name_id) in that with a foreign key to taxon_term.classification_id (and we can figure out how to preserve that identifier and still use the hierarchical editor and have enough processors to do stuff with this and all that jazz).
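A rough sketch of that swap, using Python's built-in sqlite3 as a stand-in for the real Oracle schema. Only `identification_id`, `variable`, `taxon_name_id`, and `taxon_term.classification_id` come from the discussion above; the other `taxon_term` columns, the classification IDs, and the rows are assumptions made up for illustration, and no foreign-key constraint is declared here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon_term (
    taxon_name_id     INTEGER NOT NULL,
    classification_id TEXT    NOT NULL,
    term_type         TEXT,
    term              TEXT    NOT NULL
);
-- Proposed replacement for identification_taxonomy: same shape, but the
-- pointer is classification_id rather than taxon_name_id.
CREATE TABLE identification_classification (
    identification_id INTEGER NOT NULL,
    classification_id TEXT    NOT NULL,
    variable          TEXT    NOT NULL CHECK (length(variable) = 1)
);
""")

# Two made-up classifications hanging off the same name (taxon_name_id = 1).
conn.executemany(
    "INSERT INTO taxon_term VALUES (?, ?, ?, ?)",
    [
        (1, "c-eel", "family", "Muraenidae"),
        (1, "c-eel", "genus", "Echidna"),
        (1, "c-other", "class", "Mammalia"),
        (1, "c-other", "genus", "Echidna"),
    ],
)
conn.execute("INSERT INTO identification_classification VALUES (100, 'c-eel', 'A')")


def terms_for_identification(identification_id):
    """Return only the terms of the classification the identification points at."""
    cur = conn.execute(
        """SELECT t.term_type, t.term
           FROM identification_classification ic
           JOIN taxon_term t ON t.classification_id = ic.classification_id
           WHERE ic.identification_id = ?
           ORDER BY t.rowid""",
        (identification_id,),
    )
    return cur.fetchall()


print(terms_for_identification(100))
```

Because the identification points at one classification rather than at the name, the second classification under the same name never leaks into the result.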

mbprondzinski · 2019-01-17T14:37:44Z

https://www.loc.gov/standards/sourcelist/index.html
Is this of any help? I doubt I have anything to offer.

Jegelewicz · 2019-01-17T23:21:53Z

HMMMMMMM...this is interesting. Perhaps we should have "arctos" in https://www.loc.gov/standards/sourcelist/taxonomic.html

How many of these could provide WoRMS-like data to our taxonomy?

dustymc · 2019-01-18T04:28:12Z

How many of these could provide WoRMS-like data to our taxonomy?

One purpose of GlobalNames is to make the answer, "who cares?" With GN, we write to one API and get everything they have. Without that abstraction, we'd need to write to 182ish (https://resolver.globalnames.org/data_sources) APIs. There's no obvious standardization among those sources - I doubt much code could be reused. Re-creating what GN provides would require a tremendous amount of resources.

For whatever reason, GN doesn't contain all of the information from WoRMS and isn't updated very often. We couldn't get them (or WoRMS, or something) to fix that, and there's some additional complexity (we need specific classifications in some cases), so we wrote code to WoRMS. We can do that for other sources too if there's some compelling reason (eg, someone's going to use it to catalog), but for most things I think the best path is to encourage the source and GN to deal with it.

dustymc · 2019-01-18T16:54:03Z

Perhaps we should have "arctos" in https://www.loc.gov/standards/sourcelist/taxonomic.html

I have always seen Arctos more as a consumer than an authority, even though that's not entirely where we've found ourselves. I would like to see more WoRMS-like connections, and less local editing/"authority building." Ideally Curators (and/or their representatives) who want to could put their taxonomist hat on, log in to something like WoRMS, make changes there, and see them magically appear in Arctos. Curators who don't want to wear that hat could just passively use data from some source or, if we go to some more concept-like model, from any source by picking individual "classifications" from the local cache. Even more ideally, GN would become more active in keeping things complete and current and we'd just maintain one API to do that. I don't know how realistic any of that is, but I do think it's the best model for everyone involved.

Jegelewicz · 2019-01-29T16:11:01Z

for review and discussion: http://ubio.org/ @dustymc promising?

dustymc · 2019-01-29T16:21:22Z

uBio initially intended to implement the Ballew thesaurus, and we spent a lot of time talking to them before they started writing code. If they'd done what they set out to do, we'd likely just use them. (Or they'd have killed us all by sucking up every electron on the planet trying to build a transitive closure table....)

What they actually implemented is a "curated view" much like ITIS and everyone else, which is not very useful to Arctos as a whole. It could be useful to individual collections. I don't think there's anything that could remotely be described as taxon concepts in the data they have, but I haven't looked closely in a long time and I could be missing something.

Jegelewicz · 2019-01-29T17:14:17Z

Check out the search for Diplura in their Nomenclator Zoologicus: http://ubio.org/NZ/search.php?search=Diplura&authority=&category=&publication=&year=&comments=&exact=1&advanced=1&vol=&page=

dustymc · 2019-03-20T23:49:32Z

thoughts on usability

keep collection source preference
add optional metadata taxon_term taxon_concept_id

Default action from the specimen bulkloader (and other string-based tools) would be to use the classification/concept within the collection's preferred source which does not have a taxon_concept_id. (And nothing does now, so this would essentially be no changes that a user would notice.)

The specimen bulkloader could be modified to somehow accept taxon_concept_id - eg, Some species (someID) would use the concept under Some species with an ID of someID, and error if that doesn't exist or if there are multiple matches.

taxon_concept_id would be a string so "Conus Linnaeus, 1758" or "http://www.marinespecies.org/aphia.php?p=taxdetails&id=137813" would be acceptable values.

There is not a 1:1 name-concept relationship, so a unique key would be difficult or impossible - it would likely be possible to create many concepts "named" Conus Linnaeus, 1758 under one taxon, in which case references using the name+ID would be ambiguous so would error. I think this requires careful users and good documentation.
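The bulkloader behavior described above could look roughly like this. The `Name (conceptID)` parsing rule, the dict layout, the function names, and the sample rows are all assumptions for illustration; the only parts taken from this comment are the default (preferred source, no taxon_concept_id) and the "error on zero or multiple matches" rule.

```python
import re


class ConceptLookupError(ValueError):
    """Raised when a string does not resolve to exactly one classification."""


# "Some species (someID)" -> name "Some species", concept id "someID"
_NAME_WITH_ID = re.compile(r"^(?P<name>.+?)\s*\((?P<concept_id>.+)\)\s*$")


def parse_identification(value):
    m = _NAME_WITH_ID.match(value.strip())
    if m:
        return m.group("name"), m.group("concept_id")
    return value.strip(), None


def resolve(value, classifications, preferred_source):
    name, concept_id = parse_identification(value)
    if concept_id is None:
        # Default: the concept-less classification in the collection's source.
        matches = [c for c in classifications
                   if c["name"] == name
                   and c["source"] == preferred_source
                   and c["taxon_concept_id"] is None]
    else:
        # Explicit concept: any source, matched on name + taxon_concept_id.
        matches = [c for c in classifications
                   if c["name"] == name and c["taxon_concept_id"] == concept_id]
    if len(matches) != 1:
        raise ConceptLookupError(
            f"{value!r}: expected exactly 1 match, found {len(matches)}")
    return matches[0]["classification_id"]


# Made-up data: two concepts share the same "name", so that reference is ambiguous.
rows = [
    {"name": "Conus", "source": "Arctos", "taxon_concept_id": None,
     "classification_id": "c1"},
    {"name": "Conus", "source": "Arctos",
     "taxon_concept_id": "Conus Linnaeus, 1758", "classification_id": "c2"},
    {"name": "Conus", "source": "WoRMS (via Arctos)",
     "taxon_concept_id": "Conus Linnaeus, 1758", "classification_id": "c3"},
    {"name": "Conus", "source": "WoRMS (via Arctos)",
     "taxon_concept_id": "some-unique-id", "classification_id": "c4"},
]

print(resolve("Conus", rows, "Arctos"))
print(resolve("Conus (some-unique-id)", rows, "Arctos"))
```

One limitation of this particular parsing rule: a bare name that ends in a parenthesized subgenus would be misread as name plus concept ID, so a real loader would probably want a separate column rather than an inline suffix.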

For internal forms, we can pass around data objects and none of the above is a concern. Would anyone want to load specimens with concepts? (Seems inevitable...)

It would be possible to link identifications to concepts outside the collection's preferred source - it would be possible to eg, use a concept under WoRMS (via Arctos) (or anything else) for a collection which generally prefers the Arctos Plants (or whatever) source.

Perhaps concepts should even be managed in their own source(s), which would keep "normal" sources cleaner/easier to manage. E.g., that could be exploited to disallow someone adding taxon_concept_id to a "default" classification.

Used concepts cannot change; "changes" create new concepts.
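One way to read that rule, sketched in Python: an edit to an unused concept happens in place, while an edit to a used concept leaves the original untouched and produces a new ID. The `ConceptStore` class and its methods are hypothetical, and the terms are placeholders.

```python
import uuid


class ConceptStore:
    def __init__(self):
        self._concepts = {}  # classification_id -> tuple of (term_type, term)
        self._used = set()   # ids referenced by at least one identification

    def add(self, terms):
        cid = str(uuid.uuid4())
        self._concepts[cid] = tuple(terms)
        return cid

    def mark_used(self, cid):
        self._used.add(cid)

    def edit(self, cid, terms):
        """Edit in place if unused; otherwise keep the original and return a new id."""
        if cid in self._used:
            return self.add(terms)
        self._concepts[cid] = tuple(terms)
        return cid

    def get(self, cid):
        return self._concepts[cid]


store = ConceptStore()
original = store.add([("genus", "Myodes")])
store.mark_used(original)  # an identification now points here
changed = store.edit(original, [("genus", "Clethrionomys")])
print(changed != original, store.get(original))
```

Existing identifications keep pointing at the unchanged original, which is the property the rule is meant to protect.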

It would be exceptionally useful to have a consistent "backbone" in support of search. This doesn't have to be a Source (or sources - perhaps it's easier to manage at phylum/kingdom/etc.) that anyone uses, it would just facilitate search to compensate for (perhaps purposefully) inconsistent data with "concepts." This could be managed as a hierarchy and excluded from the single-record editor.

Concepts may be purposefully inconsistent and will be managed singly; it's likely safe to proceed with change requests to the single-record editor.

Example: http://dx.doi.org/10.1093/zoolinnean/zlx040 split a taxon/created a new concept and could serve as a most-basic test case.

Jegelewicz · 2019-04-14T20:49:19Z

I assume that this way we could have taxon concepts that would be preferred by different institutions - e.g. Myodes gapperi in a classification that prefers this version of the genus and the family Cricetidae subfamily Arvicolinae could be preferred by MSB, and Clethrionomys gapperi in a classification that prefers the genus Clethrionomys in the family Muridae would be preferred by MVZ, and Myodes as a beetle could be preferred by an insect collection If we could display the taxon concept preferences for each institution when there are more than one possible, then we could filter these so students doing data entry choose the right one.

This and other comments keep bringing me back to the "preferred by" solution. Perhaps the easiest thing would be that any time there is more than one classification in a source, collections using that source are notified and can select the classification they wish to use for all identifications in their collection. We could report this in the Low Quality Data section, where collections could find a list of all the taxa for which there is more than one classification but they have not selected a preference. This would take the decision about taxonomy out of the hands of students entering data.
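The report described here could be sketched as follows. The tuple layout, the `preferences` mapping, and the collection and classification identifiers are assumptions for illustration only; nothing like this exists in Arctos as far as this thread says.

```python
from collections import defaultdict


def names_needing_preference(classifications, preferences, collection, source):
    """List names with more than one classification in `source` for which
    `collection` has not recorded a preferred classification.

    classifications: iterable of (name, source, classification_id)
    preferences: dict mapping (collection, name) -> classification_id
    """
    by_name = defaultdict(set)
    for name, src, cid in classifications:
        if src == source:
            by_name[name].add(cid)
    return sorted(
        name for name, cids in by_name.items()
        if len(cids) > 1 and (collection, name) not in preferences
    )


# Made-up data.
classifications = [
    ("Diplura", "Arctos", "d1"),
    ("Diplura", "Arctos", "d2"),
    ("Myodes", "Arctos", "m1"),
    ("Myodes", "Arctos", "m2"),
    ("Sorex vagrans", "Arctos", "s1"),
]
preferences = {("CollectionA", "Myodes"): "m2"}

print(names_needing_preference(classifications, preferences, "CollectionA", "Arctos"))
```

Here only Diplura is reported for CollectionA: Myodes already has a recorded preference and Sorex vagrans has a single classification.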

I think we would want to start off making whatever classification a collection is using right now their preference, so that when Derek comes along and starts adding classifications no one will suddenly need to choose preferred classifications for hundreds of names. Otherwise, collections that don't choose a preference will end up with what we have now - a mash-up of all classifications associated with the name. This means their stuff will be found (although sometimes in error).

The main challenge I see to this is how do we record preference and ensure that someone can't go around changing the preferences for someone else's collection? I don't think throwing it into the Classification Metadata would work, but maybe Dusty could fix it so that I can only add/delete preference for collections to which I have access.

Also, there will be the need to track changes in the preferred classification. Somehow, when I decide to change from one preference to another, I would like to create a notation in the identification section of the specimen record.

dustymc · 2019-04-15T14:34:48Z

select the classification they wish to use for all identifications in their collection

If that's the only objective, the model we are currently in accomplishes it much better than taxon concepts could. If you're trying to be more precise than taxonomy allows in identifications, you might NEED taxon concepts. If you're trying to sort beetles from mice, you're almost certainly going to absolutely hate the extra workload and never see the benefits.

This would take the decision about taxonomy out of the hands of students entering data.

This also seems to suggest that we don't need a taxon concept model. What's the point if the people who do most of the work can't access the complexity??

Also, there will be the need to track changes in the preferred classification. Somehow, when I decide to change from one preference to another, I would like to create a notation in the identification section of the specimen record.

I don't think taxon concepts can change and remain anything recognizable as a taxon concept. I think we have to find some sort of "default" or "preferred" to be able to use this, but that's just help in selecting a concept when you have no preference - it's procedure, not data. Changing the preference could not change existing data.

dustymc · 2019-04-16T16:48:45Z

There's an attempt at a summary here: https://docs.google.com/document/d/1tR7FOYd_XCLl4o_YrKdUdkwmZu3Fun3dv0xlawyW1zA/edit?usp=sharing

Jegelewicz · 2019-04-16T19:31:39Z

This also seems to suggest that we don't need a taxon concept model. What's the point if the people who do most of the work can't access the complexity??

Exactly.

What I am suggesting is that we keep the model we have. This way, data entry is easy - students (or anyone entering data) just have to pick the name.

The change I suggest is that as long as only one classification exists in any taxonomy "source", everyone uses it. Let's take Diplura, which in GBIF includes:

If all of these classifications were included in source Arctos, then anything identified as Diplura would get a crazy mash-up of higher taxa. I propose that COLLECTIONS should be able to tag one of these as preferred so that only that classification will be applied to Diplura in the associated collection.

Again, this will not solve Derek's issue and isn't really using "taxon concepts" BUT it does allow us to maintain homonyms in a single taxonomy source. I see it as a baby step.

dustymc · 2019-04-16T21:59:12Z

If you're suggesting that CollectionA can pick Diplura-the-spider for one specimen and Diplura-the-butterfly for another, then the only way I see to do that is #1852 (comment) - move the pointer from taxonomy to classifications. It's a taxon concept model, even if the concepts are flaky. (And no model excludes the possibility of flaky data.)

If you're suggesting CollectionA has to preemptively go say "All our Diplura are spiders" then this just looks like a really complicated way to split classification sources. It also obfuscates the path between names and "my" classification, unless the "this is mine" bit lives in the classification itself or something.

I don't think I'm quite understanding something.

DerekSikes · 2019-04-16T22:31:22Z

move the pointer from taxonomy to classifications.

Yes, but of course the lowest name in the classification is the name in taxonomy (ie merge the two).

dustymc · 2019-04-16T22:53:53Z

the lowest name in the classification is the name in taxonomy

That's not a requirement, although I can't think of a reason it shouldn't be true.

merge the two

Explain please.

DerekSikes · 2019-04-16T22:59:17Z

Merge as in combine rather than replace. Eliminate redundancy, if it exists, and combine all elements unique to either into one classification table.

dustymc · 2019-04-16T23:18:08Z

So basically taxon concepts (identifications<-->classifications), but without the authoritative "anchor" tying related stuff together?

And I found a classification (1067358 of them, actually...) that doesn't end with the name: http://arctos.database.museum/name/Poecilophis#ArctosRelationships

sharpphyl · 2019-04-17T12:58:27Z

Could we have the option of searching on (or entering) the display name or the name string, which includes the author, instead of just the taxon name?

display_name: Poecilophis Kaup, 1856
display_name: Echidna Forster, 1788

Or add to the taxon status "hemihomonym" and "homonym" and for those allow the addition of author to differentiate? Wouldn't deal with everything but would take care of most of what I see.

dustymc · 2019-04-17T14:49:34Z

> searching on

Yea, if it's in a classification somewhere I can search it.

> entering

You're going to have to be a LOT more specific before I can answer that.

> taxon status "hemihomonym" and "homonym"

I don't have any objections, but that sounds like a lot of work to make redundant data.

> allow the addition of author

If you mean to the namestring, #1803 with a built-in extra-randomizer does not sound like fun to me.

> take care of

PLEASE, elaborate. What is it that we're trying to take care of?

Jegelewicz · 2019-04-17T16:11:59Z

> If you're suggesting that Collection A can pick Diplura-the-spider for one specimen and Diplura-the-butterfly for another

Nope - suggesting collection A picks one or the other and sticks with it.

> If you're suggesting Collection A has to preemptively go say "All our Diplura are spiders" then this just looks like a really complicated way to split classification sources. It also obfuscates the path between names and "my" classification, unless the "this is mine" bit lives in the classification itself or something.

I am trying to avoid every single collection having its own source, which defeats the purpose of a collaborative system. And yes, I assumed the "this is mine" thing would live in the classification metadata. I thought I knew what we were doing with "taxon concepts" but every time I think I know, someone says something that makes me think it will not work. I have been looking for something to help us deal with multiple classifications related to a single name when they occur. We have talked in circles about this for over a year and are getting nowhere - perhaps we need some fresh viewpoints.

Jegelewicz · 2019-04-17T21:21:25Z

@Jegelewicz @campmlc to pick a collection and work with Dusty to create a new taxonomy source that pulls from Arctos.

dustymc · 2019-04-18T14:58:24Z

Avoiding Taxon Concepts

There are significant usability issues surrounding taxon concepts - it's just more complex data, so it's more difficult to use in most every way. The model generally seems like overkill for the kinds of problems we're trying to solve.

Most of those problems involve homonyms, and collections are reluctant to split classifications because keeping them shared is how data and updates get shared.

Potential not-concepts solution: create "dynamic" sources which are based on collection-defined criteria and auto-refresh themselves periodically. Selection could cross sources, include things like taxon_status or various ranks, etc. Data would be managed in the shared (eg, "Arctos") Source(s) and the dynamic source would be refreshed from updates.
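A minimal sketch of the "dynamic source" idea, assuming nothing about the actual Arctos schema: the class names, field names, and sample rows below are hypothetical and only illustrate the pattern of a collection-defined filter applied across shared sources and re-run on refresh.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Classification:
    """One classification of a name in a shared source (hypothetical shape)."""
    name: str
    source: str
    taxon_status: str
    ranks: Dict[str, str]


@dataclass
class DynamicSource:
    """A collection-owned view rebuilt from shared sources on each refresh."""
    name: str
    criteria: Callable[[Classification], bool]
    records: List[Classification] = field(default_factory=list)

    def refresh(self, shared: List[Classification]) -> int:
        # Rebuild from scratch so edits made in the shared sources propagate.
        self.records = [c for c in shared if self.criteria(c)]
        return len(self.records)


# Illustrative rows only; they are not real Arctos data.
shared = [
    Classification("Diplura", "Arctos", "valid",
                   {"class": "Arachnida", "genus": "Diplura"}),
    Classification("Diplura", "Arctos", "valid",
                   {"class": "Entognatha", "order": "Diplura"}),
    Classification("Diplura", "Other shared source", "invalid",
                   {"class": "Arachnida", "genus": "Diplura"}),
]

# The selection crosses sources and combines taxon_status with a rank.
spiders = DynamicSource(
    name="spider collection source",
    criteria=lambda c: c.taxon_status == "valid"
    and c.ranks.get("class") == "Arachnida",
)

first = spiders.refresh(shared)    # one matching classification

shared[2].taxon_status = "valid"   # someone edits the shared data
second = spiders.refresh(shared)   # the dynamic source picks up the change
```

Data stays managed in the shared sources; the dynamic source holds only a derived copy, so a periodic refresh is all a collection would need to stay current.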

Outstanding questions and concerns:

I still don't have an example of an actual problem. I can think of two potentials:

Cataloging two different type specimens which share a name in the same collection. That seems exceedingly remote, and all other homonyms can (in theory...) be dealt with by following the Codes.
Cataloging hemihomonyms in the same collection. This seems more likely (e.g., should someone catalog 'stuff found in bird nests' at sufficient detail), but I don't think we have any collections which might actually do that.

https://arctos.database.museum/name/Diplura comes up from time to time, but at least the "Arctos" data are likely just wrong - surely the term isn't actually both a class and order for the same individuals?

#1936 (and similar) - we are aggressively pushing things that are very likely to cause problems into shared classifications. I don't think there's anything to share between eg, taxa used by a bird collection and taxa used by a nautiloid collection; those taxa are created by very different user groups, and are probably best managed by different user groups. There is much more to share between eg a modern mammal collection and a paleo collection cataloging lots of Pleistocene material. I'm not sure where to draw any lines, but I suspect there are some in there and we're pushing them in directions that create unnecessary work.

Would anyone use taxon concepts as a way to disambiguate taxa (not names) in a way that can't be accomplished with "ID sensu"? If so, perhaps we should figure out how to mitigate the usability issues. If not, perhaps an alternative approach makes sense until we're forced into the more complex/precise model.

Jegelewicz · 2019-09-10T16:12:03Z

I work with the U Alaska herbarium, and Steffi Ickert-Bond and I got an NSF grant to work on Taxon Concepts for Alaska plants (see http://alaskaflora.org/). Included in the grant are some funds to offer to Dusty to implement a taxon concept data model in Arctos; we’ll be generating Taxon Concept data, and would like to be able to feed it back into Arctos.

We talked extensively with Dusty about this in mid-2017, and feel now is the time to start spending that money, if he, and the larger Arctos community, are willing. I’ve just emailed Dusty and hope to chat with him soon, but I thought it would be good to contact you directly too, since you are obviously interested in this. If you have time for a chat next week, please let me know.

Best,

Cam Webb

Jegelewicz · 2019-09-13T21:46:05Z

Notes from meeting: https://docs.google.com/document/d/19cbpGwfQJ52mt89fCag5VU-kh2Q6wuEddzpD_zw1BvE/edit#

Cam's plan is to have an additional identification field linked to taxon concepts. He will send Dusty some data and Dusty will evaluate for implementation.

Adding to Taxonomy Committee Agenda.

Jegelewicz · 2019-09-18T20:40:59Z

Cam: Grant - taxon concepts and concept mappings for AK flora. Taxon concepts and mapping relate the intersection between a name and a publication. Cam currently has a stand-alone DB for this that we could bring into Arctos.
Dusty: Enhance the link between taxon names and concepts/publications. Add an ID field and a management tool for taxon concepts/maps (two tables)
Derek: How does this handle differences of opinion on validity? Add to taxonomy metadata?
Dusty: Add like the relationships source. Use in addition to the sensu field.
Derek: A pick would be useful - by pub author
Dusty: baby steps - will get it set up then work toward the pick
Derek: There isn't enough money or people or time to do it all!
Cam: This will just be the facility to store the data if available. TDWG group has a test DwC plug-in to capture this stuff - we could be the test case!
Derek: Use the sensu field?
Cam: Use current sensu to populate taxon concepts when the tables are there.
Teresa: Who has data in sensu fields? @dustymc will open an issue to look at it.
Cam to send graphical stuff to new issue.

Let's do this! Committee says let's set it up.
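A rough sketch of the two-table idea from the notes (concepts plus concept maps, an extra concept field on identifications, and seeding concepts from existing sensu values). Every table and column name here is an assumption made for illustration, not the Arctos schema, and the names and publications are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- A concept is the intersection of a name and a publication.
CREATE TABLE taxon_concept (
    concept_id      INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL,
    publication     TEXT NOT NULL,
    UNIQUE (scientific_name, publication)
);
-- A map relates two concepts; relationship values are hypothetical.
CREATE TABLE concept_map (
    from_concept_id INTEGER NOT NULL REFERENCES taxon_concept(concept_id),
    to_concept_id   INTEGER NOT NULL REFERENCES taxon_concept(concept_id),
    relationship    TEXT NOT NULL,
    asserted_in     TEXT
);
-- Identifications keep sensu and gain an optional concept link.
CREATE TABLE identification (
    identification_id INTEGER PRIMARY KEY,
    scientific_name   TEXT NOT NULL,
    sensu             TEXT,
    concept_id        INTEGER REFERENCES taxon_concept(concept_id)
);
""")

# Placeholder data: concepts as they might arrive from a stand-alone DB.
conn.executemany(
    "INSERT INTO taxon_concept VALUES (?, ?, ?)",
    [(1, "Aus bus", "Publication A"), (2, "Aus bus", "Publication B")],
)
conn.execute(
    "INSERT INTO concept_map VALUES (1, 2, 'includes', 'Publication B')"
)
conn.executemany(
    "INSERT INTO identification VALUES (?, ?, ?, NULL)",
    [(10, "Aus bus", "Publication A"),
     (11, "Aus bus", "Publication C"),
     (12, "Aus bus", None)],
)

# Seed concepts from existing sensu values, skipping pairs already present.
conn.execute("""
INSERT OR IGNORE INTO taxon_concept (scientific_name, publication)
SELECT DISTINCT scientific_name, sensu
FROM identification
WHERE sensu IS NOT NULL
""")

# Link each identification to the concept matching its name and sensu.
conn.execute("""
UPDATE identification
SET concept_id = (
    SELECT tc.concept_id
    FROM taxon_concept tc
    WHERE tc.scientific_name = identification.scientific_name
      AND tc.publication = identification.sensu
)
WHERE concept_id IS NULL
""")

linked = conn.execute("""
SELECT i.identification_id, tc.publication
FROM identification i
LEFT JOIN taxon_concept tc ON tc.concept_id = i.concept_id
ORDER BY i.identification_id
""").fetchall()
```

Identifications without a sensu value stay unlinked, which matches the "store the data if available" framing: the concept field is additive and nothing existing has to change.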

mbprondzinski · 2019-09-18T20:42:32Z

Did I miss the meeting?! Crap!

Jegelewicz · 2019-09-18T23:00:11Z

Closing this as dupe of #2267

Jegelewicz added the Function-Taxonomy/Identification label Dec 13, 2018

This was referenced Dec 14, 2018
worms gaps and switching #1844 (Closed)
Name from Arctos taxonomy not appearing in list #1857 (Closed)

Jegelewicz assigned campmlc, DerekSikes, acdoll, sharpphyl, anna-chinn, mbprondzinski, Jegelewicz and dperriguey Jan 16, 2019

dustymc mentioned this issue Jan 28, 2019
worms refresh: test request #1841 (Closed)

This was referenced Mar 22, 2019
Classification Cloning #1641 (Closed)
When names aren't really synonyms #2002 (Closed)

This was referenced Apr 3, 2019
add species status to Arctos #1584 (Closed)
ID formula A sp. #1304 (Closed)

This was referenced Jul 16, 2019
hierarchies from /name/ #1699 (Closed)
nature of ID #2170 (Closed)

dustymc mentioned this issue Aug 22, 2019
taxonomy: dynamic classification sources #2231 (Closed)

dustymc mentioned this issue Sep 18, 2019
Taxon Concepts (again) #2267 (Closed)

Jegelewicz closed this as completed Sep 18, 2019

Jegelewicz mentioned this issue Jul 14, 2022
A method for distinguishing homonyms #4794 (Closed)
