ID formula A sp. #1304

Closed
DerekSikes opened this issue Oct 19, 2017 · 78 comments

@DerekSikes

Can we somehow make our data consistent for genus-only level identifications? Currently, I train my people to use ID formula ' A sp. ' but often they forget. This means that the data are inconsistent - some genus-only determinations are formula A and others are formula A sp.

I suppose we would all have to agree to allow Arctos to use only ' A sp.' whenever an ID was genus-only (and Arctos would have to 'know' when this situation exists, which I think shouldn't be too hard). OR we could set a preference in our admin settings for each collection?

@ekrimmel

I am in favor of enforcing consistency to A sp.

@sharpphyl

We train our volunteers to use A sp.

@Jegelewicz
Member

I am also in favor of enforcing A sp.

@jtgiermakowski

I am also in favor of "A sp."

@dustymc
Contributor

dustymc commented Jan 31, 2018

From another message:

we have a lot of the bat Lasionycteris noctivagans but also 28 that read Lasionycteris noctivagans ssp. As far as I know, there are no subspecies for this species. As far as I can tell, they are all classified as just Lasionycteris noctivagans. Why do some have ssp.?

They were entered that way.

http://handbook.arctosdb.org/documentation/bulkloader.html#taxonomy

http://arctos.database.museum/info/ctDocumentation.cfm?table=cttaxa_formula

If there's a single term ranked genus in the preferred classification, I could theoretically do something with it+incoming ID formula. I could also maybe check that A ssp. determinations involve something that looks like a species and that subspecies exist.
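
As a rough illustration of the check described here (a Python sketch, not Arctos code): the `RANKS` mapping, the `enforce_sp` function, and the sample entries are hypothetical stand-ins for "the lowest ranked term in the preferred classification". The sketch forces the "A sp." formula only when a name is known to be a genus, and leaves unranked names alone because it cannot decide.

```python
# Hypothetical lookup: taxon name -> rank of the lowest ranked term in the
# preferred classification. None means no rank is recorded for the name.
RANKS = {
    "Lasionycteris": "genus",
    "Lasionycteris noctivagans": "species",
    "Diplura": None,
}

def enforce_sp(taxon_name, formula):
    """Return (formula, scientific_name) after applying the genus-only rule."""
    if formula in ("A", "A sp.") and RANKS.get(taxon_name) == "genus":
        # Genus-only ID: always use the "A sp." formula.
        return "A sp.", f"{taxon_name} sp."
    # Anything else (including unranked names) is passed through unchanged.
    suffix = " sp." if formula == "A sp." else ""
    return formula, taxon_name + suffix
```

Under these assumptions, `enforce_sp("Lasionycteris", "A")` gives the "A sp." form, while `enforce_sp("Diplura", "A")` is left as entered because the name has no rank to test.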

That would all add a LOT of work/processing/complexity. It's a bunch of code to maintain, it's a bunch of pointless Operator rules (it might force you to create subspecies you'll never use before you can use the "A ssp." formula, for example), and it forces users to figure out why we have two ways of saying exactly the same thing (if they're lucky enough to find both variants).

True consistency would also demand some rule regarding existing IDs when a subspecies of Lasionycteris noctivagans is named, which doesn't seem very realistic.

Can we take the obvious path and drop the A s[s]p. formulae instead? Anyone who REALLY wants the format could use the A {string} formula to create a functionally-equivalent ID.

@DerekSikes
Author

DerekSikes commented Jan 31, 2018 via email

@DerekSikes
Author

DerekSikes commented Jan 31, 2018 via email

@dustymc
Contributor

dustymc commented Feb 1, 2018

long standing tradition (at least in [some limited scope])

That is also the origin of A ssp. (and a bunch of similar things in various niches).

IF a bunch of things in our classification data are true/consistent, tacking on .sp would not be particularly hard to implement (certainly easier than .ssp, which has a couple more "ifs" involved, likely less stability, and perhaps a narrower tradition). Maintenance - what happens when someone changes the lowest ranked term in a classification to/from genus? - is probably much less trivial. If the answer potentially involves changing thousands of IDs, things like processing power may become an issue as well.

I'm not particularly objecting to any part of this idea; weird but consistent data are certainly more accessible than sometimes-weird and inconsistent data. I'm just trying to figure out what might be involved, make sure we all understand what this might mean, point out where a clean-slate data modeling exercise might end up, etc.

https://arctos.database.museum/name/Diplura claims to be/have been used for

  • Order of hexapod
  • Genus of arachnid
  • Genus of cnidarian
  • Genus of lepidopteran
  • Genus of seaweed
  • Genus of bird
  • I think multiple instances of some of those things
  • Probably some other stuff

so I question the utility of the .sp as a disambiguator at the scale of Arctos.

@DerekSikes
Author

DerekSikes commented Feb 5, 2018 via email

@atrox10

atrox10 commented Feb 5, 2018 via email

@sharpphyl

We use A sp. fairly frequently and have not had a problem with volunteers not entering some type of binomial ID. We never use A spp. We would opt to keep the A sp. formula.

@DerekSikes
Author

DerekSikes commented Feb 5, 2018 via email

@sharpphyl

We can certainly type in Genus sp. if there is no formula to build it, but we would never identify a specimen (mollusca and other marine invertebrates) with just the genus name.

@atrox10

atrox10 commented Feb 5, 2018 via email

@dustymc
Contributor

dustymc commented Feb 6, 2018

inconsistency bugs me

Me too!

The inconsistency just stops users (including curatorial users) from finding what they're looking for - we have denormalized data, 2 ways of saying the same thing.

never

http://arctos.database.museum/guid/DMNS:Inv:18559
http://arctos.database.museum/guid/DMNS:Inv:9836

Despite good intentions and careful users, if something like this CAN happen it inevitably WILL.

Attached are IDs by collection where both A sp. (CNT_ASP) and A (CNT_NOSP) taxa formulae have been used for a taxon. This isn't the whole picture, but I think it's pretty strong evidence that the inconsistency is widespread.

temp_asp.csv.zip
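
For anyone wanting to reproduce a count like this, here is a rough Python sketch. The input layout (collection, taxon name, formula) and the function name `mixed_formula_counts` are assumptions for illustration, not the Arctos schema or the query actually used; only the column names CNT_ASP and CNT_NOSP come from the comment above.

```python
from collections import Counter

def mixed_formula_counts(ids):
    """ids: iterable of (collection, taxon_name, formula) tuples (assumed layout).

    Returns one row per (collection, taxon) where BOTH the "A sp." and the
    "A" formula have been used, with a count for each.
    """
    asp = Counter()   # uses of "A sp." per (collection, taxon)
    nosp = Counter()  # uses of "A" per (collection, taxon)
    for collection, taxon, formula in ids:
        if formula == "A sp.":
            asp[(collection, taxon)] += 1
        elif formula == "A":
            nosp[(collection, taxon)] += 1
    # Keep only the pairs that appear under both formulae.
    return [
        {"COLLECTION": c, "TAXON": t, "CNT_ASP": asp[(c, t)], "CNT_NOSP": nosp[(c, t)]}
        for (c, t) in sorted(asp.keys() & nosp.keys())
    ]
```

A taxon that shows up only under one formula within a collection is omitted, which matches the "both have been used" condition described above.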

@sharpphyl

sharpphyl commented Feb 6, 2018 via email

@sharpphyl

sharpphyl commented Feb 6, 2018 via email

@campmlc

campmlc commented Feb 6, 2018 via email

@dustymc
Contributor

dustymc commented Feb 6, 2018

Arctos cannot readily distinguish between a stand-alone genus and the same taxon term used for other organisms in a different hierarchy

#1304 (comment)

That's not the problem. I'm concerned about what happens when someone finds a clever way to add/remove genus for a monomial used in (perhaps thousands of) identifications. Do we really want scripts changing identifications, or to lock classifications because of identifications, or WHATEVER it is that we'd need to do to enforce this?

indicates the specimen has been examined

"Genus sp." and "Genus" provide the same information. One adds some unnecessary complexity, and perhaps makes it slightly more difficult to find those specimens which have multinomial determinations (vs. those that look like multinomials because we've tacked on a "traditional" string). Together they provide two ways of doing about the same thing; we act inconsistently because we can, and that makes it a bit harder to find what you're looking for and messes with "number of species..." data, etc.

There are three possibilities:

  1. Do nothing, keep the "A sp." formula, use it arbitrarily.
  2. Add a bunch of complexity for reasons that make no sense to me, but which does make things more consistent.
  3. Drop some complexity in procedures, forms, and data, and make everything more accessible by doing so.

@Jegelewicz
Member

AWG says enforce A sp. for species name.

@campmlc

campmlc commented Apr 12, 2018

OK to go with mandatory sp. per AWG 4-12-18

@dustymc
Contributor

dustymc commented Apr 12, 2018

AWG 20180412:

  • force .sp
  • .ssp is a different discussion

@dustymc
Contributor

dustymc commented Apr 12, 2018

generate report

  • flip genus-only IDs to .sp
  • OR flip "A" formula genus-only IDs to A {string} and keep the namestring

@dustymc dustymc modified the milestones: Needs Discussion, Next Task Apr 12, 2018
@dustymc
Contributor

dustymc commented Oct 17, 2019

I'm just ignoring .ssp because nobody's asked for any impossible things with it yet!

I don't think most users will infer effort (or whatever's being attempted) from the format of the identification string.

Being forced into complicated situations - having to search for multiple things or use substring searches to find all of what you're looking for - does not seem like something a user would ever want to encounter.

@mbprondzinski

mbprondzinski commented Nov 6, 2019

Why can't Arctos ask, "Is this a genus?" for a single entry, and if you check affirmative, then it can add the "sp."?

@DerekSikes
Author

DerekSikes commented Nov 6, 2019 via email

@mbprondzinski

mbprondzinski commented Nov 6, 2019 via email

@Jegelewicz
Member

Issue Summary:

It is proposed that we remove the taxon formula "A sp." from the Taxa Formula Code Table. At the same time, any identification using this formula will have the " sp." removed. Any collection wishing to retain " sp." will need to notify Dusty prior to the change. Their identifications will be converted to the "A string" formula and will be formatted as Genus {Genus sp.}.

If approved, an announcement will be made to the Community and collections will be given a date by which to decide.

Derek has written a fairly nice summary of the long discussion above that led to this conclusion:

There are problems with enforcing this idea of ensuring all genus-only IDs have 'sp.' added.

  1. some taxon names lack rank & thus Arctos won't know it's a genus
  2. how would this work? would one choose ID formula A and then during the 'create new id' save process Arctos would check if the name is a genus & automatically add a 'sp.' to the end? What about bulkloading names?
  3. we'd still need to clean up all the already existing genus-only identifications by adding 'sp.' to those without
  4. GBIF and other aggregators strip the 'sp.' off at their end anyhow.

For those who like having 'sp.' after genus-only identifications, they could still apply this using the A {string} formula, although this would take a little more typing and carry a greater chance of errors being saved (e.g. 'sp' or 'spp' or 'ssp' instead of the intended 'sp.').

And finally... remember that these two identifications have the same meaning:

Genusname sp.

Genusname

Adding the 'sp.' adds no extra information, adds complexity, can't be enforced, creates inconsistency, etc.
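
To make the search side of this concrete, here is a small Python sketch under assumed conventions. `search_key` is a hypothetical helper, not an Arctos or GBIF function; it drops a trailing "sp." (or the mistyped "sp") so that both spellings collapse to one key, and it deliberately leaves "ssp." alone because that is a separate discussion.

```python
import re

# Whitespace followed by "sp" or "sp." at the very end of the string.
_TRAILING_SP = re.compile(r"\s+sp\.?$")

def search_key(scientific_name):
    """Collapse 'Genusname sp.' and 'Genusname' to the same search key."""
    return _TRAILING_SP.sub("", scientific_name.strip())
```

With a key like this, a user would not need substring searches or two separate queries to find every genus-only identification.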

@ccicero

ccicero commented Jun 17, 2020

So where are we on the 'ssp' part of the discussion? We use that all the time for taxa that we're unable to ID to subspecies, which is important for birds. I don't think using the string for that is a good idea. Are we keeping ssp in the formula?

@Jegelewicz
Member

@ccicero let's save that for another day and focus on this one thing for now.

@ccicero

ccicero commented Jun 17, 2020

Fine with me. I just wanted to make sure that you also weren't getting rid of the ssp option. Thanks!

@campmlc

campmlc commented Aug 6, 2020

AWG Recommend that we announce the decision to remove sp. via email to all Arctos.

@campmlc

campmlc commented Aug 6, 2020

@ewommack

@ewommack

Added an issue in the Newsletter repository for the article.

@ewommack

I've put this down for a newsletter article, but the thread says a general email. I think an article sounds more appropriate, but I wanted to double check.

@campmlc

campmlc commented Aug 24, 2020 via email

@Jegelewicz
Member

This was published in the last newsletter - can we implement?

@dustymc
Contributor

dustymc commented Nov 19, 2020

I'm ready when you are - is that an official "go"?

Unless I hear otherwise before the official "go" I will

  • change formula from A sp. to A
  • remove trailing .sp from relevant identification.scientific_name
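
The two steps above could look roughly like this in Python. This is a sketch on in-memory rows, not the update that was actually run: the `taxa_formula` field name and the `drop_sp` function are assumptions, and only `scientific_name` is taken from the comment. A copy of the input is returned as a backup before anything is changed.

```python
import copy

SUFFIX = " sp."

def drop_sp(identifications):
    """Return (backup, updated); the input list is not modified.

    Each row is a dict with 'taxa_formula' (assumed name) and 'scientific_name'.
    """
    backup = copy.deepcopy(identifications)
    updated = []
    for row in identifications:
        row = dict(row)  # work on a copy of the row
        if row["taxa_formula"] == "A sp.":
            # Step 1: change the formula from "A sp." to "A".
            row["taxa_formula"] = "A"
            # Step 2: remove the trailing " sp." from the name string.
            if row["scientific_name"].endswith(SUFFIX):
                row["scientific_name"] = row["scientific_name"][: -len(SUFFIX)]
        updated.append(row)
    return backup, updated
```

Rows using any other formula pass through untouched, so an "A {string}" identification that happens to end in "sp." would keep its string in this sketch.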

@ewommack

In the newsletter we gave people until 30 Nov to get issues to @dustymc through GitHub Issues... so maybe implement on 1 Dec?

@Jegelewicz
Member

Schedule for December 1 unless someone decides to comment. Thanks @ewommack for keeping me in line!

@acdoll

acdoll commented Dec 30, 2020

This doesn't appear to have been implemented yet. @dustymc? I don't believe we heard any objections after the newsletter went out. @ewommack - did you receive any feedback?

@dustymc
Contributor

dustymc commented Dec 31, 2020

Thanks @acdoll

Done, backup at temp_cache.identification20201231

@dustymc dustymc closed this as completed Dec 31, 2020
@ewommack

Nope, nothing came through the communication channels, and the article directed people to submit to the issue. I think we're good.
