ProjectIDs on individual records, rather than a dataset as a whole #836

ahahn-gbif · 2022-11-02T11:33:52Z

Idea/wish captured from feedback of the regional support contractors (BID) to GBIFS:

"It is being defined with SiB Colombia how to identify in each record of a dataset its link with the BID project, within the framework of the publication of data from partner organizations/collections in the Colombian BID-CA2020 projects. The use of DwC fields such as datasetID or datasetName has been proposed by the Regional Support, but in some cases that could create conflict when the field was filled with previous data. GBIF is encouraged in building its new data model to look for a more effective mechanism to accomplish this and clarify it for the BID projects (and project partners)."

There are two main reasons for this request:

being able to adequately report what has been delivered in the context of a project where records are added or amended in an already pre-existing dataset (including, but not limited to, eBird, iNaturalist), and
being able to show such records within the project context, without either having to omit the dataset completely (as above), or alternatively overstating the dataset’s contribution by co-reporting all already existing records

Unfortunately, this is not easy – individual records would have to carry the project ID right from the point they are captured at record level – our transfer schema does not really allow for that. We are presently getting around the delivery-reporting requirement, e.g. in cases where records are published through eBird or iNaturalist, by requesting an explicit report on the data published in the project context. This is only for internal evaluation though. The second part is not easily possible, since there is no “project ID” field at record level.
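The reporting problem above can be illustrated with a minimal sketch. It assumes a hypothetical record layout in which each record is a dict with an optional `projectID` value, where several IDs are separated by `|`. The delimiter is an assumption based on the usual Darwin Core multivalue convention, and the project ID is the example used later in this thread. Counting only the tagged records gives the project's own contribution. Counting the whole dataset would overstate it.

```python
from typing import Iterable, Mapping, Tuple


def project_contribution(records: Iterable[Mapping[str, str]], project_id: str) -> Tuple[int, int]:
    """Return (records tagged with project_id, total records in the dataset).

    Assumes a hypothetical layout where 'projectID' may hold several IDs
    separated by '|'; records without the field count as untagged.
    """
    tagged = 0
    total = 0
    for record in records:
        total += 1
        ids = {p.strip() for p in record.get("projectID", "").split("|") if p.strip()}
        if project_id in ids:
            tagged += 1
    return tagged, total


# Hypothetical pre-existing dataset: two older records plus one added by a project.
dataset = [
    {"occurrenceID": "rec-1"},
    {"occurrenceID": "rec-2", "projectID": ""},
    {"occurrenceID": "rec-3", "projectID": "BID-PA2020-010-REG"},
]
print(project_contribution(dataset, "BID-PA2020-010-REG"))  # (1, 3)
```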

Open question: do the benefits outweigh the added requirements, including internal data management and UI needs for surfacing this information?

timrobertson100 · 2022-11-03T15:35:07Z

Adding a multivalue gbif:projectID* field to the records that is supported in the IPT, ingestion, search and download on GBIF.org is not particularly difficult. We have done this for recordedByID before DwC accepted the term for example.

I'd suggest we support any project ID but give clear guidance on how people should refer to GBIF-issued project IDs (e.g. gbif:projectID=gbif:BID-PA2020-010-REG) so that we are able to clearly link them using search /search?projectID=gbif:BID-PA2020-010-REG - it may be that we don't need to prefix them if we are confident they are likely globally unique.

Would that be desirable? If so, we should move this request into gbif/pipelines.

* note gbif: here is to indicate the namespace of the term, not that it is a GBIF-issued ID
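A search filtered on such a field could be built roughly as follows. This sketch assumes that the GBIF occurrence search endpoint accepts a repeatable `projectId` query parameter, and that repeating a parameter means OR, as with other multivalue filters. The helper name is hypothetical. The thread had not yet settled whether the `gbif:` prefix would be kept in the stored values.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter name; check the current GBIF API docs before use.
SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"


def project_search_url(*project_ids: str, limit: int = 20) -> str:
    """Build a hypothetical occurrence search URL for one or more project IDs.

    Each ID becomes its own projectId=... pair, so several projects can be
    requested at once (assumed to be combined with OR).
    """
    params = [("projectId", pid) for pid in project_ids]
    params.append(("limit", str(limit)))
    return f"{SEARCH_URL}?{urlencode(params)}"


print(project_search_url("gbif:BID-PA2020-010-REG"))
# https://api.gbif.org/v1/occurrence/search?projectId=gbif%3ABID-PA2020-010-REG&limit=20
```

`urlencode` percent-encodes the colon in the prefixed form. Dropping the prefix for globally unique IDs, as suggested above, would produce shorter and more readable URLs.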

ahahn-gbif · 2022-11-21T14:34:54Z

I would think it desirable, thanks - and agree about a prescribed syntax for the record level.

Some follow-up considerations, just off the top of my head:

for a dataset that has a ProjectID at metadata level but none at record level - would we need to auto-populate all records from the metadata, or does that not make sense?
for a GBIF (BID, BIFA, CESP) project page, record level ProjectID filters would need to be included alongside dataset level ones to document the project's data contribution (the use case that started this request in the first place)
inclusion in documentation/training materials needed

ManonGros · 2022-12-08T07:57:26Z

@dagendresen FYI

timrobertson100 · 2022-12-08T08:29:50Z

Moving this into pipelines then.

camiplata · 2023-01-04T16:24:03Z

Great @ahahn-gbif, we are highly interested in adopting this solution, as with the BID project we had to create many metadata-only datasets to fulfill the BID reporting needs.

marcos-lg · 2023-01-05T13:39:30Z

I was exploring what we need to do in the development side in pipelines.

We already have a projectId term in the GBIF namespace and we populate it with the dataset projectId. So we have to do the following:

Make the projectId a multivalue field
Populate it with the projectId of the record if it exists. Otherwise we take the projectId from the dataset. If both exist and they are different, we take both values.
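The interpretation rule in the list above can be sketched as a small function. This is an illustrative Python version, not the pipelines implementation. The function name is hypothetical, and splitting the record value on `|` is an assumption about how multiple record-level IDs would be supplied.

```python
from typing import List, Optional


def interpreted_project_ids(record_value: Optional[str], dataset_project_id: Optional[str]) -> List[str]:
    """Merge record-level and dataset-level project IDs.

    - Record-level IDs are used if present (split on '|', an assumed delimiter).
    - Otherwise the dataset-level projectId is used.
    - If both exist and differ, both are kept (no duplicates, order preserved).
    """
    ids: List[str] = []
    if record_value:
        for part in record_value.split("|"):
            part = part.strip()
            if part and part not in ids:
                ids.append(part)
    dataset_id = (dataset_project_id or "").strip()
    if dataset_id and dataset_id not in ids:
        ids.append(dataset_id)
    return ids


print(interpreted_project_ids(None, "DS-1"))     # ['DS-1']
print(interpreted_project_ids("BID-1", "DS-1"))  # ['BID-1', 'DS-1']
print(interpreted_project_ids("DS-1", "DS-1"))   # ['DS-1']
```

This rule also covers ahahn-gbif's earlier question. A dataset with a metadata-level ProjectID but no record-level values effectively has every record populated from the metadata.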

Then we need to adapt the IPT, search, downloads, portal, etc., and the field will be used like the other multivalue fields that we already have.

Is there anything that we are missing or has to be done differently?

camiplata · 2023-01-06T18:28:13Z

@marcos-lg Does this mean that the projectID from the metadata will also become a multivalue field? That would also be useful, as a collection of monitoring programs will have multiple funding sources across the years.

marcos-lg · 2023-01-10T11:47:04Z

@camiplata we can make the projectID from the metadata multivalue too but it has more implications so we need to plan it more carefully. I created this issue in the IPT so we can track it gbif/ipt#1927

MBLaursen · 2023-08-21T12:31:33Z

I concur that allowing projectID to be added to individual occurrences would be very useful to monitor/acknowledge the contribution of various projects to bigger datasets. Right now, projects can only refer to metadata-only datasets, which is not really representative of their data mobilization work.

timhirsch · 2023-08-21T12:59:48Z

While this may be over-interpreting the current suggestion, I can see this approach being very useful in a number of contexts for GBIF, e.g.

as mentioned by @MBLaursen, a means of demonstrating a project's contribution to very large existing datasets, e.g. for the African Bird Atlas project, where a lot of disambiguation was required to avoid over-counting of mobilized records
(possibly) a means of tagging records contributed indirectly to GBIF by means of another aggregator such as eBird, e.g. in the case of an early BIFA project in India where we were not able to reflect the huge mobilization effort made by the national eBird partner Bird Count India. This is probably several steps down the road, but would record-level IDs make this kind of attribution within large datasets more feasible? Could this also apply to iNat platforms and projects?

ymgan · 2023-08-21T14:47:39Z

@debpaul this reminds me of your question at tdwg/dwc-qa#199

marcos-lg · 2023-08-31T11:31:36Z

Deployed to PROD.

timrobertson100 transferred this issue from gbif/portal-feedback Dec 8, 2022

marcos-lg self-assigned this Jan 10, 2023

This was referenced Aug 24, 2023

Allow adding a projectId to individual occurrences gbif/rs.gbif.org#115 (Open)

Make the metadata projectID field multivalue gbif/ipt#1927 (Closed)

marcos-lg closed this as completed Aug 31, 2023

MortenHofft mentioned this issue Mar 26, 2024

Add news posts -- Local Contexts/Biocultural Labels gbif/hp-new-zealand#18 (Open)

dagendresen mentioned this issue Apr 24, 2024

Dataset from Miljølære.no to the node IPT gbif-norway/helpdesk#177 (Closed)

dagendresen mentioned this issue Oct 14, 2024

filter on dataset name does not work gbif-norway/helpdesk#190 (Open)

aaltenburger2 mentioned this issue Oct 28, 2024

New Terms - projectTitle; projectID; fundingBodyName; fundingBodyID tdwg/dwc#527 (Closed)

This was referenced Dec 18, 2024

New Term - projectTitle tdwg/dwc#531 (Open)

New Term - projectID tdwg/dwc#532 (Open)

New Term - fundingAttribution tdwg/dwc#533 (Open)

New Term - fundingBodyID tdwg/dwc#534 (Open)
