Releases · wcmc-its/ReCiter

13 Mar 16:05

mrj4001

v3.0

0cc74bc

ReCiter 3.0 Latest

ReCiter 3.0 Release Notes

Enhanced Scoring Methodology

In previous versions (ReCiter 2.0 and earlier), publication scoring relied heavily on identity-based methods and straightforward weighting, which occasionally failed to adequately reflect nuanced affiliations or feedback-driven importance. This method limited our ability to dynamically prioritize publications based on user-submitted feedback.

In version 3.0, we've introduced a significant enhancement by employing sigmoid functions to calculate attribute subscores dynamically based on user feedback. For example, if an author has even a small number of accepted publications with a particular affiliation not listed in institutional source systems, subsequent candidate publications with that affiliation will receive higher weighting. The more publications accepted with the same affiliation, the higher the weighting.

Attributes scored via sigmoid functions now include:

Target Author Name
Email
Institution
Organization
ORCID
ORCID Co-author
Co-author ORCID
Journal
Keyword
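The feedback-driven weighting described above can be sketched as follows. This is a minimal illustration, not ReCiter's actual implementation: the function name `sigmoid_subscore` and the `max_score`, `midpoint`, and `steepness` parameters are assumptions chosen for the example.

```python
import math


def sigmoid_subscore(accepted_count: int, max_score: float = 1.0,
                     midpoint: float = 3.0, steepness: float = 1.0) -> float:
    """Map a count of accepted publications sharing an attribute value
    (for example, the same affiliation) to a subscore.

    All parameter values are illustrative assumptions, not ReCiter's.
    """
    return max_score / (1.0 + math.exp(-steepness * (accepted_count - midpoint)))


if __name__ == "__main__":
    # The subscore rises with each additional accepted publication
    # and levels off below max_score.
    for count in (0, 1, 3, 10):
        print(count, round(sigmoid_subscore(count), 3))
```

The sigmoid shape means a few accepted publications already move the subscore noticeably, while very large counts cannot push it past the cap.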

New Signals Incorporated:

Year of Publication: Candidate articles published before the earliest accepted article will now be increasingly penalized.
Count of Accepted Publications: Enhances relevance scoring based on previously accepted articles.
Count of Rejected Publications: Improves accuracy by considering articles previously rejected.
Author Count: Adjusts scoring by accounting for the increased uncertainty associated with publications having a higher number of authors.
Relationship Scoring: Improves accuracy by weighing the number of known relationships against the total number of co-authors. Additionally, first name matching is now required to be explicit and detailed.
Penalty for Inferred Target Authors: Added a penalty in cases where there have been 0 or 2+ target authors inferred, addressing a common source of false positives.
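Two of the signals above can be sketched as simple adjustments. The linear form of the year penalty, the penalty sizes, and both function names are assumptions made for illustration; the release notes do not state the actual formulas.

```python
def year_penalty(candidate_year: int, earliest_accepted_year: int,
                 penalty_per_year: float = 0.1) -> float:
    """Return a non-positive adjustment that grows the further a candidate
    article precedes the earliest accepted article.

    The linear shape and the rate are illustrative assumptions.
    """
    gap = max(0, earliest_accepted_year - candidate_year)
    return -penalty_per_year * gap


def inferred_target_author_penalty(inferred_count: int,
                                   penalty: float = 1.0) -> float:
    """Penalize articles where zero or two-plus target authors were inferred.

    Exactly one inferred target author is treated as the unambiguous case.
    """
    return 0.0 if inferred_count == 1 else -penalty


if __name__ == "__main__":
    print(year_penalty(2000, 2005))           # candidate predates earliest accepted
    print(inferred_target_author_penalty(2))  # ambiguous target author
```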

Neural Network Integration:

All attribute subscores, along with legacy identity-based scores, now feed into an advanced neural network model, significantly enhancing system accuracy. We have developed two distinct neural network models:

Feedback-Driven Model: Activated when feedback is available.
No-Feedback Model: Engaged when no prior feedback exists for the author.

These neural networks were fine-tuned through iterative experimentation, leading to an optimized model configuration delivering superior accuracy compared to previous methods.
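Choosing between the two models can be sketched as below. The sketch assumes that "feedback is available" means the author has at least one accepted or rejected publication; that reading, the function name `select_model`, and the stand-in models are all hypothetical.

```python
from typing import Callable, Sequence

# A model is represented here as any callable from feature values to a score.
Model = Callable[[Sequence[float]], float]


def select_model(accepted_count: int, rejected_count: int,
                 feedback_model: Model, no_feedback_model: Model) -> Model:
    """Pick the feedback-driven model when any prior feedback exists.

    Treating one accepted or rejected article as "feedback available"
    is an assumption for this sketch.
    """
    has_feedback = (accepted_count + rejected_count) > 0
    return feedback_model if has_feedback else no_feedback_model


if __name__ == "__main__":
    # Stand-in models; the real ones are trained neural networks.
    with_feedback = lambda features: sum(features) / len(features)
    without_feedback = lambda features: max(features)
    model = select_model(0, 0, with_feedback, without_feedback)
    print(model([0.2, 0.9, 0.4]))
```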

Additional Improvements:

Improved Performance: Enhanced overall system performance by optimizing lookup processes and addressing inefficiencies.
No Results Fix: Previously, if a user's name did not exist in the eSearch API, results incorrectly defaulted to the first initial search (e.g., "M[au]"). This issue has been resolved for strict searches, lenient searches, and searches involving compound names.
Identity Checks: Added checks that the mandatory fields (firstName, lastName, and firstInitial) are present in the identity object.
Docker Hub Credentials: Included Docker Hub credentials in the Dockerfile to avoid the "image pull limit" error.
Degree Year Discrepancy Score: Improved the logic and effectiveness of the Degree Year Discrepancy scoring.
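The identity check above can be sketched as a small validation helper. The three field names come from the release notes; the flat dictionary layout and the function name `missing_identity_fields` are assumptions, and the real identity object may nest these fields differently.

```python
from typing import Dict, List, Optional

REQUIRED_IDENTITY_FIELDS = ("firstName", "lastName", "firstInitial")


def missing_identity_fields(identity: Dict[str, Optional[str]]) -> List[str]:
    """Return the mandatory fields that are absent, None, or blank.

    Assumes a flat dictionary purely for illustration.
    """
    return [
        field for field in REQUIRED_IDENTITY_FIELDS
        if not str(identity.get(field) or "").strip()
    ]


if __name__ == "__main__":
    incomplete = {"firstName": "Marie", "lastName": " "}
    print(missing_identity_fields(incomplete))
```

Returning the list of missing fields, rather than a bare boolean, makes it easy to report exactly which field a source system failed to supply.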

Related Repositories:

To fully utilize ReCiter 3.0, you must update the following related repositories:

This update marks a major step forward in refining publication matching accuracy and significantly boosts the effectiveness of user feedback within ReCiter.

04 Apr 02:57

mrj4001

2.1.5

dd525c3

ReCiter 2.1.5

Added ORCID ID to ReCiter Identity Model wcmc-its/ReCiter-Identity-Model#7
Fixed issue #527

01 Sep 11:22

mrj4001

2.1.4

6630751

ReCiter 2.1.4

Outputs the "Equal Contribution" attribute (equalContrib) at the author level. When set to "yes", this attribute indicates that the authors carrying that designation should share credit. Our intention is to use it to define co-senior and co-first authors in publication reporting.
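One possible way a reporting tool might consume equalContrib is sketched below. The rule used here (a run of flagged authors at the start of the author list counts as co-first, a run at the end as co-senior) is an assumption, as are the `name` key and the function names; only the equalContrib attribute and its "yes" value come from the release notes.

```python
from typing import Dict, List


def is_flagged(author: Dict[str, str]) -> bool:
    """True when the author carries equalContrib = "yes"."""
    return str(author.get("equalContrib", "")).lower() == "yes"


def leading_flagged(authors: List[Dict[str, str]]) -> List[str]:
    """Names in the unbroken run of flagged authors at the start of the list.

    A run of one is not shared credit, so it yields an empty list.
    """
    names = []
    for author in authors:
        if not is_flagged(author):
            break
        names.append(author["name"])
    return names if len(names) > 1 else []


def co_first_authors(authors: List[Dict[str, str]]) -> List[str]:
    return leading_flagged(authors)


def co_senior_authors(authors: List[Dict[str, str]]) -> List[str]:
    # Scan from the end of the author list, then restore the original order.
    return list(reversed(leading_flagged(list(reversed(authors)))))
```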

06 Apr 19:23

sarbajitdutta

2.1.3

133c502

ReCiter 2.1.3

ReCiter container images are now publicly available in the AWS ECR Public repository. Use ReCiter Public Container. To pull the image with Docker, run: docker pull public.ecr.aws/wcmc-its/reciter:v2.1.3
Fix bug in identity
Fix authorname sanitize utils test
Avoid IllegalStateException for edge case
Add test case for group view for filter
Make group API POST request with uids as body
Fix commentCorrections pmid null check

15 Dec 20:43

sarbajitdutta

2.1.2

02f0476

ReCiter 2.1.2

#485 Fix log4j vulnerability
#486 Fix squiggly filters
#484 Bug fixes for feature generator by group. The feature generator by group API now accepts a list of unique IDs as a parameter. When this parameter is supplied, all other filtering parameters are ignored. A new property in application.properties sets the maximum allowed number of uids so that the performance of the API is not impacted.
Suppress antlr runtime warnings
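The uid handling described in #484 can be sketched as follows. The function name `resolve_group_filters`, the default limit of 100, and the choice to raise an error when the limit is exceeded are assumptions; the release notes state only that a configurable maximum exists and that uids override other filters.

```python
from typing import Dict, List, Optional


def resolve_group_filters(uids: Optional[List[str]],
                          other_filters: Dict[str, str],
                          max_uids: int = 100) -> Dict[str, object]:
    """Decide which filters a group request should apply.

    When uids are supplied they take precedence and every other filter
    is ignored. The limit value and the error behavior are illustrative.
    """
    if uids:
        if len(uids) > max_uids:
            raise ValueError(
                f"{len(uids)} uids supplied; the configured maximum is {max_uids}"
            )
        return {"uids": list(uids)}
    return dict(other_filters)


if __name__ == "__main__":
    print(resolve_group_filters(["abc1001", "xyz2002"], {"department": "Medicine"}))
```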

23 Aug 13:57

sarbajitdutta

2.1.1

1e1c871

ReCiter 2.1.1

This release includes a number of bug fixes and enhancements, especially improvements to the nameScoring strategy.

#474 Name scoring strategy bug fix for mismatched names
#473 Addition of more meshMajor Terms
#455 Capture lookup_type in esearchresults
#370 Fix nameScoring bugs
#322 Output email even if it's not a match
#454 Candidate article count is wrong
#444 Update Feature Generator API so it returns count of pending publications for a scholar

04 Feb 17:29

sarbajitdutta

2.1.0

e2b5966

ReCiter 2.1.0

The Esearchresults table now includes lookupType. This allows us to more reliably identify the count of candidate articles for the articleCountStrategy in cases where ONLY_NEWLY_ADDED_PUBLICATIONS is used. #455
For articleCountStrategy, candidate article count now relies on distinct count of all retrieved publications except those from the gold standard retrieval strategy. #454
Time-based lookups against PubMed were only looking for articles based on the date added to Entrez. This caused some publications to be missed. Lookups now match on either the date added to Entrez or the date added to PubMed. #450
Update Swagger from 2.0 → 3.0. #447
Update Java 8 → 11. #446
The environment variable JAVA_OPTS was added to the Docker image to specify the Java heap size https://github.com/wcmc-its/ReCiter/blob/a3d5d4665e8692853ca69f2db0caba0eb56f557d/kubernetes/k8-deployment.yaml#L81-L82 and also to the Dockerfile https://github.com/wcmc-its/ReCiter/blob/a3d5d4665e8692853ca69f2db0caba0eb56f557d/Dockerfile#L8
Output the top keywords and their counts for accepted publications. This will be used in Publication Manager. #442
Output count of pubs where userAssertion = NULL as attribute enhancement. This will be used in Publication Manager. #399
ReCiter Identity data model was updated to v2.0.8 wcmc-its/ReCiter-Identity-Model#3 to include primaryOrganizationalUnit, primaryInstitution, startDate, and endDate
ReCiter Article data model was updated to v2.0.16. This includes adding orcid identifier, affiliations and emails for authors, countOfPendingPubs, topArticleKeywords
Fixed error running DynamoDb locally in Docker. #452
Added a health check path for the application: <protocol>://<host>:<port>/reciter/ping
Upgraded all dependencies to the latest stable releases
AWS Codebuild images were also updated to use Java 11 and latest release
Docker image was updated to use adoptopenjdk/openjdk11:alpine-jre for security
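The revised candidate article count (#454) can be sketched as a distinct count over retrieval strategies. The strategy labels and the function name are hypothetical; only the rule itself (distinct publications, excluding the gold standard retrieval strategy) comes from the release notes.

```python
from typing import Iterable, Mapping, Set


def candidate_article_count(pmids_by_strategy: Mapping[str, Iterable[int]],
                            gold_standard_key: str = "goldStandard") -> int:
    """Count distinct PMIDs across all retrieval strategies except the
    gold standard one. The key names are illustrative assumptions.
    """
    distinct: Set[int] = set()
    for strategy, pmids in pmids_by_strategy.items():
        if strategy != gold_standard_key:
            distinct.update(pmids)
    return len(distinct)


if __name__ == "__main__":
    retrieved = {
        "fullName": [101, 102],
        "firstInitial": [102, 103],  # 102 is retrieved twice but counted once
        "goldStandard": [103, 104],  # excluded from the count
    }
    print(candidate_article_count(retrieved))
```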

21 Jul 19:24

paulalbert1

2.0.0

0ec26df

ReCiter 2.0.0

Create a Multi-User Feature Generator API, which outputs pending articles for groups of scholars. This can be used in Publication Manager to quickly review pending publications for large groups of people. #330
Feature Generator API now outputs:
- ORCID identifiers associated with authors #336
- an identifier associated with each cluster #365
- MeSH terms #402
More powerful use of the year when scholars received their degree. #391
Identity API returns list of scholars via S3-based cache, significantly improving performance of Publication Manager. #400
Support for Kubernetes, an open-source system for automating deployment, scaling, and management of containerized applications
Bug fix: Analysis objects are in both the DynamoDB Analysis table and S3, and should only be in S3 #392
Bug fix: incremental lookup
Updated timeout settings
Add performance metrics for s3 caching
Updated article and identity models in Maven Central

01 Sep 14:35

paulalbert1

1.2

e4e339b

ReCiter 1.2

Evidence weights in application.properties are now optimized according to a support vector machine analysis
Created a userFeedback service for feedback from Publication Manager
Added an API controller in Swagger for ReCiter Publication Manager
Fixed a bug in common affiliation strategy
Bucket names in S3 are dynamically created
Fixed affiliation count of non-target authors. #361

12 Jun 20:51

sarbajitdutta

v1.1

f9e2c33

ReCiter 1.1

Release notes for ReCiter 1.1

Use name to infer gender of targetAuthor and identity. Downweight cases where there's a difference in inferred gender. #357
Tracks a person’s original name as recorded in a source system and outputs it in the feature generator as opposed to using the sanitized/standardized version of that name. #317
Tracks an organization’s original name as recorded in a source system and outputs it in the feature generator as opposed to using the standardized version and/or synonym of that name. #356
Single matching departmental affiliation, no matter the synonyms, should only count once. #326
Update articleCountScoringStrategy so it better accounts for retrieval counts in strict mode. This way people with more common names get lower scores for articleCountScoringStrategy, even though their lookups are done in strict mode. #278
Penalize relationship scores for each non-match. This will address cases where there are a lot of co-authors and just by sheer chance some of them have a known relationship match. #341
Added ScienceMetrix journalDepartmentCategory scores. This covers the 250+ most common organizational affiliations in PubMed and their scores for all 180 subfields. #352
The number of organizational unit synonyms has been expanded. In many cases, it includes common translations, e.g., Cirugia (Surgery). This expands the coverage of journalDepartmentCategory scoring. #354
journalDepartmentCategory scoring should pick most favorable match. This is useful in cases where a person has multiple organizational affiliations, one of which scores highly. #355
Improved method for identifying target author. It turns out author’s email is often not assigned to the person behind that email. #185
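Two of the scoring changes above (#341 and #355) can be sketched as small functions. The weights, the linear penalty, and the function names are assumptions for illustration and are not taken from ReCiter's configuration.

```python
from typing import Mapping


def relationship_score(match_count: int, coauthor_count: int,
                       match_weight: float = 1.0,
                       non_match_penalty: float = 0.1) -> float:
    """Reward known-relationship matches and subtract a small penalty for
    every co-author without one, so chance matches on long author lists
    count for less. The weights are illustrative.
    """
    non_matches = max(0, coauthor_count - match_count)
    return match_weight * match_count - non_match_penalty * non_matches


def best_journal_department_score(scores_by_affiliation: Mapping[str, float]) -> float:
    """Pick the most favorable score across a person's organizational
    affiliations, returning 0.0 when there are none.
    """
    return max(scores_by_affiliation.values(), default=0.0)


if __name__ == "__main__":
    # Same two matches, but far more non-matching co-authors in the second case.
    print(relationship_score(2, 2), relationship_score(2, 50))
    print(best_journal_department_score({"Surgery": 0.2, "Medicine": 0.9}))
```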
