txt2hpo

txt2hpo is a Python library for extracting phenotypes encoded with the Human Phenotype Ontology (HPO) from text. It recognizes differences in inflection (e.g. hypotonic vs. hypotonia), handles negation, and comes with a built-in medical spellchecker.

Installation

Install using pip

pip install txt2hpo

Install from GitHub

git clone https://github.com/GeneDx/txt2hpo.git
cd txt2hpo
python setup.py install

Library usage

from txt2hpo.extract import Extractor
extract = Extractor()

result = extract.hpo("patient with developmental delay and hypotonia")

print(result.hpids)


["HP:0001263", "HP:0001290"]

txt2hpo will attempt to correct spelling errors by default, at the cost of slower processing. This feature can be turned off by setting the correct_spelling flag to False.

from txt2hpo.extract import Extractor
extract = Extractor(correct_spelling=False)
result = extract.hpo("patient with devlopental delay and hyptonia")

print(result.hpids)

[]

txt2hpo handles negation using negspaCy. To remove negated phenotypes from the results, set the remove_negated flag to True. Both the extracted and the negated HPO terms can then be retrieved.

from txt2hpo.extract import Extractor
extract = Extractor(remove_negated=True)
result = extract.hpo("patient has developmental delay but no hypotonia")

print(result.hpids)

["HP:0001263"]

print(result.negated_hpids)

["HP:0001252"]

txt2hpo picks the longest overlapping phenotype by default. To disable this feature, set the remove_overlapping flag to False.

from txt2hpo.extract import Extractor
extract = Extractor(remove_overlapping=False)
result = extract.hpo("patient with polycystic kidney disease")

print(result.hpids)

["HP:0000113", "HP:0000112"]


extract = Extractor(remove_overlapping=True)
result = extract.hpo("patient with polycystic kidney disease")

print(result.hpids)

["HP:0000113"]
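
The idea of keeping only the longest of several overlapping matches can be sketched in plain Python. This is an illustration only, not txt2hpo's actual implementation: the function name keep_longest, the record layout, and the character spans and matched strings below are assumptions made for the example.

```python
def keep_longest(matches):
    """Keep the longest match in every group of overlapping spans.

    Each match is a dict with an "index" entry holding [start, end)
    character offsets. This layout is an assumption for this sketch.
    """
    def length(match):
        start, end = match["index"]
        return end - start

    kept = []
    # Visit the longest matches first so shorter overlapping ones are dropped.
    for match in sorted(matches, key=lambda m: (-length(m), m["index"][0])):
        start, end = match["index"]
        overlaps = any(start < k["index"][1] and k["index"][0] < end for k in kept)
        if not overlaps:
            kept.append(match)
    # Return the survivors in reading order.
    return sorted(kept, key=lambda m: m["index"][0])


# Hypothetical matches for "patient with polycystic kidney disease".
matches = [
    {"hpid": ["HP:0000112"], "index": [24, 38], "matched": "kidney disease"},
    {"hpid": ["HP:0000113"], "index": [13, 38], "matched": "polycystic kidney disease"},
]

longest = keep_longest(matches)
print([m["hpid"][0] for m in longest])
```

Sorting by descending length before filtering is a simple greedy choice; other tie-breaking rules are possible, and the library may resolve overlaps differently.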

txt2hpo outputs a valid JSON string containing the extracted HPIDs, their character spans, and the matched strings.

from txt2hpo.extract import Extractor
extract = Extractor()

result = extract.hpo("patient with developmental delay and hypotonia")

print(result.json)


'[{"hpid": ["HP:0001290"], "index": [37, 46], "matched": "hypotonia"}, 
{"hpid": ["HP:0001263"], "index": [13, 32], "matched": "developmental delay"}]'
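
Because the output is a JSON string, it can be parsed with Python's standard json module. The following is a minimal sketch that uses the sample output above as a literal string instead of calling txt2hpo; the variable names are illustrative. It also assumes that each "index" pair is a [start, end) character span into the input text, which holds for this sample.

```python
import json

text = "patient with developmental delay and hypotonia"

# Sample output copied from the example above (not produced by a live call).
raw = (
    '[{"hpid": ["HP:0001290"], "index": [37, 46], "matched": "hypotonia"}, '
    '{"hpid": ["HP:0001263"], "index": [13, 32], "matched": "developmental delay"}]'
)

records = json.loads(raw)

# List the matches in the order they appear in the text.
ordered = sorted(records, key=lambda record: record["index"][0])
for record in ordered:
    start, end = record["index"]
    # Slicing the input with the span recovers the matched string.
    print(record["hpid"], start, end, text[start:end])
```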
