KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques

Large Language Models (LLMs) have significantly advanced healthcare innovation in generation capabilities. However, their application in real clinical settings is challenging due to potential deviations from medical facts and inherent biases. In this work, we develop an augmented LLM framework, KG-Rank, which leverages a medical knowledge graph (KG) with ranking and re-ranking techniques, aiming to improve free-text question-answering (QA) in the medical domain.

KG-Rank Framework

Data

We conduct experiments on four medical QA datasets, in which the answers are free-text. LiveQA consists of health questions submitted by consumers to the National Library of Medicine. ExpertQA is a high-quality long-form QA dataset with 2177 questions spanning 32 fields, along with answers verified by domain experts. Among them, 504 medical questions (Med) and 96 biology (Bio) questions were used for evaluation. MedicationQA includes 690 drug-related consumer questions along with information retrieved from reliable websites and scientific papers.

LLM-based Evaluation

KG-Rank achieves significant improvements in ROUGE, BERTScore, MoverScore, and BLEURT, however, these automatic scores have limitations in evaluating the factuality of long-form medical QA. Therefore, we introduce the GPT-4 score specifically for factuality evaluation. The evaluation criteria are designed by two resident physicians with over five years of experience.

Evalution Criteria

	Description
Factuality	The degree to which the generated text aligns with established medical facts, providing accurate explanations for further verification.
Completeness	The degree to which the generated text comprehensively portrays the clinical scenario or posed question, including other pertinent considerations.
Readability	The extent to which the generated text is readily comprehensible to the user, incorporating suitable language and structure to facilitate accessibility.
Relevance	The extent to which the generated text directly addresses medical questions while encompassing a comprehensive range of pertinent information.

KG-Rank in Open Domains

To further demonstrate the effectiveness of KG-Rank, we extend it to the open domains (including law, business, music,and history) by replacing UMLS with Wikipedia through the DBpedia API. KG-Rank realizes a 14% improvement in ROUGE-L score, indicating the effectiveness and great potential of KG-Rank.

Case Study

Case Study in Medicine:

Case Study in Open Domains:

System Demo

Early KG-Rank Demo (based on ChatGPT-3.5)
Creation Time: Oct 2023.

Contact

Rui Yang: yang.rui@duke-nus.edu.sg
Dr. Irene Li: ireneli@ds.itc.u-tokyo.ac.jp

Citation

@misc{yang2024kgrank,
      title={KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques}, 
      author={Rui Yang and Haoran Liu and Edison Marrese-Taylor and Qingcheng Zeng and Yu He Ke and Wanxin Li and Lechao Cheng and Qingyu Chen and James Caverlee and Yutaka Matsuo and Irene Li},
      year={2024},
      eprint={2403.05881},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Previous Research:
@misc{yang2023integratingumlsknowledgelarge,
      title={Integrating UMLS Knowledge into Large Language Models for Medical Question Answering}, 
      author={Rui Yang and Edison Marrese-Taylor and Yuhe Ke and Lechao Cheng and Qingyu Chen and Irene Li},
      year={2023},
      eprint={2310.02778},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2310.02778}, 
}

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
KG-Rank-UMLS		KG-Rank-UMLS
KG-Rank-WikiPedia		KG-Rank-WikiPedia
KG-Rank demo.gif		KG-Rank demo.gif
README.md		README.md
case_study_one.png		case_study_one.png
case_study_two.png		case_study_two.png
framework.png		framework.png
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques

KG-Rank Framework

Table of Contents

Data

LLM-based Evaluation

Evalution Criteria

KG-Rank in Open Domains

Case Study

Case Study in Medicine:

Case Study in Open Domains:

System Demo

Contact

Citation

About

Releases

Packages

Languages

xzjjj/KG-Rank

Folders and files

Latest commit

History

Repository files navigation

KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques

KG-Rank Framework

Table of Contents

Data

LLM-based Evaluation

Evalution Criteria

KG-Rank in Open Domains

Case Study

Case Study in Medicine:

Case Study in Open Domains:

System Demo

Contact

Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages