scLLM: Generative Large Language Models for Single-Cell Omics

scLLM is a work-in-progress side project exploring generative large language models (LLMs) for single-cell omics. A proof-of-concept model, Tabula5apiens, represents each cell as a sentence of genes in cell-intrinsic rank order (ranked by expression within that cell) and trains an LLM on these sentences to generate cell types with high accuracy. The pipeline builds on state-of-the-art (SOTA) architectures and the Hugging Face Transformers library to analyze single-cell RNA-seq data, with more advanced goals, such as multiomics, on the horizon.
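To make the idea of a cell-intrinsic, rank-ordered "sentence" concrete, here is a minimal sketch. It is not the project's actual preprocessing code: the function name `cell_to_sentence`, the `max_genes` cutoff, the choice to drop undetected genes, and the alphabetical tie-break are all assumptions made for illustration, and the expression values are toy numbers.

```python
from typing import Mapping, Optional


def cell_to_sentence(expression: Mapping[str, float],
                     max_genes: Optional[int] = None) -> str:
    """Turn one cell's expression profile into a rank-ordered gene 'sentence'.

    Hypothetical helper: genes are ordered from highest to lowest expression
    within this single cell, so the ranking is intrinsic to the cell.
    """
    # Assumption: genes with zero counts carry no rank information and are dropped.
    detected = [(gene, value) for gene, value in expression.items() if value > 0]
    # Sort by descending expression; break ties alphabetically for determinism.
    detected.sort(key=lambda item: (-item[1], item[0]))
    genes = [gene for gene, _ in detected]
    # Optionally truncate so the sentence fits a model's context window.
    if max_genes is not None:
        genes = genes[:max_genes]
    return " ".join(genes)


# Toy expression values for a single cell (illustrative only).
cell = {"CD3E": 12.0, "MS4A1": 0.0, "CD8A": 7.5, "GZMB": 7.5, "ACTB": 30.0}
sentence = cell_to_sentence(cell, max_genes=4)
print(sentence)  # ACTB CD3E CD8A GZMB
```

Because only the order of genes within a cell is kept, a sentence built this way does not depend on the absolute scale of the counts, which is one plausible reason to prefer ranks over raw values as LLM input.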

This project is currently in development and is not yet ready for use. Please check back soon for updates!

⚡️ Features

  • 🧪 Takes advantage of SOTA open-source architectures such as GPT-2 and T5
  • 📝 Uses a cell-intrinsic sentence representation
  • 🎯 High-accuracy cell type generation
  • 🛠️ Easy-to-use API with customizable options
  • 🤗 Built on top of the Hugging Face Transformers library and PyTorch
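Since the model generates a cell type as text, one simple way to quantify "high accuracy" is exact-match accuracy between generated and reference labels. The sketch below is an assumption about how such a check could look, not the project's evaluation code: `normalize_label` and `exact_match_accuracy` are hypothetical helpers, and the labels are made-up examples rather than model output.

```python
from typing import Sequence


def normalize_label(label: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(label.lower().split())


def exact_match_accuracy(generated: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of cells whose generated label matches the reference label."""
    if len(generated) != len(reference):
        raise ValueError("generated and reference must have the same length")
    if not reference:
        raise ValueError("at least one label is required")
    hits = sum(
        normalize_label(g) == normalize_label(r)
        for g, r in zip(generated, reference)
    )
    return hits / len(reference)


# Made-up labels for illustration only (not real model output).
generated = ["T cell", "b  cell", "Macrophage", "NK cell"]
reference = ["t cell", "B cell", "monocyte", "NK cell"]
accuracy = exact_match_accuracy(generated, reference)
print(f"{accuracy:.2f}")  # 0.75
```

Exact match is strict: a free-text generator can produce a synonym or a more specific subtype of the correct label, so a real evaluation might also map labels onto a shared ontology before comparing.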

📷 Example

📦 Installation

git clone https://github.com/jzinno/scLLM.git
cd scLLM
pip install -r requirements.txt

⌛ Upcoming

  • Rigorous testing
  • Preprint
  • Publish model(s) to Hugging Face Model Hub
  • Explore multiomics
