3Cnet: pathogenicity prediction of human variants using multitask learning with evolutionary constraints

Developed by 3billion Co.
Published online 16 July 2021 in Bioinformatics (OUP)
- https://academic.oup.com/bioinformatics/article/37/24/4626/6322986

Update logs

Mar 11, 2024

A sample trained model has been added to the GitHub repository (3Cnet/MT_models/36.pt).
- You can test the model without GPU resources; the CPU is used automatically instead.
- Training a new model may require GPU resources.

To test the model

$ python model_evaluator.py -e 36
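The CPU fallback mentioned above can be sketched as follows. This is a minimal illustration assuming the usual PyTorch pattern, not the code in model_evaluator.py; the helper names `pick_device` and `load_checkpoint` are hypothetical.

```python
def pick_device(cuda_available: bool) -> str:
    """Return the device string to use, given whether CUDA is usable."""
    return "cuda" if cuda_available else "cpu"


def load_checkpoint(path: str):
    """Load a saved checkpoint onto the GPU if one is available, else the CPU.

    Hypothetical helper: the repository's own loading code may differ.
    """
    import torch  # imported lazily so pick_device works without PyTorch

    device = torch.device(pick_device(torch.cuda.is_available()))
    # map_location remaps tensors that were saved on a GPU onto `device`,
    # which is what makes a GPU-trained checkpoint loadable on a CPU-only host.
    checkpoint = torch.load(path, map_location=device)
    return checkpoint, device
```

With PyTorch installed, `load_checkpoint("MT_models/36.pt")` would return whatever object was saved in that file together with the chosen device.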

To evaluate your own variants, you may need to add an HGVSp-formatted file and edit omegaconf.yaml accordingly.
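Before adding your own variants, it can help to check that each string is well-formed HGVSp. The sketch below validates only simple missense notation with three-letter amino-acid codes (for example `NP_000537.3:p.Arg175His`). The regular expression, the `Missense` type, and `parse_missense` are illustrative assumptions; the column layout 3Cnet expects in the file, and the other variant types listed in neuron/constants.py, are not covered here.

```python
import re
from typing import NamedTuple, Optional

# Three-letter amino-acid codes used in HGVS protein nomenclature.
AA3 = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)

# Assumed pattern: RefSeq protein ID, then p.<ref><position><alt>.
MISSENSE = re.compile(
    rf"^(?P<seq_id>NP_\d+(?:\.\d+)?):p\."
    rf"(?P<ref>{AA3})(?P<pos>\d+)(?P<alt>{AA3})$"
)


class Missense(NamedTuple):
    seq_id: str
    ref: str
    pos: int
    alt: str


def parse_missense(hgvsp: str) -> Optional[Missense]:
    """Parse a simple missense HGVSp string; return None if it does not match."""
    m = MISSENSE.match(hgvsp.strip())
    if m is None or m["ref"] == m["alt"]:
        # Not missense notation (or ref equals alt, which is not a substitution).
        return None
    return Missense(m["seq_id"], m["ref"], int(m["pos"]), m["alt"])
```

Strings that return `None` are not necessarily invalid; they may simply be non-missense variants that this sketch does not handle.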

Feb 7, 2024

Released 3Cnet version 2.0
- Please see: https://zenodo.org/records/10212255
- Major changes
  - 3Cnet v2 no longer depends on SNVBOX features.
  - Almost all types of in-exon variants can be inferred (see neuron/constants.py).
  - Better performance than 3Cnet v1 (ROC-AUC improved from 91% to 93% on the external ClinVar test set).

Feb 7, 2022

Corrected an error in SNVBOX feature files that led to decreased performance
- Please see: https://zenodo.org/record/6016720

May 7, 2021

Initial release of 3Cnet

Installation

3Cnet ver.2 was trained using the following versions of software:

We recommend you have at least 40GB of free storage.
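A quick way to check the storage recommendation before downloading is sketched below. It assumes "40GB" means 40 × 10^9 bytes; the function name `has_free_space` is hypothetical and not part of the repository.

```python
import shutil


def has_free_space(path: str = ".", required_gb: float = 40.0) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` free.

    Assumption: 1 GB is taken as 10**9 bytes.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1e9


if __name__ == "__main__":
    print("Enough free storage:", has_free_space("."))
```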

STEP 1: Clone the 3Cnet repository

$ git clone https://github.com/KyoungYeulLee/3Cnet.git

STEP 2: Set up environment

We assume that you are running our model on one or more NVIDIA GPUs.

Option 1: Use Docker (recommended)

Install Docker and nvidia-container-toolkit

Docker Engine (we use Docker 20.10.9)

https://docs.docker.com/engine/install/

NVIDIA/container-toolkit (to use NVIDIA GPUs)

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

If you don't usually have access to root/sudo, consider Docker Rootless

https://docs.docker.com/engine/security/rootless/

Build the 3Cnet Docker image

$ sudo docker build -t 3billion/3cnet:v2.0.0 .

Run the Docker image interactively

$ sudo docker run --gpus all -it -v $(pwd):/workspace 3billion/3cnet:v2.0.0 bash
$ cd workspace

Option 2: Install using pip

$ pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html

STEP 3: Run `download_data.py` to retrieve necessary files from Zenodo

(uses requests and tqdm)

$ python download_data.py

TODO

Code execution (continuing from data download)

To train 3Cnet

$ python model_trainer.py -s model_name

To evaluate 3Cnet performance

$ python model_evaluator.py -m model_name -e 30 -s test_result

Note that you need to select an appropriate epoch number (30 in the example).
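The ROC-AUC figure quoted in the update log can be recomputed from predicted scores and true labels. The sketch below is a dependency-free illustration, not the evaluation code used by 3Cnet. It takes plain Python lists because the column layout of pred.tsv is not described here, and the function name `roc_auc` is an assumption.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Labels are 1 (pathogenic) or 0 (benign); tied scores count as half a win.
    This pairwise form is O(n_pos * n_neg), which is fine for small checks.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For large test sets, a rank-based implementation such as scikit-learn's `roc_auc_score` is more practical.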

Data and file structures

download_data.py: Retrieves data/ directory from Zenodo.
model_trainer.py: Top-level script for 3Cnet training. Outputs include model parameters, a training log, and a config backup.
model_evaluator.py: Evaluates a model using trained model parameters. The test result is saved in the model directory (pred.tsv).
omegaconf.yaml: YAML configuration file (edit it to evaluate your own variants).

neuron

aa_to_int_mappings.py: Mapping from amino-acid strings to integer representations.
constants.py: Definition of the variant types used in this project.
errors.py: Definition of errors.
seq_database.py: Script that parses sequence information from the data.
seq_collection.py: Script that defines the collection of sequence objects.
sequences.py: Script that defines the sequence object.
featurizer.py: Script that featurizes sequence objects into trainable features.
utils.py: Utility script.
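To illustrate what an amino-acid-to-integer mapping looks like, here is a minimal sketch. The alphabet order, the use of 0 for unknown residues, and the name `encode` are assumptions made for this example; the actual mapping in aa_to_int_mappings.py may differ.

```python
# Assumed alphabet: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Index 0 is reserved here for padding or unknown residues (an assumption).
AA_TO_INT = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence: str) -> list:
    """Convert a one-letter amino-acid sequence into a list of integers."""
    return [AA_TO_INT.get(aa, 0) for aa in sequence.upper()]
```

An integer encoding of this kind is what typically feeds an embedding layer in a sequence model.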

cccnet

dataset_builder.py: Class that builds a PyTorch dataset from HGVSp-formatted files.
torch_dataset.py: Dataset class definition for 3Cnet.
torch_network.py: The 3Cnet architecture is defined here (nn.Module).
deep_utils.py: Utility script for deep learning.
utils.py: Utility script.

data

reference_sequences.tsv: the file containing sequence ID and its amino-acid sequence.
msa_arrays/: NP_*.npy files giving, for each residue, the conservation proportions of 21 amino acids.
train_hgvsps/
- train_clinvar_hgvsps.tsv: pathogenic-or-benign-labeled variants from ClinVar
- train_gnomad_hgvsps.tsv: benign-labeled variants from gnomAD
- train_conservation_hgvsps.tsv: pathogenic-like and benign-like variants inferred from conservation data
test_hgvsps/: Contains data pertaining to the external ClinVar test set and patient data test results.
- test_clinvar_missense_hgvsps.tsv: Variants from external ClinVar (missense variants)
- test_clinvar_non-missense_hgvsps.tsv: Variants from external ClinVar (non-missense variants)
- test_inhouse_hgvsps.tsv: In-house patient variants (missense variants)
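The per-residue conservation proportions stored in msa_arrays/ can be pictured with the sketch below, which computes the share of each symbol in every column of a toy alignment. It assumes the 21 symbols are the 20 standard amino acids plus a gap character; the actual symbol set, array shape, and generation procedure used for the NP_*.npy files are not documented here, and `column_proportions` is a hypothetical name.

```python
from collections import Counter
from typing import List

# Assumption: 20 standard amino acids plus '-' for gaps gives 21 symbols.
SYMBOLS = "ACDEFGHIKLMNPQRSTVWY-"


def column_proportions(aligned: List[str]) -> List[List[float]]:
    """For each alignment column, return the proportion of each of the 21 symbols."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    rows = []
    for column in zip(*aligned):
        counts = Counter(column)
        # Only symbols in SYMBOLS are counted; anything else is ignored.
        total = sum(counts[s] for s in SYMBOLS)
        rows.append([counts[s] / total if total else 0.0 for s in SYMBOLS])
    return rows
```

Each returned row has 21 entries, so an alignment of length L yields an L × 21 table, the kind of per-position profile that a conservation-aware model can consume.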
