
greninger-lab/vadr-models-hcov


Seasonal human coronavirus (HCoV) genome annotation


How to annotate HCoV genomes with VADR

Installation instructions:

1. Install VADR

Option A: Use a pre-built Docker image

You can use the StaPH-B Docker image for VADR 1.6.3-hav-flu2 created by Curtis Kapsak (Docker image names: staphb/vadr:1.6.3-hav-flu2 and staphb/vadr:latest). It is available from DockerHub and Quay. You can pull the image using:

docker pull --platform linux/amd64 staphb/vadr:1.6.3-hav-flu2

Option B: Install VADR from source

Alternatively, you can download and install the latest version of VADR, following the instructions on the VADR GitHub.

2. Download the HCoV VADR Model

Clone the latest HCoV VADR model (release v1.01)
git clone git@github.com:greninger-lab/vadr-models-hcov.git

3. Run HCoV annotation

Note: Nucleotide sequences must be in FASTA format and should not be aligned. The software recognizes only IUPAC nucleotide codes and does not accept symbols such as - (which indicates deletions in alignments). Remove any terminal ambiguous nucleotides (e.g. "N"), which typically represent regions with no sequencing coverage. You can use the script fasta-trim-terminal-ambigs.pl, located in $VADRSCRIPTSDIR/miniscripts/, to clean your sequences accordingly. To also remove sequences that are too short or too long and write the result to a new file <trimmed-fasta-file>, execute:

$VADRSCRIPTSDIR/miniscripts/fasta-trim-terminal-ambigs.pl --minlen 50 --maxlen 33000 <input-fasta-file> > <trimmed-fasta-file>
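Before trimming, it can help to confirm that the input contains no gap symbols or other non-IUPAC characters, as the note above requires. The following is a minimal sketch, not part of VADR: the function name check_fasta_iupac is hypothetical, and the accepted alphabet (A, C, G, T, U plus the standard IUPAC ambiguity codes, in either case) is an assumption about what "IUPAC nucleotide codes" covers here.

```shell
# Hypothetical helper (not part of VADR): list sequence lines that contain
# characters outside the assumed IUPAC nucleotide alphabet.
# Returns 0 if the file is clean, 1 if any offending line was found.
check_fasta_iupac() {
    awk '
        # Skip FASTA header lines; flag any other line with a non-IUPAC character.
        !/^>/ && /[^ACGTURYSWKMBDHVNacgturyswkmbdhvn]/ {
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Example usage (file name is a placeholder):
#   check_fasta_iupac my-sequences.fasta || echo "fix the lines listed above"
```

Files with Windows line endings would be reported on every sequence line, because the carriage return is not in the accepted alphabet; converting line endings first avoids that.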

Run the v-annotate.pl program on the trimmed input FASTA file of HCoV sequences using the recommended command below. Note that <hcov-models-dir-path> is the path to the model directory including the specific HCoV species subdirectory (e.g. </path/to/vadr-models-hcov>/229E or </path/to/vadr-models-hcov>/NL63). In addition, <hcov-key> must indicate the HCoV species: 229E, HKU1, NL63, or OC43.

Use the following command lines:

229E and NL63:

v-annotate.pl -s --glsearch -r -f --mkey <hcov-key> --mdir <hcov-models-dir-path> <fasta-file-to-annotate> <output-directory-to-create>

HKU1 and OC43:

v-annotate.pl -s --glsearch -r -f --alt_pass discontn --mkey <hcov-key> --mdir <hcov-models-dir-path> <fasta-file-to-annotate> <output-directory-to-create>
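Because the two command lines differ only in the --alt_pass discontn option, the choice can be scripted by species key. The sketch below is an illustration only: hcov_vannotate_cmd is a hypothetical function name, and it prints the command rather than running it so that the result can be reviewed first. The flags are copied from the two commands above; paths containing spaces are not handled.

```shell
# Hypothetical helper: print the recommended v-annotate.pl command line
# for a given HCoV species key (229E, HKU1, NL63, or OC43).
# Arguments: <key> <models-dir-including-species-subdir> <fasta> <output-dir>
hcov_vannotate_cmd() {
    key=$1; mdir=$2; fasta=$3; outdir=$4
    case "$key" in
        229E|NL63) extra="" ;;                       # no extra option
        HKU1|OC43) extra="--alt_pass discontn " ;;   # extra option for these species
        *) echo "unknown HCoV key: $key" >&2; return 1 ;;
    esac
    echo "v-annotate.pl -s --glsearch -r -f ${extra}--mkey $key --mdir $mdir $fasta $outdir"
}

# Example usage (all paths are placeholders):
#   hcov_vannotate_cmd OC43 /path/to/vadr-models-hcov/OC43 trimmed.fasta oc43-out
```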

After running v-annotate.pl, a number of files will be generated in <output-directory-to-create>. Among these are 5-column tab-delimited feature table files that end with the suffix .tbl. There are separate files for passing (XXXXX.vadr.pass.tbl) and failing (XXXXX.vadr.fail.tbl) sequences. The format of the .tbl files is described here: https://www.ncbi.nlm.nih.gov/genbank/feature_table/
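As a quick check of the results, the number of sequences in each feature table can be counted. This sketch relies on the GenBank feature table convention that each sequence's block begins with a line starting with ">Feature"; the function name count_tbl_sequences is hypothetical and not part of VADR.

```shell
# Hypothetical helper: count the sequences in a feature table (.tbl) file
# by counting the ">Feature" lines that open each sequence's block.
count_tbl_sequences() {
    # grep -c prints the match count; it exits non-zero when the count is 0,
    # so "|| true" keeps an empty table from being treated as an error.
    grep -c '^>Feature' "$1" || true
}

# Example usage (file names are placeholders):
#   echo "passed: $(count_tbl_sequences outdir/outdir.vadr.pass.tbl)"
#   echo "failed: $(count_tbl_sequences outdir/outdir.vadr.fail.tbl)"
```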

More information about understanding failures and error alerts can be found in the VADR documentation here: https://github.com/ncbi/vadr/blob/master/documentation/annotate.md


HCoV VADR model libraries

  • The VADR model libraries for HCoV annotation include models for species 229E, HKU1 (genotypes A, B and C), NL63, and OC43.
  • Some of the model genomes have been modified slightly on either the 3' or 5' ends to facilitate accurate annotation of sequences of greater length. These include:
    • KY996417 (229E) 3' +15 As
    • AY884001 (HKU1 B) 5' +3 Ts
    • MT118678 (OC43) 5' +1 G, 3' +16 As

Reference

  • The recommended citation for using VADR is: Alejandro A Schäffer, Eneida L Hatcher, Linda Yankie, Lara Shonkwiler, J Rodney Brister, Ilene Karsch-Mizrachi, Eric P Nawrocki; VADR: validation and annotation of virus sequence submissions to GenBank. BMC Bioinformatics 21, 211 (2020). https://doi.org/10.1186/s12859-020-3537-3
