This collection integrates data from the European Nucleotide Archive (ENA, release 142: http://ftp.ebi.ac.uk/pub/databases/ena/sequence/release/) and the Barcode of Life Data Systems (BOLD: http://www.boldsystems.org/).
CONTENT:
It currently contains 5,608,848 metazoan COXI sequence entries with their corresponding taxonomic classification and metadata. MetaCOXI_Seqs.tar.gz contains the full sequence collection in FASTA format. MetaCOXI_Taxonomy_Metadata.tar.gz contains the taxonomy path and additional metadata associated with each entry.
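As a minimal sketch of how the sequence archive could be read without unpacking it to disk, the Python snippet below streams FASTA records from a .tar.gz file using only the standard library. The names of the files inside MetaCOXI_Seqs.tar.gz are not stated here, so the code simply reads every regular file in the archive and assumes each one is UTF-8 FASTA text; the function names iter_fasta and iter_fasta_from_tar are illustrative, not part of MetaCOXI.

```python
import io
import tarfile


def iter_fasta(handle):
    """Yield (header, sequence) pairs from a text handle in FASTA format."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def iter_fasta_from_tar(archive_path):
    """Stream FASTA records from every regular file in a .tar.gz archive.

    Assumption: every regular file in the archive is FASTA text.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            raw = tar.extractfile(member)
            with io.TextIOWrapper(raw, encoding="utf-8") as text:
                yield from iter_fasta(text)


# Example usage (the path is wherever the archive was downloaded to):
# for header, seq in iter_fasta_from_tar("MetaCOXI_Seqs.tar.gz"):
#     print(header, len(seq))
```

Streaming keeps memory use low, which may matter for a collection of several million entries.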
Taxonomic paths are provided at the following seven levels, each with its NCBI TaxID: Kingdom, Phylum, Class, Order, Family, Genus, Species.
A sample file of taxonomy and metadata retrieved from ENA and BOLD is available in
Sequence lengths range from 100 to 3020 bp, with 658 bp being the most frequent length.
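The length summary above can be checked on a local copy with a short script. The following Python sketch counts sequence lengths in a FASTA text file and reports the minimum, maximum, and most frequent length. It assumes the archive has already been extracted to a plain FASTA file; the file name in the usage comment is hypothetical, and the function names are illustrative only.

```python
from collections import Counter


def fasta_length_counts(handle):
    """Return a Counter mapping sequence length to number of sequences."""
    counts = Counter()
    length = None  # None until the first header line is seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if length is not None:
                counts[length] += 1
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        counts[length] += 1
    return counts


def summarize(counts):
    """Return (min_length, max_length, most_frequent_length)."""
    if not counts:
        raise ValueError("no sequences found")
    return min(counts), max(counts), counts.most_common(1)[0][0]


# Example usage (hypothetical file name for the extracted FASTA file):
# with open("MetaCOXI_Seqs.fasta", encoding="utf-8") as fh:
#     print(summarize(fasta_length_counts(fh)))
```

Keeping only a length-to-count table, rather than the sequences themselves, lets the script run over the whole collection in a single pass.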
If you use MetaCOXI in your DNA sequence analysis, such as DNA barcoding or metabarcoding, please cite it as follows:
Bachir Balech*, Anna Sandionigi, Marinella Marzano, Graziano Pesole, Monica Santamaria. MetaCOXI: an integrated collection of metazoan mitochondrial cytochrome oxidase subunit-I DNA sequences, Database, Volume 2022, January 2022, baab084. https://doi.org/10.1093/database/baab084