TIRmite

Build and map profile Hidden Markov Models for Terminal Inverted Repeat families (TIR-pHMMs) to genomic sequences for annotation of MITEs and complete DNA transposons with variable internal sequence composition.

If you have a draft TE model (e.g. from RepeatModeler or EDTA) and want to identify the TIRs to use with TIRmite, we recommend tSplit, a tool for extracting terminal repeats from complete transposons.

Algorithm overview

1. Search the genome with the TIR-pHMM using nhmmer.
2. Import all hits below the --maxeval threshold.
3. For each significant TIR match, identify candidate partners, where:
   * Hit is on the same sequence.
   * Hit is in complementary orientation.
   * Distance is <= --maxdist.
   * Hit length is >= model length * --mincov.
4. Rank candidate partners by distance downstream of positive-strand hits, and upstream of negative-strand hits.
5. Pair reciprocal top candidate hits.
6. For unpaired hits, find the first unpaired candidate partner and check for reciprocity.
7. If the first unpaired candidate is non-reciprocal, check for 2nd-order reciprocity (is the outbound top candidate of the current candidate reciprocal?).
8. Iterate steps 6-7 until all TIRs are paired OR the number of iterations without a new pairing exceeds --stableReps.
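
The candidate filtering, ranking and reciprocal pairing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not TIRmite's actual implementation: the `Hit` record, the `candidates` and `pair_reciprocal` functions, and the 1-based inclusive coordinates are all hypothetical, and the iterative handling of unpaired hits (steps 6 onwards) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One TIR-pHMM hit (hypothetical record, not TIRmite's internal type)."""
    name: str
    seq: str     # name of the sequence the hit is on
    start: int   # 1-based, start <= end
    end: int
    strand: str  # "+" or "-"


def candidates(hit, hits, model_len, mincov=0.5, maxdist=None):
    """Return candidate partners for `hit`, nearest first."""
    found = []
    for other in hits:
        if other.seq != hit.seq or other.strand == hit.strand:
            continue  # must share a sequence and be in opposite orientation
        if (other.end - other.start + 1) < model_len * mincov:
            continue  # candidate covers too little of the model
        # "+" hits look downstream, "-" hits look upstream.
        if hit.strand == "+":
            dist = other.start - hit.end
        else:
            dist = hit.start - other.end
        if dist < 0 or (maxdist is not None and dist > maxdist):
            continue
        found.append((dist, other))
    return [other for _, other in sorted(found, key=lambda pair: pair[0])]


def pair_reciprocal(hits, model_len, mincov=0.5, maxdist=None):
    """Pair hits that are each other's top-ranked candidate."""
    top = {}
    for hit in hits:
        ranked = candidates(hit, hits, model_len, mincov, maxdist)
        if ranked:
            top[hit] = ranked[0]
    # Report each pair once, from the "+" strand hit's point of view.
    return [
        (hit.name, partner.name)
        for hit, partner in top.items()
        if hit.strand == "+" and top.get(partner) == hit
    ]


hits = [
    Hit("A", "chr1", 100, 150, "+"),
    Hit("B", "chr1", 900, 950, "-"),
    Hit("C", "chr1", 5000, 5050, "-"),
]
pairs = pair_reciprocal(hits, model_len=50, mincov=0.5, maxdist=10000)
print(pairs)
```

In this toy input, hit C also ranks A as its top candidate, but A's nearest partner is B, so C stays unpaired after the first pass. It is hits like C that the later iterative steps try to resolve.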

Options and usage

Installing TIRmite

TIRmite requires Python >= v3.8

Dependencies:

TIR-pHMM build and search:
- HMMER3

Extraction of terminal repeats from predicted TEs:
- pymummer version >= 0.10.3 with wrapper for nucmer option --diagfactor
- MUMmer
- BLAST+ (optional)

You can create a Conda environment with these dependencies using the environment.yml file in this repo.

% conda env create -f environment.yml

% conda activate tirmite

Installation options:

pip install the latest development version directly from this repo.

% pip install git+https://github.com/Adamtaranto/TIRmite.git

Install the latest release from PyPI.

% pip install tirmite

Install from Bioconda.

% conda install -c bioconda tirmite

Clone this repository and install it as an editable local Python package. Do this if you want to edit the code.

% git clone https://github.com/Adamtaranto/TIRmite.git && cd TIRmite && pip install -e '.[dev]'

Test installation.

# Print version number and exit.
% tirmite --version
tirmite 1.2.0

# Get usage information
% tirmite --help

Example usage

Report all hits and valid pairings of TIR_A in target.fasta (paired hits at most 10000 bases apart, each hit covering at least 40% of the HMM length), and write a GFF3 annotation file.

% tirmite --genome target.fasta --hmmFile TIR_A.hmm --gffOut --prefix TIR_elements_in_Target --maxdist 10000 --mincov 0.4

If you don't have an HMM of your TIR, TIRmite can create one for you from an aligned sample of your TIR, provided with --alnFile.
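
For reference, building a DNA profile HMM from an alignment by hand comes down to a single `hmmbuild` call. The sketch below only assembles that command line. The helper name `hmmbuild_command`, the output directory and the choice to name the model after the alignment file are assumptions made for illustration, and the exact options TIRmite passes internally may differ.

```python
from pathlib import Path


def hmmbuild_command(aln_file, out_dir=".", hmmbuild="hmmbuild"):
    """Assemble an hmmbuild command line for one DNA alignment.

    Hypothetical helper: it mirrors the documented hmmbuild usage,
    `hmmbuild [options] <hmmfile_out> <msafile>`, and nothing more.
    """
    aln = Path(aln_file)
    model_name = aln.stem  # e.g. "TIR_A" from "TIR_A.fasta"
    hmm_path = Path(out_dir) / f"{model_name}.hmm"
    # --dna forces the DNA alphabet, -n sets the model name.
    return [hmmbuild, "--dna", "-n", model_name, str(hmm_path), str(aln)]


cmd = hmmbuild_command("TIR_A.fasta", out_dir="models")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once HMMER is installed and the output directory exists.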

To skip the HMM search and run the pairing algorithm on a custom set of TIR hits (e.g. from blastn), you can provide the hits in BED format with --pairbed.
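
As a rough illustration of preparing such a file, the sketch below converts one line of blastn tabular output (`-outfmt 6`) into a six-column BED line. The function name `blast6_to_bed` is hypothetical, and the column layout (sequence, start, end, name, score, strand) is the generic BED6 convention. Whether TIRmite expects exactly these columns, or a bit score in column 5, is an assumption you should check against the program before relying on it.

```python
def blast6_to_bed(line):
    """Convert one blastn -outfmt 6 line to a BED6 line (assumed layout).

    outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    fields = line.rstrip("\n").split("\t")
    query, subject = fields[0], fields[1]
    sstart, send = int(fields[8]), int(fields[9])
    bitscore = fields[11]
    # blastn reports minus-strand hits with sstart > send.
    strand = "+" if sstart <= send else "-"
    # BED is 0-based and half-open; BLAST coordinates are 1-based inclusive.
    start = min(sstart, send) - 1
    end = max(sstart, send)
    return "\t".join([subject, str(start), str(end), query, bitscore, strand])


blast_line = "TIR_A\tchr1\t98.0\t50\t1\t0\t1\t50\t950\t901\t1e-20\t87.9"
bed_line = blast6_to_bed(blast_line)
print(bed_line)
```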

TIR models and alignments should always be oriented 5' to 3', matching the left-hand TIR.

In this example both TIRs should be oriented to begin with "GA".

5' GA>>>>>>> ATGC <<<<<<<TC 3'
3' CT>>>>>>> TACG <<<<<<<AG 5'
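
The orientation rule can be checked with a few lines of Python. This is a toy example: the 18-base element and its 7-base TIRs are invented for illustration. The right-hand TIR is read from the top strand and reverse-complemented, which puts it in the same orientation as the left-hand TIR.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Invented element: left TIR + internal sequence + right TIR (top strand).
element = "GATTACA" + "ATGC" + "TGTAATC"

left_tir = element[:7]
right_tir = reverse_complement(element[-7:])

# Both TIRs now read 5' to 3' from the outer edge of the element inwards.
print(left_tir, right_tir)
```

Both sequences begin with "GA", even though the top strand of the element ends in "TC".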

Standard options

Run tirmite --help to view the program's most commonly used options:

tirmite [-h] [--version] --genome GENOME [--hmmDir HMMDIR]
               [--hmmFile HMMFILE] [--alnDir ALNDIR] [--alnFile ALNFILE]
               [--alnFormat {clustal,fasta,nexus,phylip,stockholm}]
               [--pairbed PAIRBED] [--stableReps STABLEREPS] [--outdir OUTDIR]
               [--prefix PREFIX] [--nopairing] [--gffOut]
               [--reportTIR {None,all,paired,unpaired}] [--padlen PADLEN]
               [--keeptemp] [-v] [--cores CORES] [--maxeval MAXEVAL]
               [--maxdist MAXDIST] [--nobias] [--matrix MATRIX]
               [--mincov MINCOV] [--hmmpress HMMPRESS] [--nhmmer NHMMER]
               [--hmmbuild HMMBUILD]

Info: 
  -h, --help            Show this help message and exit
  --version             Show program's version number and exit
  
Input options:
  --genome              Path to target genome that will be queried with HMMs.
                          Note: Sequence names must be unique. (required)
  --hmmDir              Directory containing pre-prepared TIR-pHMMs.
  --hmmFile             Path to single TIR-pHMM file. Incompatible with "--hmmDir".
  --alnDir              Path to directory containing only TIR alignments to be
                          converted to HMM.
  --alnFile             Provide a single TIR alignment to be converted to HMM.
                          Incompatible with "--alnDir".
  --alnFormat           Alignments provided with "--alnDir" or "--alnFile" are
                          all in this format.
                          Choices=["clustal","fasta","nexus","phylip", "stockholm"]
  --pairbed             If set, TIRmite will perform pairing on TIRs from
                          custom bedfile only.

Pairing heuristics:
  --stableReps          Number of times to iterate pairing procedure when no
                         additional pairs are found AND remaining unpaired hits > 0.
                         (Default = 0)

Output and housekeeping:
  --outdir OUTDIR       All output files will be written to this directory.
  --prefix PREFIX       Add prefix to all TIRs and Paired elements detected in
                          this run. Useful when running same TIR-pHMM against
                          many genomes.
                          (Default = None)
  --nopairing           If set, only report TIR-pHMM hits. Do not attempt
                          pairing.
                          (Default = False)
  --gffOut              If set report features as prefix.gff3. File saved to
                          outdir.
                          (Default = False)
  --reportTIR           Options for reporting TIRs in GFF annotation file.
                          Choices=[None,'all','paired','unpaired']
                          (Default = 'all')
  --padlen              Extract x bases either side of TIR when writing TIRs to fasta.
                          (Default = None)
  --keeptemp            If set do not delete temp file directory.
                          (Default = False)
  -v, --verbose         Set syscall reporting to verbose.
  
HMMER options:
  --cores               Set number of cores available to hmmer software.
  --maxeval             Maximum e-value allowed for valid hit.
                          (Default = 0.001)
  --maxdist             Maximum distance allowed between TIR candidates to
                          consider valid pairing.
                          (Default = None)
  --nobias              Turn OFF bias correction of scores in nhmmer.
                          (Default = False)
  --matrix              Use custom DNA substitution matrix with nhmmer.
  --mincov              Minimum valid hit length as prop of model length.
                          (Default = 0.5)

Non-standard HMMER paths:
  --hmmpress            Set location of hmmpress if not in PATH.
  --nhmmer              Set location of nhmmer if not in PATH.
  --hmmbuild            Set location of hmmbuild if not in PATH.
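
To make the effect of --maxeval and --mincov concrete, here is a small Python sketch that parses one line of nhmmer tabular output (`--tblout`) and applies both thresholds. It is not TIRmite's own parser. The function names are hypothetical, the column order assumed is the one documented for HMMER 3.1 and later, and measuring hit length on the target alignment coordinates (rather than on the model coordinates) is an assumption.

```python
def parse_tblout_line(line):
    """Parse one data line of nhmmer --tblout output (assumed column order).

    Columns: target, accession, query, accession, hmmfrom, hmmto, alifrom,
    alito, envfrom, envto, sqlen, strand, E-value, score, bias, description
    """
    fields = line.split()
    ali_from, ali_to = int(fields[6]), int(fields[7])
    return {
        "target": fields[0],
        "model": fields[2],
        # Minus-strand hits are reported with alifrom > alito.
        "start": min(ali_from, ali_to),
        "end": max(ali_from, ali_to),
        "strand": fields[11],
        "evalue": float(fields[12]),
    }


def keep_hit(hit, model_len, maxeval=0.001, mincov=0.5):
    """Apply the e-value and minimum coverage thresholds to one hit."""
    hit_len = hit["end"] - hit["start"] + 1
    return hit["evalue"] <= maxeval and hit_len >= model_len * mincov


line = "chr1 - TIR_A - 1 50 950 901 960 895 1000000 - 1.2e-10 45.3 0.1 -"
hit = parse_tblout_line(line)
print(hit, keep_hit(hit, model_len=50))
```

The model length is not part of the tabular output, so it has to be supplied separately, for example from the LENG line of the HMM file.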

Custom DNA Matrices

nhmmer can be supplied with a custom DNA score matrix (via --matrix) for scoring HMM matches. Standard NCBI-BLAST matrices such as NUC.4.4 are compatible (see: ftp://ftp.ncbi.nlm.nih.gov/blast/matrices/NUC.4.4).

Issues

Submit feedback to the Issue Tracker

License

Software provided under MIT license.
