TIRmite¶

TIRmite logo

TIRmite employs profile Hidden Markov Models (HMMs) to model natural variation in transposon termini and recover divergent and degraded hits that are often missed by sequence-based aligners like BLAST.

About TIRmite¶

Autonomous examples of transposons, belonging to many distinct super-families, share two common properties: a gene or genes encoding the mode of transposition; and terminal sequence features that are recognised by these gene products as the element boundaries.

MITEs are a classic example of non-autonomous structural variants — derived from autonomous DNA elements with Terminal Inverted Repeats (TIRs), they are Miniature Inverted-repeat Transposable Elements, sometimes little more than a pair of TIRs.

When non-autonomous structural variants of a TE vastly outnumber their parent element, and include forms that capture novel genes or other full transposons, it becomes difficult to correctly cluster related elements based on the limited signal present in terminal sequences (TIRs, LTRs, etc.).

TIRmite will use profile-HMM models of Transposon Terminal Repeats for genome-wide annotation of transposon families. You can search for TE families with:

Symmetrical termini (i.e. TIRs or LTRs) — the same HMM model hits both ends of the element
Asymmetrical termini (i.e. Helitrons, Helentrons, and Starship elements) — different HMM models describe each end

Why Use HMMs Instead of Simple BLAST?¶

When searching for transposon termini across a genome, you have several options:

Approach	Pros	Cons
Single BLAST query	Simple, fast	Misses divergent/degraded copies; sensitive to query choice
Ensemble of BLAST queries, then cluster	Better coverage	Still limited by sequence identity; complex pipeline
Profile HMM (TIRmite)	Captures natural variation; detects degenerate copies; single model per terminus	Requires curating seed alignment

Profile HMMs capture the full spectrum of sequence variation across all known instances of a terminus, not just a single representative. This means:

More sensitive detection — divergent copies with low identity to any single seed are still found
Fewer false negatives — degraded MITEs and cryptic variants are recovered
Single model per terminus class — no need to manage dozens of BLAST queries; one HMM per terminus type
Probabilistic scoring — each hit is scored against the full conservation profile, not just identity to one sequence

When you have an ensemble of related termini sequences, clustering them first (e.g. to 80% identity with MMseqs2) and building one HMM per cluster lets you distinguish sub-types while still benefiting from the sensitivity of HMM-based search.

Three Output Classes¶

Three classes of output are produced:

All significant termini hit sequences are written to FASTA (per query HMM).
Candidate elements comprised of paired termini are written to FASTA (per query HMM).
Genomic annotations of candidate elements and, optionally, HMM hits (paired and unpaired) are written as a single GFF3 file.

Workflow Overview¶

flowchart LR
    A[Seed sequences\nor draft TE model] --> B[tirmite seed\nBuild HMM]
    B --> C[HMM file .hmm]
    C --> D[tirmite search\nGenome-wide search]
    D --> E[Hit table .tab]
    E --> F[tirmite pair\nPairing algorithm]
    F --> G[Paired elements .fa\nAnnotations .gff3]

Installation¶

TIRmite requires Python >= v3.9

External dependencies:

HMMER3 — nhmmer for HMM-based genome search
mafft — multiple sequence alignment for HMM building
BLAST+ (Optional) — alternative search method

Conda (recommended)¶

# Create environment with all dependencies
conda env create -f environment.yml
conda activate tirmite

From Bioconda¶

conda install -c bioconda tirmite

From PyPI¶

pip install tirmite

Development install¶

git clone https://github.com/Adamtaranto/TIRmite.git && cd TIRmite
conda env create -f environment.yml
conda activate tirmite
pip install -e '.[dev]'
pre-commit install

Test installation¶

# Print version number and exit
tirmite --version

# Get usage information
tirmite --help

Quick Start¶

# 1. Build a pHMM from seed sequences
tirmite seed \
  --left-seed tsplit_results/TIR_element_tsplit_output.fasta \
  --model-name MY_TIR \
  --outdir MY_TIR_HMM \
  --genome genome.fa \
  --threads 8

# 2. Search the genome with nhmmer
nhmmer --dna --cpu 8 --tblout MY_TIR_nhmmer_hits.tab MY_TIR_HMM/MY_TIR.hmm genome.fa

# 3. Pair the hits and annotate elements
tirmite pair \
  --genome genome.fa \
  --nhmmer-file MY_TIR_nhmmer_hits.tab \
  --hmm-file MY_TIR_HMM/MY_TIR.hmm \
  --orientation F,R \
  --mincov 0.4 \
  --maxdist 20000 \
  --outdir MY_TIR_OUTPUT \
  --gff

Subcommands¶

Command	Description
`tirmite seed`	Build HMM models from seed sequences using BLAST to find instances in a genome
`tirmite search`	Ensemble search: run BLAST/nhmmer and merge hits from clustered features
`tirmite pair`	Pair precomputed nhmmer or BLAST hits using the pairing algorithm
`tirmite legacy`	Original TIRmite workflow (HMM search + pairing, all-in-one)

Tutorials¶

Building HMMs for Transposon Termini — How to use tirmite seed to build HMM models from seed sequences
Using tirmite search — How to use tirmite search for ensemble genome searching
Using tirmite pair — How to use tirmite pair to annotate candidate transposon elements

Contributing¶

See the contribution guidelines.

License¶

Software provided under GPL-3 license.