Skip to content

rusty-dot

rusty-dot is a Rust + PyO3 tool for making fast dot plot comparisons of DNA sequences using a Rust FM-Index.

Overview

rusty-dot provides a high-performance toolkit for pairwise DNA sequence comparison and visualisation. At its core, it builds an FM-index (via rust-bio) for each sequence and uses k-mer set intersection to efficiently find shared subsequences between any two sequences in the collection.

Key Features

  • Fast FM-index construction via Rust + PyO3 bindings
  • Read FASTA / gzipped FASTA files via needletail
  • Build FM-indexes per sequence using rust-bio
  • K-mer set intersection for efficient shared k-mer lookup
  • Both-strand k-mer matching: forward (+) and reverse-complement (-) hits via compare_sequences_stranded
  • Complete RC hit coverage: two patterns merged independently — anti-diagonal (standard inverted repeat) and co-diagonal (both arms same direction)
  • Unified merge API (py_merge_runs) handles all orientation cases with a single call
  • PAF format output for alignment records
  • FM-index serialization/deserialization with serde + postcard
  • All-vs-all dotplot visualization with matplotlib: forward hits in blue, RC hits in red; edge-only axis labels in grid plots; subpanels scaled by sequence length by default (scale_sequences=True)
  • SVG vector output via the format parameter (format='svg') or by using a .svg file extension — suitable for publication-quality figures
  • Minimum alignment length filter (min_length) on DotPlotter.plot() / plot_single() — suppresses short or spurious alignment hits before rendering
  • Identity-based alignment colouring — when alignments are loaded from a PAF file, pass color_by_identity=True to colour each segment by residue_matches / alignment_block_len using any Matplotlib colormap (identity_palette); DotPlotter.plot_identity_colorbar() renders the scale as a standalone figure
  • CrossIndex multi-group cross-index: N arbitrary sequence groups, configurable group pairs for alignment, per-group contig ordering (insertion order, length, or collinearity), run_merge to update cached PAF records, compatible with DotPlotter
  • PafAlignment.filter_by_min_length() — discard short alignment records from a loaded PAF file; filters on query aligned length
  • Full Python bindings via PyO3

Quick Start

from rusty_dot import SequenceIndex
from rusty_dot.dotplot import DotPlotter

# Build index for two sequences
idx = SequenceIndex(k=15)
idx.load_fasta("genome1.fasta")
idx.load_fasta("genome2.fasta")

# Get PAF-format alignments (forward strand only)
for line in idx.get_paf("seq1", "seq2"):
    print(line)

# Stranded comparison: forward (+) and reverse-complement (-) hits
hits = idx.compare_sequences_stranded("seq1", "seq2", merge=True)
for qs, qe, ts, te, strand in hits:
    print(f"{strand}  q[{qs}:{qe}]  t[{ts}:{te}]")

# Generate dotplot — forward hits blue, RC hits red
plotter = DotPlotter(idx)
plotter.plot(output_path="dotplot.png")

# Save as SVG vector image
plotter.plot(output_path="dotplot.svg")

# Filter short alignments (< 200 bp) before plotting
plotter.plot(output_path="dotplot_filtered.png", min_length=200)

# Colour alignments by identity from a PAF file
from rusty_dot.paf_io import PafAlignment
aln = PafAlignment.from_file("alignments.paf")
plotter = DotPlotter(idx, paf_alignment=aln)
plotter.plot(output_path="identity_dotplot.png", color_by_identity=True, identity_palette="viridis")
plotter.plot_identity_colorbar(palette="viridis", output_path="colorbar.png")

Documentation Sections

  • Installation — how to install rusty-dot and its dependencies.
  • Tutorials — step-by-step Jupyter notebook walkthroughs.
  • API Reference — full documentation for all classes and functions.