Visualising Minimap2 Alignments with rusty-dot¶
This tutorial shows how to align two genome assemblies with minimap2 and visualise the output PAF file as an all-vs-all dotplot using rusty-dot.
Prerequisites¶
- minimap2 installed and available on
PATH - Two FASTA files to compare (e.g. two assemblies, or one assembly aligned to a reference)
Workflow¶
- Align assemblies with minimap2 to produce a PAF file.
- Load the PAF file with
PafAlignment.from_file(). - Optionally reorder contigs for maximum collinearity.
- Pass the
PafAlignmentobject directly toDotPlotterand plot.
from pathlib import Path
import subprocess
import matplotlib.pyplot as plt
from rusty_dot.dotplot import DotPlotter
from rusty_dot.paf_io import PafAlignment, parse_paf_file
1. Align two assemblies with minimap2¶
Replace the paths below with your own FASTA files. The asm5 preset is suitable for comparing assemblies with < 5% divergence.
minimap2 -x asm5 --cs genome_b.fasta genome_a.fasta > alignments.paf
The cell below runs minimap2 programmatically. If minimap2 is not available, skip to Section 3 where we create synthetic PAF data.
# ── Adjust these paths ───────────────────────────────────────────────────────
QUERY_FASTA = 'genome_a.fasta' # query assembly (rows in the dotplot)
TARGET_FASTA = 'genome_b.fasta' # target assembly (columns in the dotplot)
PAF_OUTPUT = 'alignments.paf'
# ─────────────────────────────────────────────────────────────────────────────
try:
result = subprocess.run(
['minimap2', '-x', 'asm5', '--cs', TARGET_FASTA, QUERY_FASTA],
capture_output=True,
text=True,
check=True,
)
Path(PAF_OUTPUT).write_text(result.stdout)
print(
f'Alignment complete. Wrote {len(result.stdout.splitlines())} records to {PAF_OUTPUT}'
)
except (FileNotFoundError, subprocess.CalledProcessError) as exc:
print(f'minimap2 not available or alignment failed: {exc}')
print('Falling back to synthetic PAF data in Section 3.')
PAF_OUTPUT = None
2. Inspect CIGAR statistics (optional)¶
When minimap2 is run with --cs the PAF file contains cg:Z: CIGAR tags. PafRecord parses these automatically.
if PAF_OUTPUT and Path(PAF_OUTPUT).exists():
records = list(parse_paf_file(PAF_OUTPUT))
print(f'Loaded {len(records)} alignment records')
if records:
r = records[0]
print(f' query: {r.query_name} ({r.query_len:,} bp)')
print(f' target: {r.target_name} ({r.target_len:,} bp)')
print(f' strand: {r.strand}')
if r.cigar:
print(
f' CIGAR-derived identity: {r.n_matches / max(r.alignment_length, 1):.2%}'
)
3. Synthetic example (minimap2 not available)¶
If minimap2 is not installed we create a small synthetic PAF file that mirrors what minimap2 would produce.
SYNTHETIC_PAF = """\
contigA1\t200\t0\t180\t+\tcontigB1\t160\t0\t160\t155\t160\t60
contigA2\t100\t10\t90\t+\tcontigB2\t80\t5\t75\t65\t70\t60
contigA1\t200\t50\t120\t-\tcontigB2\t80\t0\t70\t65\t70\t30
"""
PAF_FILE = '/tmp/synthetic_alignments.paf'
with open(PAF_FILE, 'w') as fh:
fh.write(SYNTHETIC_PAF)
if PAF_OUTPUT is None:
PAF_OUTPUT = PAF_FILE
print(f'Using PAF file: {PAF_OUTPUT}')
4. Load PAF and reorder contigs for maximum collinearity¶
PafAlignment.reorder_contigs() uses a gravity-centre algorithm to sort contigs so the dotplot shows maximum collinearity along the diagonal. Unmatched contigs are placed at the end, sorted by descending length.
aln = PafAlignment.from_file(PAF_OUTPUT)
print(aln)
q_sorted, t_sorted = aln.reorder_contigs()
print('Sorted query contigs:', q_sorted)
print('Sorted target contigs:', t_sorted)
5. Plot alignments¶
DotPlotter accepts a PafAlignment directly — no SequenceIndex is
required. Sequence lengths are read from the PAF records, and alignment
segments are drawn using the strand colour convention: forward (+) matches
are blue and reverse-complement (-) matches are red.
plotter = DotPlotter(aln)
# Use the sorted contig order; scale panels by sequence length
plotter.plot(
query_names=q_sorted,
target_names=t_sorted,
output_path='/tmp/minimap2_dotplot.png',
figsize_per_panel=4.0,
scale_sequences=True,
title='Minimap2 alignment — sorted contigs',
dpi=100,
)
from IPython.display import Image
Image('/tmp/minimap2_dotplot.png')
6. Colour alignments by identity¶
PAF records include a residue_matches column and an alignment_block_len
column that together define sequence identity:
$$\text{identity} = \frac{\text{residue\_matches}}{\text{alignment\_block\_len}}$$
Pass color_by_identity=True to DotPlotter.plot() to colour each alignment
segment by identity instead of using flat strand colours. Any Matplotlib
colormap is accepted via identity_palette (default 'viridis').
DotPlotter.plot_identity_colorbar() generates a standalone colorbar figure
that you can display alongside the dotplot.
# The same PafAlignment is used; enable identity colouring
plotter_identity = DotPlotter(aln)
plotter_identity.plot(
query_names=q_sorted,
target_names=t_sorted,
output_path='/tmp/minimap2_identity_dotplot.png',
figsize_per_panel=4.0,
scale_sequences=True,
color_by_identity=True,
identity_palette='viridis',
title='Minimap2 alignment — coloured by identity',
dpi=100,
)
# Render the identity colour scale as a separate figure
fig_cb = plotter_identity.plot_identity_colorbar(
palette='viridis',
output_path='/tmp/minimap2_identity_colorbar.png',
dpi=100,
)
plt.close(fig_cb)
from IPython.display import Image
Image('/tmp/minimap2_identity_dotplot.png')
Summary¶
| Step | Tool | Purpose |
|---|---|---|
| Align | minimap2 -x asm5 |
Generate PAF alignments |
| Load | PafAlignment.from_file() |
Parse PAF records |
| Sort | aln.reorder_contigs() |
Maximise collinearity |
| Plot | DotPlotter(aln).plot(scale_sequences=True) |
Visualise with relative scaling |
| Colour by identity | DotPlotter(aln).plot(color_by_identity=True) |
Colour each segment by identity |
The scale_sequences=True option ensures each subplot's dimensions are
proportional to the actual lengths of the compared sequences, so short and
long contigs are not artificially stretched to the same size.