Low-Level Functions

These functions are implemented in Rust and exposed via PyO3. They provide fine-grained access to the underlying FM-index and k-mer machinery. For most use cases, the SequenceIndex class is more convenient.

FASTA I/O

py_read_fasta(path) builtin

Python binding: read sequences from a FASTA or gzipped FASTA file.

Parameters:

Name Type Description Default
path str

Path to the FASTA or FASTA.gz file.

required

Returns:

Type Description
dict[str, str]

Dictionary mapping sequence name to sequence string.

Raises:

Type Description
ValueError

If the file cannot be opened, parsed, or contains duplicate sequence names.
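The contract above (a name-to-sequence dictionary, with ValueError on unreadable input or duplicate names) can be mirrored in pure Python. The sketch below is not the Rust implementation; read_fasta_sketch is a hypothetical name, and taking the first whitespace-delimited token of the header as the sequence name is an assumption.

```python
import gzip


def read_fasta_sketch(path: str) -> dict[str, str]:
    """Pure-Python stand-in for the documented behaviour of py_read_fasta."""
    opener = gzip.open if str(path).endswith(".gz") else open
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    try:
        with opener(path, "rt") as handle:
            for raw in handle:
                line = raw.strip()
                if not line:
                    continue
                if line.startswith(">"):
                    if name is not None:
                        records[name] = "".join(chunks)
                    tokens = line[1:].split()
                    # Assumption: the name is the first token of the header.
                    name = tokens[0] if tokens else ""
                    if name in records:
                        raise ValueError(f"duplicate sequence name: {name!r}")
                    chunks = []
                elif name is None:
                    raise ValueError("sequence data found before any header")
                else:
                    chunks.append(line)
    except OSError as exc:
        # The documented API reports I/O problems as ValueError.
        raise ValueError(f"cannot read {path}: {exc}") from exc
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

Calling read_fasta_sketch on a file with two records returns a two-entry dictionary; multi-line sequences are joined into one string.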

K-mer Operations

py_build_kmer_set(seq, k) builtin

Python binding: build the set of unique k-mers in a sequence.

Parameters:

Name Type Description Default
seq str

The DNA sequence string (uppercase recommended).

required
k int

The k-mer length.

required

Returns:

Type Description
set[str]

Set of unique k-mer strings found in the sequence.

Raises:

Type Description
ValueError

If k is 0 or the sequence is empty.
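As an illustration of what the returned set contains, here is a minimal pure-Python equivalent. build_kmer_set_sketch is a hypothetical name; the real function runs in Rust and may treat edge cases (lowercase input, non-ACGT characters) differently.

```python
def build_kmer_set_sketch(seq: str, k: int) -> set[str]:
    """Collect every distinct length-k substring of seq."""
    if k <= 0 or not seq:
        raise ValueError("k must be positive and seq must be non-empty")
    # A sliding window of width k yields len(seq) - k + 1 k-mers.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


kmers = build_kmer_set_sketch("ACGTACGT", 4)
print(sorted(kmers))  # ['ACGT', 'CGTA', 'GTAC', 'TACG']
```

The sequence has five windows of width 4, but ACGT occurs twice, so the set holds four entries.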

py_find_kmer_coords(seq, kmers) builtin

Python binding: find all positions of each k-mer in a sequence.

Parameters:

Name Type Description Default
seq str

The DNA sequence to search in.

required
kmers list[str]

List of k-mer strings to search for.

required

Returns:

Type Description
dict[str, list[int]]

Dictionary mapping each k-mer to its list of start positions (0-based).

Raises:

Type Description
ValueError

If the sequence is invalid or k-mers are inconsistent.
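The shape of the result can be illustrated with a plain string search. find_kmer_coords_sketch is a hypothetical name, and keeping absent k-mers in the result with an empty list is an assumption; the real function may omit them.

```python
def find_kmer_coords_sketch(seq: str, kmers: list[str]) -> dict[str, list[int]]:
    """Map each k-mer to every 0-based start position in seq."""
    coords: dict[str, list[int]] = {}
    for kmer in kmers:
        if not kmer:
            raise ValueError("empty k-mer")
        hits = []
        start = seq.find(kmer)
        while start != -1:
            hits.append(start)
            # Advance by one so overlapping occurrences are also reported.
            start = seq.find(kmer, start + 1)
        coords[kmer] = hits
    return coords


print(find_kmer_coords_sketch("ACGTACGT", ["ACGT", "GTAC", "TTTT"]))
# {'ACGT': [0, 4], 'GTAC': [2], 'TTTT': []}
```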

Merging K-mer Runs

rusty-dot provides four merge functions that together cover all k-mer alignment orientations. py_merge_runs is the recommended entry point for new code; the strand-specific functions offer lower-level control.

Unified entry-point

py_merge_runs(kmer_coords, query_kmer_positions, k, strand) builtin

Python binding: unified merge for both strand orientations.

A single entry point for merging k-mer coordinate runs on either the forward ("+") or reverse-complement ("-") strand. For the forward strand, the function dispatches to merge_fwd_runs. For the reverse strand, it applies both the anti-diagonal algorithm (merge_rev_runs) and the co-diagonal algorithm (merge_rev_fwd_runs), then combines the results and removes any identical blocks.

For the forward strand, kmer_coords contains the positions of each k-mer in the target sequence and query_kmer_positions contains the positions of the same k-mers in the query sequence.

For the reverse strand, kmer_coords should contain the positions of the reverse complement of each query k-mer in the target (as returned by find_rev_coords_in_index), and query_kmer_positions contains the positions of the original k-mers in the query.
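The reverse-strand inputs described above can be prepared as in the following sketch. It is a pure-Python stand-in, not the index-backed lookup: rev_coords_sketch and reverse_complement are hypothetical helpers, and omitting k-mers whose reverse complement has no hit is an assumption.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def rev_coords_sketch(query: str, target: str, k: int):
    """Return (kmer_coords, query_kmer_positions) suitable for strand='-'."""
    query_kmer_positions: dict[str, list[int]] = {}
    for i in range(len(query) - k + 1):
        query_kmer_positions.setdefault(query[i:i + k], []).append(i)

    kmer_coords: dict[str, list[int]] = {}
    for kmer in query_kmer_positions:
        rc = reverse_complement(kmer)
        hits = [j for j in range(len(target) - k + 1) if target[j:j + k] == rc]
        if hits:
            # Keyed by the ORIGINAL query k-mer, valued by RC hits in the target.
            kmer_coords[kmer] = hits
    return kmer_coords, query_kmer_positions


coords, qpos = rev_coords_sketch("AAACCC", "TTGGGTTTAA", 4)
print(coords)  # {'AAAC': [4], 'AACC': [3], 'ACCC': [2]}
print(qpos)    # {'AAAC': [0], 'AACC': [1], 'ACCC': [2]}
```

Note how the target positions fall as the query positions rise: this is the anti-diagonal pattern of a reverse-complement match.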

Parameters:

Name Type Description Default
kmer_coords dict[str, list[int]]

Mapping of k-mer to 0-based target positions. For strand="-", these are the positions of the RC of each k-mer in the target.

required
query_kmer_positions dict[str, list[int]]

Mapping of k-mer to 0-based query positions.

required
k int

The k-mer length.

required
strand str

Orientation of the match: "+" for forward (co-linear diagonal) or "-" for reverse-complement (both anti-diagonal and co-diagonal patterns are merged and returned together).

required

Returns:

Type Description
list[tuple[int, int, int, int, str]]

List of (query_start, query_end, target_start, target_end, strand) 5-tuples. Coordinates are 0-based; end positions are exclusive. strand echoes the input argument so callers can mix results from multiple calls without losing orientation information.

Raises:

Type Description
ValueError

If strand is neither "+" nor "-".
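The combine, deduplicate, and strand-echo steps described for this function can be sketched as follows. combine_blocks_sketch is a hypothetical helper operating on already-merged 4-tuples; the order in which the real function returns blocks is not documented here, so first-seen order is an assumption.

```python
def combine_blocks_sketch(strand, *block_lists):
    """Merge several lists of 4-tuples into deduplicated 5-tuples."""
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    seen = set()
    combined = []
    for blocks in block_lists:
        for block in blocks:
            if block not in seen:
                seen.add(block)
                # Echo the strand so results from several calls can be mixed.
                combined.append((*block, strand))
    return combined


anti_diagonal = [(0, 6, 2, 8)]
co_diagonal = [(0, 6, 2, 8), (10, 14, 20, 24)]
print(combine_blocks_sketch("-", anti_diagonal, co_diagonal))
# [(0, 6, 2, 8, '-'), (10, 14, 20, 24, '-')]
```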

Forward-strand merge

py_merge_kmer_runs(kmer_coords, query_kmer_positions, k) builtin

Python binding: merge sequential k-mer coordinate runs (forward strand).

Parameters:

Name Type Description Default
kmer_coords dict[str, list[int]]

Mapping of k-mer to list of target start positions (0-based).

required
query_kmer_positions dict[str, list[int]]

Mapping of k-mer to list of query start positions (0-based).

required
k int

The k-mer length.

required

Returns:

Type Description
list[tuple[int, int, int, int]]

List of (query_start, query_end, target_start, target_end) tuples. Coordinates are 0-based, with end positions exclusive.
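One way to picture forward-strand merging: every (query, target) hit lies on the diagonal target - query, and consecutive query positions on the same diagonal collapse into one block. The sketch below follows that idea. merge_fwd_runs_sketch is a hypothetical name, and the grouping strategy and sorted output are assumptions, not the Rust implementation.

```python
from collections import defaultdict


def merge_fwd_runs_sketch(kmer_coords, query_kmer_positions, k):
    """Merge co-linear k-mer hits into (q_start, q_end, t_start, t_end) blocks."""
    diagonals = defaultdict(set)
    for kmer, q_positions in query_kmer_positions.items():
        for t in kmer_coords.get(kmer, []):
            for q in q_positions:
                diagonals[t - q].add(q)  # same diagonal => same offset

    blocks = []
    for diag, q_set in diagonals.items():
        q_sorted = sorted(q_set)
        start = prev = q_sorted[0]
        for q in q_sorted[1:] + [None]:  # None flushes the final run
            if q is None or q != prev + 1:
                # End coordinates are exclusive: last k-mer start + k.
                blocks.append((start, prev + k, start + diag, prev + diag + k))
                start = q
            prev = q
    return sorted(blocks)


# Query "ACGTAC" against target "TTACGTACTT" with k = 4.
kmer_coords = {"ACGT": [2], "CGTA": [3], "GTAC": [4]}
query_kmer_positions = {"ACGT": [0], "CGTA": [1], "GTAC": [2]}
print(merge_fwd_runs_sketch(kmer_coords, query_kmer_positions, 4))
# [(0, 6, 2, 8)]
```

Three overlapping 4-mers become a single 6-base block: query[0:6] and target[2:8] are both "ACGTAC".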

Reverse-complement merges

Two complementary algorithms cover all reverse-complement alignment patterns:

Function: when to use
py_merge_rev_runs: RC target positions decrease as the query advances (query +1, target −1 per step). This is the standard inverted-repeat alignment, where the two arms face each other.
py_merge_rev_fwd_runs: RC target positions increase as the query advances (query +1, target +1 per step). Both repeat arms run in the same left-to-right direction.

py_merge_runs(strand="-") calls both and deduplicates the results automatically.
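The two patterns differ only in which quantity stays constant along a run: query + target on the anti-diagonal, target - query on the co-diagonal. The sketch below covers both with a single step argument. merge_rc_runs_sketch and step are hypothetical names, and reporting target boundaries as the smallest start and largest start plus k is an assumption based on the documented forward-strand boundaries.

```python
from collections import defaultdict


def merge_rc_runs_sketch(target_rev_coords, query_kmer_positions, k, step):
    """step=-1 merges anti-diagonal runs; step=+1 merges co-diagonal runs."""
    if step not in (-1, 1):
        raise ValueError("step must be -1 or +1")
    lines = defaultdict(set)
    for kmer, q_positions in query_kmer_positions.items():
        for t in target_rev_coords.get(kmer, []):
            for q in q_positions:
                # Constant along a run: t + q (anti) or t - q (co).
                lines[t - step * q].add(q)

    blocks = []
    for offset, q_set in lines.items():
        q_sorted = sorted(q_set)
        start = prev = q_sorted[0]
        for q in q_sorted[1:] + [None]:  # None flushes the final run
            if q is None or q != prev + 1:
                t_first = offset + step * start
                t_last = offset + step * prev
                blocks.append((start, prev + k,
                               min(t_first, t_last), max(t_first, t_last) + k))
                start = q
            prev = q
    return sorted(blocks)


# Query "AAACCC"; its reverse complement "GGGTTT" sits at target[2:8].
rev_coords = {"AAAC": [4], "AACC": [3], "ACCC": [2]}
query_pos = {"AAAC": [0], "AACC": [1], "ACCC": [2]}
print(merge_rc_runs_sketch(rev_coords, query_pos, 4, step=-1))
# [(0, 6, 2, 8)]
```

With step=+1 the same input yields three separate 4-base blocks, because no two hits share a co-diagonal.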

py_merge_rev_runs(target_rev_coords, query_kmer_positions, k) builtin

Python binding: merge reverse-complement (- strand) k-mer coordinate runs.

For each k-mer, target_rev_coords should contain the positions in the target where the reverse complement of that k-mer was found (as returned by find_rev_coords_in_index). Consecutive anti-diagonal pairs — where query_pos advances by 1 and the corresponding RC target position decreases by 1 — are merged into a single CoordPair.

Parameters:

Name Type Description Default
target_rev_coords dict[str, list[int]]

Mapping of k-mer to the 0-based start positions of its reverse complement in the target sequence.

required
query_kmer_positions dict[str, list[int]]

Mapping of k-mer to its 0-based start positions in the query sequence.

required
k int

The k-mer length.

required

Returns:

Type Description
list[tuple[int, int, int, int]]

List of (query_start, query_end, target_start, target_end) tuples representing merged reverse-complement ("-" strand) match regions. Coordinates are 0-based with end positions exclusive. target_start and target_end are the forward-strand boundaries of the RC match region on the target.

py_merge_rev_fwd_runs(target_rev_coords, query_kmer_positions, k) builtin

Python binding: merge reverse-complement (- strand) k-mer coordinate runs that are co-linear on a forward diagonal (RC positions increase as query position increases).

This is the complement of py_merge_rev_runs, which handles the anti-diagonal case. Together they cover all possible orientations of reverse-complement k-mer matches:

  • py_merge_rev_runs — anti-diagonal: q advances +1, t_rc decreases by 1 (standard inverted-repeat / reverse-complement alignment).
  • py_merge_rev_fwd_runs — forward diagonal: q advances +1, t_rc also advances +1 (inverted-repeat case where both arms run in the same direction).

Parameters:

Name Type Description Default
target_rev_coords dict[str, list[int]]

Mapping of k-mer to the 0-based start positions of its reverse complement in the target sequence (as returned by find_rev_coords_in_index).

required
query_kmer_positions dict[str, list[int]]

Mapping of k-mer to its 0-based start positions in the query sequence.

required
k int

The k-mer length.

required

Returns:

Type Description
list[tuple[int, int, int, int]]

List of (query_start, query_end, target_start, target_end) tuples representing merged reverse-complement ("-" strand) match regions where RC target positions advance together with query positions. Coordinates are 0-based with end positions exclusive.

PAF Formatting

py_coords_to_paf(matches, query_name, query_len, target_name, target_len) builtin

Python binding: convert coordinate tuples to PAF format strings.

Parameters:

Name Type Description Default
matches list[tuple[int, int, int, int]]

List of (query_start, query_end, target_start, target_end) tuples.

required
query_name str

Name of the query sequence.

required
query_len int

Total length of the query sequence.

required
target_name str

Name of the target sequence.

required
target_len int

Total length of the target sequence.

required

Returns:

Type Description
list[str]

List of PAF format lines (tab-separated).
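For orientation, a PAF line has twelve mandatory tab-separated columns: query name, length, start, end; strand; target name, length, start, end; residue matches; alignment block length; mapping quality. The sketch below fills them from a coordinate tuple. coords_to_paf_sketch is a hypothetical name, and the strand argument, the match count, and the mapping quality of 255 (the PAF value for "missing") are assumptions, since the values the real function writes to those columns are not documented here.

```python
def coords_to_paf_sketch(matches, query_name, query_len,
                         target_name, target_len, strand="+"):
    """Format (q_start, q_end, t_start, t_end) tuples as 12-column PAF lines."""
    lines = []
    for q_start, q_end, t_start, t_end in matches:
        q_span = q_end - q_start
        t_span = t_end - t_start
        fields = [
            query_name, query_len, q_start, q_end,
            strand,                 # assumption: supplied by the caller
            target_name, target_len, t_start, t_end,
            q_span,                 # assumption: exact match over the block
            max(q_span, t_span),    # alignment block length
            255,                    # mapping quality unavailable
        ]
        lines.append("\t".join(str(field) for field in fields))
    return lines


for line in coords_to_paf_sketch([(0, 6, 2, 8)], "query1", 6, "target1", 10):
    print(line.split("\t"))
```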

Index Serialization

py_save_index(path, sequences, k) builtin

Python binding: save an index collection to a file.

Parameters:

Name Type Description Default
path str

Path to save the serialized index.

required
sequences dict[str, str]

Dictionary of sequence name to sequence string.

required
k int

The k-mer length used to build the index.

required

Raises:

Type Description
ValueError

If serialization fails.

py_load_index(path) builtin

Python binding: load an index collection from a file.

Parameters:

Name Type Description Default
path str

Path to the serialized index file.

required

Returns:

Type Description
tuple[dict[str, list[str]], int]

A tuple of (kmer_sets_dict, k) where kmer_sets_dict maps sequence names to their k-mer lists.

Raises:

Type Description
ValueError

If deserialization fails.
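The save and load signatures imply a round trip from sequences to a (kmer_sets_dict, k) pair. The sketch below illustrates only that shape. The on-disk format of the real index is not documented here, so JSON is used purely as a stand-in, and save_index_sketch and load_index_sketch are hypothetical names.

```python
import json


def save_index_sketch(path, sequences, k):
    """Write each sequence's unique k-mers, plus k, to a JSON file."""
    kmer_lists = {
        name: sorted({seq[i:i + k] for i in range(len(seq) - k + 1)})
        for name, seq in sequences.items()
    }
    try:
        with open(path, "w") as handle:
            json.dump({"k": k, "kmer_sets": kmer_lists}, handle)
    except (OSError, TypeError) as exc:
        # Mirror the documented API: failures surface as ValueError.
        raise ValueError(f"serialization failed: {exc}") from exc


def load_index_sketch(path):
    """Return (kmer_sets_dict, k), matching the documented return shape."""
    try:
        with open(path) as handle:
            payload = json.load(handle)
        return payload["kmer_sets"], payload["k"]
    except (OSError, KeyError, json.JSONDecodeError) as exc:
        raise ValueError(f"deserialization failed: {exc}") from exc
```

Saving {"seq1": "ACGTACGT"} with k = 4 and loading it back gives ({"seq1": ["ACGT", "CGTA", "GTAC", "TACG"]}, 4) in this sketch.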