Low-Level Functions
These functions are implemented in Rust and exposed via PyO3.
They provide fine-grained access to the underlying FM-index and k-mer machinery.
For most use cases, the SequenceIndex class is more convenient.
FASTA I/O
py_read_fasta(path)
builtin
Python binding: read sequences from a FASTA or gzipped FASTA file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to the FASTA or FASTA.gz file. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, str] | Dictionary mapping sequence name to sequence string. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the file cannot be opened or parsed, or if it contains duplicate sequence names. |
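The contract above (name-to-sequence dictionary, `ValueError` on unreadable input or duplicate names) can be illustrated in plain Python. This is a minimal sketch, not the Rust implementation: `read_fasta_sketch` is a hypothetical name, and it assumes that gzip input is recognised by a `.gz` suffix and that the sequence name is the first whitespace-delimited token of the header line. Neither detail is stated in the documentation above.

```python
import gzip


def read_fasta_sketch(path: str) -> dict[str, str]:
    """Pure-Python illustration of the documented behaviour (hypothetical helper)."""
    # Assumption: gzip input is detected from the file suffix.
    opener = gzip.open if path.endswith(".gz") else open
    records: dict[str, list[str]] = {}
    name = None
    try:
        with opener(path, "rt") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                if line.startswith(">"):
                    # Assumption: the name is the first token of the header.
                    fields = line[1:].split()
                    if not fields:
                        raise ValueError("empty FASTA header")
                    name = fields[0]
                    if name in records:
                        raise ValueError(f"duplicate sequence name: {name}")
                    records[name] = []
                elif name is None:
                    raise ValueError("sequence data before first header")
                else:
                    records[name].append(line)
    except OSError as exc:
        # Mirror the documented contract: I/O problems surface as ValueError.
        raise ValueError(f"cannot read {path}: {exc}") from exc
    # Join multi-line records into a single string per sequence.
    return {n: "".join(parts) for n, parts in records.items()}
```

Raising on duplicate names, instead of silently keeping the last record, matches the documented `ValueError` and avoids losing sequences without warning.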
K-mer Operations
py_build_kmer_set(seq, k)
builtin
Python binding: build the set of unique k-mers in a sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seq | str | The DNA sequence string (uppercase recommended). | required |
| k | int | The k-mer length. | required |
Returns:
| Type | Description |
|---|---|
| set[str] | Set of unique k-mer strings found in the sequence. |
Raises:
| Type | Description |
|---|---|
| ValueError | If k is 0 or the sequence is empty. |
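What "the set of unique k-mers" means can be shown with a sliding window in plain Python. This is a sketch under stated assumptions, not the Rust code: `build_kmer_set_sketch` is a hypothetical name, and the sketch does not model any filtering the library may apply (for example to ambiguous bases) or its behaviour when `k` exceeds the sequence length.

```python
def build_kmer_set_sketch(seq: str, k: int) -> set[str]:
    """Collect every distinct length-k substring of seq (hypothetical helper)."""
    if k == 0 or not seq:
        # Mirrors the documented ValueError conditions.
        raise ValueError("k must be non-zero and seq must be non-empty")
    # Slide a window of width k across the sequence; the set drops repeats.
    # Assumption: if k > len(seq) the range is empty and the result is set().
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


kmers = build_kmer_set_sketch("ACGTACGT", 4)
print(sorted(kmers))  # ['ACGT', 'CGTA', 'GTAC', 'TACG']
```

The input has five 4-mer windows but only four distinct k-mers, because `ACGT` occurs at positions 0 and 4.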
py_find_kmer_coords(seq, kmers)
builtin
Python binding: find all positions of each k-mer in a sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seq | str | The DNA sequence to search in. | required |
| kmers | list[str] | List of k-mer strings to search for. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, list[int]] | Dictionary mapping each k-mer to its list of start positions (0-based). |
Raises:
| Type | Description |
|---|---|
| ValueError | If the sequence is invalid or k-mers are inconsistent. |
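The shape of the returned mapping can be reproduced with a naive scan. The library itself uses an FM-index; the sketch below swaps in `str.find` purely for illustration. `find_kmer_coords_sketch` is a hypothetical name, and two behaviours are assumptions: overlapping occurrences are all reported, and a k-mer with no hits maps to an empty list.

```python
def find_kmer_coords_sketch(seq: str, kmers: list[str]) -> dict[str, list[int]]:
    """Map each k-mer to its 0-based start positions in seq (hypothetical helper)."""
    coords: dict[str, list[int]] = {}
    for kmer in kmers:
        positions = []
        start = seq.find(kmer)
        while start != -1:
            positions.append(start)
            # Advance by one base so overlapping occurrences are kept.
            start = seq.find(kmer, start + 1)
        # Assumption: k-mers that never occur still get an (empty) entry.
        coords[kmer] = positions
    return coords


print(find_kmer_coords_sketch("ACGTACGT", ["ACGT", "GTAC"]))
# {'ACGT': [0, 4], 'GTAC': [2]}
```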
Merging K-mer Runs
rusty-dot provides four merge functions covering all k-mer alignment orientations.
py_merge_runs is the recommended entry point for new code; the strand-specific
functions are available for lower-level control.
Unified entry point
py_merge_runs(kmer_coords, query_kmer_positions, k, strand)
builtin
Python binding: unified merge for both strand orientations.
A single entry point for merging k-mer coordinate runs on either the
forward ("+") or reverse-complement ("-") strand. For the forward strand,
the function dispatches to merge_fwd_runs. For the reverse strand it applies
both the anti-diagonal algorithm (merge_rev_runs) and the co-diagonal
algorithm (merge_rev_fwd_runs), then combines the results and removes
any identical blocks.
For the forward strand, kmer_coords contains the positions of each
k-mer in the target sequence and query_kmer_positions contains the
positions of the same k-mers in the query sequence.
For the reverse strand, kmer_coords should contain the positions
of the reverse complement of each query k-mer in the target (as
returned by find_rev_coords_in_index), and query_kmer_positions
contains the positions of the original k-mers in the query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
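| kmer_coords | dict[str, list[int]] | Mapping of k-mer to 0-based target positions. For the reverse strand, these are the positions of the reverse complement of each k-mer in the target. | required |
| query_kmer_positions | dict[str, list[int]] | Mapping of k-mer to 0-based query positions. | required |
| k | int | The k-mer length. | required |
| strand | str | Orientation of the match: "+" (forward) or "-" (reverse complement). | required |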
Returns:
| Type | Description |
|---|---|
| list[tuple[int, int, int, int, str]] | List of |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
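The dispatch-and-deduplicate behaviour described for py_merge_runs can be sketched as follows. This is an illustration only: `merge_runs_sketch` and its three callable parameters are hypothetical and exist to keep the example self-contained. Three details are assumptions and not taken from the documentation: that the fifth tuple element is the strand, that an unrecognised strand raises `ValueError`, and that first-seen order is preserved when duplicates are dropped.

```python
def merge_runs_sketch(kmer_coords, query_kmer_positions, k, strand,
                      merge_fwd, merge_rev_anti, merge_rev_co):
    """Dispatch on strand, then deduplicate (hypothetical helper).

    merge_fwd, merge_rev_anti and merge_rev_co are stand-ins for the
    strand-specific merge functions; each returns a list of
    (query_start, query_end, target_start, target_end) tuples.
    """
    if strand == "+":
        blocks = merge_fwd(kmer_coords, query_kmer_positions, k)
    elif strand == "-":
        # Reverse strand: run both algorithms and pool their blocks.
        blocks = (merge_rev_anti(kmer_coords, query_kmer_positions, k)
                  + merge_rev_co(kmer_coords, query_kmer_positions, k))
    else:
        # Assumption: invalid strand values are rejected with ValueError.
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # dict.fromkeys drops identical blocks while keeping first-seen order.
    unique = list(dict.fromkeys(blocks))
    # Assumption: the strand is appended as the fifth tuple element.
    return [(qs, qe, ts, te, strand) for qs, qe, ts, te in unique]


# Stub merge functions returning fixed blocks, for demonstration only.
anti = lambda coords, positions, k: [(0, 5, 10, 15), (7, 11, 3, 7)]
co = lambda coords, positions, k: [(0, 5, 10, 15)]
fwd = lambda coords, positions, k: [(0, 6, 2, 8)]

print(merge_runs_sketch({}, {}, 4, "-", fwd, anti, co))
# [(0, 5, 10, 15, '-'), (7, 11, 3, 7, '-')]
```

The block `(0, 5, 10, 15)` is reported by both reverse-strand stubs but appears once in the result.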
Forward-strand merge
py_merge_kmer_runs(kmer_coords, query_kmer_positions, k)
builtin
Python binding: merge sequential k-mer coordinate runs (forward strand).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kmer_coords | dict[str, list[int]] | Mapping of k-mer to list of target start positions (0-based). | required |
| query_kmer_positions | dict[str, list[int]] | Mapping of k-mer to list of query start positions (0-based). | required |
| k | int | The k-mer length. | required |
Returns:
| Type | Description |
|---|---|
| list[tuple[int, int, int, int]] | List of (query_start, query_end, target_start, target_end) tuples. Coordinates are 0-based, with end positions exclusive. |
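To make the coordinate convention concrete, here is a minimal pure-Python sketch of merging forward-strand k-mer hits into blocks. It is not the Rust algorithm: `merge_kmer_runs_sketch` is a hypothetical name, and the approach (group hits by diagonal `target - query`, then join runs of consecutive query positions) and the sorted output order are assumptions chosen for clarity.

```python
from collections import defaultdict


def merge_kmer_runs_sketch(kmer_coords, query_kmer_positions, k):
    """Merge co-linear k-mer hits into half-open blocks (hypothetical helper)."""
    # Hits on the same diagonal share the same (target - query) offset.
    diagonals = defaultdict(set)
    for kmer, q_positions in query_kmer_positions.items():
        for t in kmer_coords.get(kmer, []):
            for q in q_positions:
                diagonals[t - q].add(q)

    blocks = []
    for diag, q_set in diagonals.items():
        q_sorted = sorted(q_set)
        run_start = prev = q_sorted[0]
        for q in q_sorted[1:]:
            if q != prev + 1:
                # Gap on this diagonal: close the current run.
                blocks.append((run_start, prev + k, run_start + diag, prev + diag + k))
                run_start = q
            prev = q
        # Close the final run; the last k-mer extends the block by k bases.
        blocks.append((run_start, prev + k, run_start + diag, prev + diag + k))
    return sorted(blocks)


# Query "ACGTAC" against target "TTACGTAC" with k = 4.
query_positions = {"ACGT": [0], "CGTA": [1], "GTAC": [2]}
target_positions = {"ACGT": [2], "CGTA": [3], "GTAC": [4]}
print(merge_kmer_runs_sketch(target_positions, query_positions, 4))
# [(0, 6, 2, 8)]
```

Three overlapping 4-mers collapse into a single block covering query `[0, 6)` and target `[2, 8)`, which shows why the end coordinate is the last k-mer start plus `k`.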
Reverse-complement merges
Two complementary algorithms cover all reverse-complement alignment patterns:
| Function | When to use |
|---|---|
| py_merge_rev_runs | RC target positions decrease as the query advances (query +1, target −1 per step): the standard inverted-repeat alignment, where the two arms face each other. |
| py_merge_rev_fwd_runs | RC target positions increase as the query advances (query +1, target +1 per step): both repeat arms run in the same left-to-right direction. |
py_merge_runs(strand="-") calls both and deduplicates the results automatically.
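The "query +1, target −1" pattern can be checked on a small example: when the target contains the reverse complement of the query, the reverse-complement hits of consecutive query k-mers step backwards through the target. The sketch below is illustrative only; `revcomp`, `rc_hits`, and `step_pattern` are hypothetical helpers, not part of rusty-dot, and the naive `str.find` scan stands in for the index lookup.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def rc_hits(query: str, target: str, k: int) -> list[tuple[int, int]]:
    """(query_pos, target_pos) pairs where the reverse complement of the
    query k-mer starting at query_pos occurs at target_pos."""
    hits = []
    for q in range(len(query) - k + 1):
        rc = revcomp(query[q:q + k])
        t = target.find(rc)
        while t != -1:
            hits.append((q, t))
            t = target.find(rc, t + 1)
    return hits


def step_pattern(hit_a, hit_b) -> str:
    """Classify how the target moves when the query advances by one base."""
    dq, dt = hit_b[0] - hit_a[0], hit_b[1] - hit_a[1]
    if dq == 1 and dt == -1:
        return "anti-diagonal"  # pattern handled by py_merge_rev_runs
    if dq == 1 and dt == 1:
        return "co-diagonal"    # pattern handled by py_merge_rev_fwd_runs
    return "unlinked"


query = "ACGGTCAT"
target = "TT" + revcomp(query)  # "TTATGACCGT"
hits = rc_hits(query, target, 4)
print(hits)  # [(0, 6), (1, 5), (2, 4), (3, 3), (4, 2)]
print({step_pattern(a, b) for a, b in zip(hits, hits[1:])})  # {'anti-diagonal'}
```

Every consecutive pair of hits keeps `query_pos + target_pos` constant, which is the anti-diagonal case in the table above.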
py_merge_rev_runs(target_rev_coords, query_kmer_positions, k)
builtin
Python binding: merge reverse-complement (- strand) k-mer coordinate runs.
For each k-mer, target_rev_coords should contain the positions in the
target where the reverse complement of that k-mer was found (as returned
by find_rev_coords_in_index). Consecutive anti-diagonal pairs —
where query_pos advances by 1 and the corresponding RC target position
decreases by 1 — are merged into a single CoordPair.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target_rev_coords | dict[str, list[int]] | Mapping of k-mer to the 0-based start positions of its reverse complement in the target sequence. | required |
| query_kmer_positions | dict[str, list[int]] | Mapping of k-mer to its 0-based start positions in the query sequence. | required |
| k | int | The k-mer length. | required |
Returns:
| Type | Description |
|---|---|
| list[tuple[int, int, int, int]] | List of |
py_merge_rev_fwd_runs(target_rev_coords, query_kmer_positions, k)
builtin
Python binding: merge reverse-complement (- strand) k-mer coordinate runs
that are co-linear on a forward diagonal (RC positions increase as query
position increases).
This is the complement of py_merge_rev_runs, which handles the
anti-diagonal case. Together they cover all possible orientations of
reverse-complement k-mer matches:
- py_merge_rev_runs: anti-diagonal; q advances by 1 and t_rc decreases by 1 (standard inverted-repeat / reverse-complement alignment).
- py_merge_rev_fwd_runs: forward diagonal; q advances by 1 and t_rc also advances by 1 (inverted-repeat case where both arms run in the same direction).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target_rev_coords | dict[str, list[int]] | Mapping of k-mer to the 0-based start positions of its reverse complement in the target sequence (as returned by find_rev_coords_in_index). | required |
| query_kmer_positions | dict[str, list[int]] | Mapping of k-mer to its 0-based start positions in the query sequence. | required |
| k | int | The k-mer length. | required |
Returns:
| Type | Description |
|---|---|
| list[tuple[int, int, int, int]] | List of |
PAF Formatting
py_coords_to_paf(matches, query_name, query_len, target_name, target_len)
builtin
Python binding: convert coordinate tuples to PAF format strings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| matches | list[tuple[int, int, int, int]] | List of (query_start, query_end, target_start, target_end) tuples. | required |
| query_name | str | Name of the query sequence. | required |
| query_len | int | Total length of the query sequence. | required |
| target_name | str | Name of the target sequence. | required |
| target_len | int | Total length of the target sequence. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | List of PAF format lines (tab-separated). |
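A PAF line has twelve mandatory tab-separated columns: query name, length, start, end; strand; target name, length, start, end; number of residue matches; alignment block length; and mapping quality. The sketch below shows one way coordinate tuples could map onto those columns. It is not the library's formatter: `coords_to_paf_sketch` and its `strand` keyword are hypothetical, and the values written to the last three columns (block span for both match count and block length, 255 for "mapping quality missing") are assumptions about what a k-mer based tool might reasonably emit.

```python
def coords_to_paf_sketch(matches, query_name, query_len,
                         target_name, target_len, strand="+"):
    """Format coordinate tuples as PAF lines (hypothetical helper)."""
    lines = []
    for q_start, q_end, t_start, t_end in matches:
        # Assumption: exact k-mer runs, so matches == block length == span.
        span = q_end - q_start
        fields = [
            query_name, query_len, q_start, q_end,
            strand,
            target_name, target_len, t_start, t_end,
            span,   # number of residue matches (assumed)
            span,   # alignment block length (assumed)
            255,    # mapping quality; 255 means "missing" in PAF
        ]
        lines.append("\t".join(str(field) for field in fields))
    return lines


for line in coords_to_paf_sketch([(0, 6, 2, 8)], "query1", 6, "target1", 8):
    print(line)
```

PAF coordinates are 0-based with exclusive ends, so half-open block tuples can be written out without adjustment.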
Index Serialization
py_save_index(path, sequences, k)
builtin
Python binding: save an index collection to a file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to save the serialized index. | required |
| sequences | dict[str, str] | Dictionary of sequence name to sequence string. | required |
| k | int | The k-mer length used to build the index. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If serialization fails. |
py_load_index(path)
builtin
Python binding: load an index collection from a file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to the serialized index file. | required |
Returns:
| Type | Description |
|---|---|
| tuple[dict[str, list[str]], int] | A tuple of (kmer_sets_dict, k) where kmer_sets_dict maps sequence names to their k-mer lists. |
Raises:
| Type | Description |
|---|---|
| ValueError | If deserialization fails. |
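The save and load signatures are asymmetric: saving takes raw sequences, while loading returns per-sequence k-mer lists together with `k`. The round-trip shape can be sketched as below. The on-disk format used by rusty-dot is not described here, so JSON is used purely as a stand-in; `save_index_sketch` and `load_index_sketch` are hypothetical names, and the sorted k-mer order is an assumption.

```python
import json


def save_index_sketch(path, sequences, k):
    """Store each sequence's k-mer set plus k (hypothetical helper, JSON stand-in)."""
    payload = {
        "k": k,
        "kmer_sets": {
            name: sorted({seq[i:i + k] for i in range(len(seq) - k + 1)})
            for name, seq in sequences.items()
        },
    }
    try:
        with open(path, "w") as handle:
            json.dump(payload, handle)
    except OSError as exc:
        raise ValueError(f"serialization failed: {exc}") from exc


def load_index_sketch(path):
    """Return (kmer_sets_dict, k), matching the documented return shape."""
    try:
        with open(path) as handle:
            payload = json.load(handle)
        return payload["kmer_sets"], payload["k"]
    except (OSError, ValueError, KeyError) as exc:
        # json.JSONDecodeError is a ValueError subclass, so it is covered here.
        raise ValueError(f"deserialization failed: {exc}") from exc
```

Note that, given the documented return type, the original sequence strings are not part of what a load returns; only the k-mer lists and `k` come back.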