Frequently Asked Questions¶
General¶
What is PyPopART?¶
PyPopART is a pure Python implementation of PopART (Population Analysis with Reticulate Trees), a tool for constructing and visualizing haplotype networks from DNA sequence alignments.
How is PyPopART different from the original PopART?¶
- Pure Python: No Java dependency, easier to install
- CLI and API: Both command-line and programmatic access
- Modern stack: Uses current Python libraries (NetworkX, Plotly)
- Open development: Active development on GitHub
- Additional features: More analysis options and statistics
Is PyPopART compatible with PopART files?¶
Yes! PyPopART can read NEXUS files with traits/metadata in PopART format, and can export networks in formats compatible with PopART.
Installation¶
Which Python version do I need?¶
Python 3.9 or higher. We recommend Python 3.11 or 3.12 for best performance.
Can I use PyPopART on Windows?¶
Yes! PyPopART works on Windows, macOS, and Linux.
Installation fails with NumPy/SciPy errors¶
Try installing NumPy and SciPy first:
On some systems, you may need system dependencies for NumPy/SciPy. See your OS package manager docs.
Usage¶
Which algorithm should I use?¶
- MST: Simplest, always gives a tree
- MSN: Shows alternative connections, slightly more complex
- TCS: Statistically justified, may give disconnected networks
- MJN: Most comprehensive, infers ancestral haplotypes
See the Algorithm Guide for detailed comparison.
Which distance metric should I use?¶
- Hamming: Simple differences, fast, good for closely related sequences
- Jukes-Cantor: Corrects for multiple substitutions
- K2P: Accounts for transition/transversion differences
- Tamura-Nei: Most complex, accounts for GC content and rate variation
For most cases, Hamming or K2P is sufficient.
My network is too complex, how can I simplify it?¶
For MJN networks:
Increase epsilon (e.g., 5, 10, 20) to reduce complexity.
For TCS networks, increase the parsimony limit:
Or use a simpler algorithm (MST or MSN).
My TCS network is disconnected, is this a bug?¶
No, this is expected behavior! TCS only connects haplotypes within the parsimony limit. Disconnected components may represent distinct lineages or insufficient data.
How do I add population/metadata information?¶
Create a CSV file with columns including 'id' and any metadata:
Then:
pypopart load sequences.fasta -m metadata.csv
pypopart visualize network.graphml --color-by population -o network.png
Can I visualize networks from other software?¶
If the network is in GraphML, GML, or JSON format, yes! Load it:
Visualization¶
How do I make publication-quality figures?¶
Use PDF output for vector graphics:
For highest quality:
- Use larger dimensions (1200+ pixels)
- Use vector formats (PDF, SVG)
- Show labels only if needed
- Use spring layout for complex networks
Node sizes don't reflect my sample sizes¶
Node sizes are based on haplotype frequency. Make sure frequency information is preserved in your network file. When constructing from alignment, frequencies are automatically calculated.
How do I customize colors?¶
Currently, colors are assigned automatically. For custom coloring:
from pypopart.visualization import StaticVisualizer
viz = StaticVisualizer(network)
custom_colors = {'Hap1': 'red', 'Hap2': 'blue', ...}
viz.plot(node_colors=custom_colors, output_file='network.png')
Can I export interactive plots?¶
Yes! Use HTML format:
Open the HTML file in any web browser for interactive exploration.
Performance¶
PyPopART is slow with my large dataset¶
For datasets >1000 sequences:
- Use simpler algorithms (MST, MSN instead of MJN)
- Use Hamming distance instead of complex models
- For MJN, increase epsilon to reduce complexity
- Consider subsampling if haplotype diversity is low
Can I parallelize the analysis?¶
Currently, PyPopART runs single-threaded for most operations. Parallel processing may be added in future releases.
Analysis¶
What statistics should I report?¶
Common statistics include:
- Number of haplotypes
- Haplotype diversity
- Nucleotide diversity
- Network diameter
- Number of reticulations
Use:
How do I calculate FST between populations?¶
from pypopart.stats import PopulationGeneticsAnalysis
from pypopart.io import load_alignment
alignment = load_alignment('sequences.fasta')
popgen = PopulationGeneticsAnalysis(alignment)
# Define populations
populations = {
'PopA': ['Seq1', 'Seq2'],
'PopB': ['Seq3', 'Seq4']
}
# Calculate FST
fst = popgen.calculate_pairwise_fst(populations)
print(f"FST = {fst:.4f}")
What does reticulation index mean?¶
Reticulation index measures network complexity:
- 0 = tree (no cycles)
- Higher values = more reticulate (more cycles)
It's calculated as: (edges - nodes + 1) / nodes
Errors¶
"Sequence length doesn't match alignment length"¶
All sequences in an alignment must have the same length. Check your input file for sequences of different lengths.
"Sequence ID already exists"¶
Sequence IDs must be unique. Check for duplicate sequence names in your input file.
"Distance matrix is not square"¶
This usually indicates a bug. Please report it on GitHub with your data and command.
"Cannot import name 'version'"¶
The package wasn't installed correctly. Try:
File Formats¶
What input formats are supported?¶
- FASTA (.fasta, .fa, .fna)
- NEXUS (.nexus, .nex)
- PHYLIP (.phy, .phylip)
- GenBank (.gb, .gbk)
Compressed files (.gz, .zip) are automatically detected and decompressed.
What output formats are available?¶
Networks:
- GraphML (.graphml) - Recommended
- GML (.gml)
- JSON (.json)
- NEXUS (.nexus, .nex)
Visualizations:
- PNG (.png) - Raster
- PDF (.pdf) - Vector, publication-ready
- SVG (.svg) - Vector, web-friendly
- HTML (.html) - Interactive
Can I convert between formats?¶
Yes:
# Load in one format, save in another
pypopart network sequences.fasta -o network.graphml
# Convert to JSON
python -c "
from pypopart.io import load_network, save_network
net = load_network('network.graphml')
save_network(net, 'network.json', format='json')
"
Development¶
How can I contribute?¶
See the Contributing Guide for guidelines.
Where should I report bugs?¶
Open an issue on GitHub with:
- Python and PyPopART versions
- Command or code that caused the error
- Full error message
- Sample data if possible
Can I request features?¶
Yes! Open an issue on GitHub describing:
- The use case
- Proposed functionality
- Examples from other tools (if applicable)
Citation¶
How do I cite PyPopART?¶
Taranto, A. (2024). PyPopART: Pure Python implementation of haplotype network analysis.
GitHub repository: https://github.com/adamtaranto/pypopart
BibTeX:
@software{pypopart,
author = {Taranto, Adam},
title = {PyPopART: Pure Python implementation of haplotype network analysis},
year = {2024},
url = {https://github.com/adamtaranto/pypopart}
}
Still Have Questions?¶
- Check the User Guide
- Read the Tutorials
- Search GitHub Issues
- Open a new issue with the "question" label