Quick Start Guide¶
Get started with PyPopART in minutes! This guide walks you through creating your first haplotype network.
Prerequisites¶
- PyPopART installed (see Installation)
- A sequence alignment file (FASTA, NEXUS, PHYLIP, or GenBank format)
Choosing Your Interface¶
PyPopART offers two ways to work with haplotype networks:
Option 1: Web-based GUI (Recommended for Beginners)¶
Launch the interactive interface:
Open your browser to http://localhost:8050 and follow the visual workflow:
1. Upload sequence file
2. Configure algorithm
3. Compute network
4. Customize layout
5. Export results
Advantages: Visual, interactive, no command-line knowledge needed.
Option 2: Command-Line Interface¶
For automation and scripting, use the CLI as shown below.
Your First Network in 3 Steps (CLI)¶
Step 1: Prepare Your Data¶
Create or use an existing sequence alignment. For this example, we'll create a simple FASTA file:
cat > mysequences.fasta << 'EOF'
>Sample1_PopA
ATCGATCGATCGATCGATCG
>Sample2_PopA
ATCGATCGATCGATCGATCG
>Sample3_PopB
ATCGATCGATCGATTGATCG
>Sample4_PopB
ATCGATCGATCGATTGATCG
>Sample5_PopC
ATCGATCGAGCGATCGATCG
EOF
Step 2: Construct the Network¶
Use the network command to build a haplotype network:
Output:
Loading sequences from mysequences.fasta...
✓ Loaded 5 sequences (20 bp)
Calculating hamming distances...
✓ Distance matrix computed
Identifying unique haplotypes...
✓ Found 3 unique haplotypes
Building MJN network...
✓ Network constructed
Network Statistics:
Nodes: 3
Edges: 2
Median vectors: 0
✓ Network saved as GRAPHML
Step 3: Visualize the Network¶
Create a visualization:
Or create an interactive HTML visualization:
That's it! You've created your first haplotype network.
Exploring Further¶
Try Different Algorithms¶
PyPopART offers six network construction algorithms:
# Minimum Spanning Tree (simplest tree)
pypopart network mysequences.fasta -a mst -o network_mst.graphml
# Minimum Spanning Network (shows alternatives)
pypopart network mysequences.fasta -a msn -o network_msn.graphml
# Statistical Parsimony (TCS)
pypopart network mysequences.fasta -a tcs -o network_tcs.graphml
# Median-Joining (default, infers ancestors)
pypopart network mysequences.fasta -a mjn -o network_mjn.graphml
# Parsimony Network (consensus from multiple trees)
pypopart network mysequences.fasta -a pn -o network_pn.graphml
# Tight Span Walker (metric-preserving, for small datasets)
pypopart network mysequences.fasta -a tsw -o network_tsw.graphml
Algorithm Comparison: - MST: Fastest, simplest tree - MSN: Adds alternative equal-distance connections - TCS: Best for closely related sequences - MJN: Infers ancestral sequences, handles reticulation - PN: Consensus approach, captures uncertainty - TSW: Most accurate metric preservation (slower)
Use Different Distance Metrics¶
# Hamming distance (default, fastest)
pypopart network mysequences.fasta -d hamming -o network.graphml
# Jukes-Cantor correction
pypopart network mysequences.fasta -d jc -o network.graphml
# Kimura 2-parameter
pypopart network mysequences.fasta -d k2p -o network.graphml
# Tamura-Nei (most complex)
pypopart network mysequences.fasta -d tamura_nei -o network.graphml
Analyze Your Network¶
Get comprehensive statistics:
Output:
Loading network from mynetwork.graphml...
✓ Loaded network with 3 nodes
=== Network Statistics ===
nodes: 3
edges: 2
diameter: 2
avg_degree: 1.333
clustering_coefficient: 0.0000
reticulation_index: 0.0000
Customize Visualizations¶
# Different layout algorithms
pypopart visualize mynetwork.graphml -o net.png --layout circular
pypopart visualize mynetwork.graphml -o net.png --layout radial
# Show node labels
pypopart visualize mynetwork.graphml -o net.png --show-labels
# Custom size
pypopart visualize mynetwork.graphml -o net.png --width 1200 --height 900
# Save as PDF (publication-ready)
pypopart visualize mynetwork.graphml -o net.pdf
Using the Python API¶
For more control, use Python directly:
from pypopart.io import load_alignment
from pypopart.core.distance import DistanceCalculator
from pypopart.core.condensation import condense_alignment
from pypopart.algorithms import MJNAlgorithm
from pypopart.visualization import StaticVisualizer
# Load sequences
alignment = load_alignment('mysequences.fasta')
print(f"Loaded {len(alignment)} sequences")
# Calculate distances
calculator = DistanceCalculator(method='k2p')
distances = calculator.calculate_matrix(alignment)
# Identify haplotypes
haplotypes, freq_map = condense_alignment(alignment)
print(f"Found {len(haplotypes)} unique haplotypes")
# Construct network
mjn = MJNAlgorithm(epsilon=0)
network = mjn.construct_network(haplotypes, distances)
# Visualize
viz = StaticVisualizer(network)
viz.plot(
layout_algorithm='spring',
show_labels=True,
output_file='network.png'
)
Common Workflows¶
Workflow 1: Basic Analysis¶
# 1. Load and inspect data
pypopart load mysequences.fasta
# 2. Build network
pypopart network mysequences.fasta -o network.graphml
# 3. Analyze
pypopart analyze network.graphml --stats --topology
# 4. Visualize
pypopart visualize network.graphml -o network.png
Workflow 2: Algorithm Comparison¶
# Try all algorithms
for algo in mst msn tcs mjn pn tsw; do
pypopart network mysequences.fasta -a $algo -o network_${algo}.graphml
pypopart visualize network_${algo}.graphml -o network_${algo}.png
done
Workflow 3: GUI-based Analysis¶
For an interactive workflow:
# Launch GUI
pypopart-gui
# Then use the web interface to:
# 1. Upload sequences and optional metadata
# 2. Try different algorithms with real-time parameter adjustment
# 3. Visualize with multiple layout options
# 4. Drag nodes to reposition
# 5. Export in various formats
Workflow 4: With Metadata¶
If you have population or location data:
# Create metadata file (metadata.csv)
cat > metadata.csv << 'EOF'
id,population,location
Sample1,PopA,Site1
Sample2,PopA,Site1
Sample3,PopB,Site2
Sample4,PopB,Site2
Sample5,PopC,Site3
EOF
# Load with metadata
pypopart load mysequences.fasta -m metadata.csv
# Visualize colored by population
pypopart visualize network.graphml -o network.png --color-by population
Next Steps¶
Now that you've created your first network, learn more:
- CLI Guide: Complete command-line reference
- Algorithms: Choosing the right algorithm
- Tutorials: Detailed examples
- API Reference: Python API documentation
Getting Help¶
- Run
pypopart --helpfor command help - Run
pypopart COMMAND --helpfor specific command help - See FAQ for common questions
- Check GitHub Issues for known problems
Tips¶
Performance
For large datasets (>1000 sequences), consider:
- Using MST or MSN instead of MJN for speed
- Using Hamming distance instead of more complex models
- Increasing epsilon parameter for MJN to reduce complexity
Visualization
- PNG for presentations and documents
- PDF for publications (vector format, scalable)
- SVG for web use
- HTML for interactive exploration
File Formats
- GraphML preserves all network attributes (recommended)
- JSON for web applications
- NEXUS for PopART compatibility
Happy network building! 🌳