Population Genetics
Population genetics statistics and tests.
Overview
The popgen module provides population genetics analyses:
- Diversity measures (haplotype, nucleotide)
- Neutrality tests (Tajima's D, Fu's Fs)
- Population differentiation (FST)
- Gene flow estimates
- Site frequency spectrum
Modules
pypopart.stats.popgen
Population genetics statistics for PyPopART.
This module provides functions for calculating population genetics measures including Tajima's D, Fu's Fs, FST, and AMOVA.
TajimaDResult (dataclass)
FuFsResult (dataclass)
FstResult (dataclass)
AMOVAResult (dataclass)
Result of AMOVA (Analysis of Molecular Variance).
Source code in src/pypopart/stats/popgen.py
calculate_tajimas_d
calculate_tajimas_d(
    alignment: Alignment,
    populations: Optional[Dict[str, str]] = None,
) -> TajimaDResult
Calculate Tajima's D statistic.
Tajima's D tests for deviation from neutral evolution by comparing two estimates of theta (population mutation rate):
- π (nucleotide diversity)
- θ_W (Watterson's estimator based on segregating sites)

Interpretation:
- D = 0: neutral evolution
- D > 0: balancing selection or population contraction
- D < 0: directional selection or population expansion

Args:
- alignment: Alignment object
- populations: Optional dict mapping sequence_id -> population

Returns:
- TajimaDResult with D statistic and related values.
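To make the comparison between π and θ_W concrete, here is a minimal standalone sketch of the standard Tajima's D formula. It does not use PyPopART objects. The function name `tajimas_d` and the plain list-of-strings input are assumptions made for illustration, and gaps or ambiguous characters are treated as ordinary symbols. The library's own implementation may differ.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Return (D, pi, S) for a list of equal-length aligned sequences.

    Hypothetical helper for illustration; not part of pypopart.
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")

    # S: number of segregating (polymorphic) alignment columns.
    s = sum(1 for column in zip(*sequences) if len(set(column)) > 1)

    # pi: mean number of pairwise differences between sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    if s == 0 or variance <= 0:
        return math.nan, pi, s  # D is undefined without usable variation

    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(variance), pi, s


# Three singleton mutations around a central sequence (a star-like sample).
d, pi, s = tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"])
```

In this toy sample every variant is a singleton, so π falls below θ_W and D is negative, which matches the expansion-like pattern described above.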
Source code in src/pypopart/stats/popgen.py
calculate_fu_fs
Calculate Fu's Fs statistic.
Fu's Fs is based on the probability of observing a random neutral sample with as many or more haplotypes as observed, given theta estimated from nucleotide diversity.
Interpretation:
- Negative Fs: excess of recent mutations (population expansion)
- Positive Fs: deficiency of alleles (balancing selection or bottleneck)

Args:
- network: HaplotypeNetwork object
- alignment: Alignment object

Returns:
- FuFsResult with Fs statistic and related values.
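The probability described above can be sketched with the Ewens sampling formula, where Fs is the logit of S' = P(K ≥ k_obs | θ). The function `fu_fs` and its scalar arguments are hypothetical and chosen for illustration. The sketch uses plain floating point, so it is suitable only for small samples, and it is not necessarily how PyPopART computes the statistic.

```python
import math


def fu_fs(n, k_obs, theta):
    """Fu's Fs for n sequences with k_obs haplotypes and theta taken from pi.

    Hypothetical helper for illustration; not part of pypopart.
    """
    if theta <= 0:
        raise ValueError("theta must be positive")

    # Unsigned Stirling numbers of the first kind c(n, k), built row by row
    # with the recurrence c(m, k) = c(m-1, k-1) + (m-1) * c(m-1, k).
    row = [1]  # c(0, 0) = 1
    for m in range(1, n + 1):
        prev = row + [0]
        row = [0] * (m + 1)
        for k in range(1, m + 1):
            row[k] = prev[k - 1] + (m - 1) * prev[k]

    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = math.prod(theta + i for i in range(n))

    # Ewens sampling formula: P(K = k) = c(n, k) * theta**k / rising.
    s_prime = sum(row[k] * theta**k for k in range(k_obs, n + 1)) / rising

    if s_prime <= 0:
        return -math.inf
    if s_prime >= 1:
        return math.inf
    return math.log(s_prime / (1 - s_prime))


many_haplotypes = fu_fs(n=10, k_obs=8, theta=1.0)  # more haplotypes than expected
few_haplotypes = fu_fs(n=10, k_obs=2, theta=5.0)   # fewer haplotypes than expected
```

Observing many haplotypes at a low θ makes S' small and Fs negative. Observing few haplotypes at a high θ pushes S' toward 1 and Fs positive.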
Source code in src/pypopart/stats/popgen.py
calculate_pairwise_fst
Calculate pairwise FST between two populations.
FST measures population differentiation based on genetic variance:
FST = (HT - HS) / HT
where:
- HT is the expected heterozygosity in the total population
- HS is the expected heterozygosity within populations

Interpretation:
- FST = 0: no differentiation
- FST = 1: complete differentiation

Args:
- network: HaplotypeNetwork object
- pop1: Name of first population
- pop2: Name of second population

Returns:
- FstResult with FST and related values.
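The formula FST = (HT - HS) / HT can be illustrated with a small standalone sketch that works on lists of haplotype labels instead of a HaplotypeNetwork. The names `expected_heterozygosity` and `pairwise_fst` are hypothetical, and the equal weighting of the two populations is an assumption. The library may weight by sample size or apply a different estimator.

```python
from collections import Counter


def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for a dict of haplotype frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_fst(haps1, haps2):
    """FST = (HT - HS) / HT from two lists of haplotype labels.

    Hypothetical helper for illustration; both populations get equal weight.
    """
    f1 = {h: c / len(haps1) for h, c in Counter(haps1).items()}
    f2 = {h: c / len(haps2) for h, c in Counter(haps2).items()}

    # HS: mean within-population expected heterozygosity.
    hs = (expected_heterozygosity(f1) + expected_heterozygosity(f2)) / 2

    # HT: expected heterozygosity of the averaged haplotype frequencies.
    mean_freqs = {h: (f1.get(h, 0.0) + f2.get(h, 0.0)) / 2 for h in set(f1) | set(f2)}
    ht = expected_heterozygosity(mean_freqs)

    if ht == 0:
        return 0.0  # no variation at all, so no measurable differentiation
    return (ht - hs) / ht


fixed_difference = pairwise_fst(["A", "A", "A"], ["B", "B", "B"])
same_frequencies = pairwise_fst(["A", "B"], ["A", "A", "B", "B"])
```

Two populations fixed for different haplotypes sit at the FST = 1 end of the scale, while two populations with identical haplotype frequencies give FST = 0.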
Source code in src/pypopart/stats/popgen.py
calculate_fst_matrix
Calculate pairwise FST for all population pairs.
Args:
- network: HaplotypeNetwork object

Returns:
- Dictionary mapping (pop1, pop2) tuples to FST values.
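The shape of the returned dictionary can be sketched as follows. The helper `fst_matrix`, the compact `heterozygosity_fst` estimator, and the dict-of-lists input are all assumptions for illustration. Whether PyPopART stores each pair once or in both orders is not stated in this page; the sketch stores each unordered pair once, with names sorted alphabetically.

```python
from itertools import combinations


def heterozygosity_fst(a, b):
    """Compact (HT - HS) / HT estimator on two lists of haplotype labels."""
    fa = {h: a.count(h) / len(a) for h in set(a)}
    fb = {h: b.count(h) / len(b) for h in set(b)}
    hs = 1 - (sum(p * p for p in fa.values()) + sum(p * p for p in fb.values())) / 2
    ht = 1 - sum(((fa.get(h, 0) + fb.get(h, 0)) / 2) ** 2 for h in set(fa) | set(fb))
    return (ht - hs) / ht if ht > 0 else 0.0


def fst_matrix(haplotypes_by_pop, pairwise_stat=heterozygosity_fst):
    """Map each unordered population pair (sorted by name) to its statistic."""
    return {
        (p1, p2): pairwise_stat(haplotypes_by_pop[p1], haplotypes_by_pop[p2])
        for p1, p2 in combinations(sorted(haplotypes_by_pop), 2)
    }


populations = {
    "north": ["A", "A"],
    "south": ["B", "B"],
    "east": ["A", "A"],
}
matrix = fst_matrix(populations)
```

With three populations the result has three entries, one per pair, so a lookup needs the names in sorted order, for example `matrix[("north", "south")]`.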
Source code in src/pypopart/stats/popgen.py
calculate_amova
calculate_amova(
    network: HaplotypeNetwork,
    alignment: Alignment,
    groups: Optional[Dict[str, str]] = None,
) -> AMOVAResult
Calculate AMOVA (Analysis of Molecular Variance).
AMOVA partitions genetic variance into components:
- Among populations
- Within populations
- (Optionally) Among groups of populations

Args:
- network: HaplotypeNetwork object
- alignment: Alignment object
- groups: Optional dict mapping population -> group

Returns:
- AMOVAResult with variance components and phi statistics.
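The among/within partition can be sketched for the simplest case: one level, no groups. The sketch follows the usual sums-of-squared-deviations formulation and uses the number of pairwise differences as the squared distance, which is an assumption. `amova_one_level` and its dict-of-sequences input are hypothetical, and the input needs at least two populations and more sequences than populations. The library's implementation and the fields of AMOVAResult may differ.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def ssd(seqs):
    """Sum of squared deviations: summed pairwise distances over group size."""
    return sum(hamming(a, b) for a, b in combinations(seqs, 2)) / len(seqs)


def amova_one_level(pops):
    """One-level AMOVA for a dict mapping population -> list of sequences.

    Hypothetical helper for illustration; not part of pypopart.
    """
    all_seqs = [s for seqs in pops.values() for s in seqs]
    n_total, n_pops = len(all_seqs), len(pops)

    ssd_within = sum(ssd(seqs) for seqs in pops.values())
    ssd_among = ssd(all_seqs) - ssd_within

    ms_among = ssd_among / (n_pops - 1)
    ms_within = ssd_within / (n_total - n_pops)

    # n0: average sample size corrected for unequal population sizes.
    n0 = (n_total - sum(len(s) ** 2 for s in pops.values()) / n_total) / (n_pops - 1)

    var_among = (ms_among - ms_within) / n0
    var_within = ms_within
    total = var_among + var_within
    phi_st = var_among / total if total > 0 else 0.0
    return {"var_among": var_among, "var_within": var_within, "phi_st": phi_st}


structured = amova_one_level({"p1": ["AAAA", "AAAA"], "p2": ["TTTT", "TTTT"]})
unstructured = amova_one_level({"p1": ["AAAA", "TTTT"], "p2": ["AAAA", "TTTT"]})
```

When all variation lies between populations, Φ_ST reaches 1. When the populations are identical in composition, the among-population component is zero or negative and Φ_ST is not positive.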
Source code in src/pypopart/stats/popgen.py
calculate_mismatch_distribution
Calculate the mismatch distribution.
The mismatch distribution shows the frequency of pairwise differences between haplotypes. The shape of this distribution can indicate demographic history:
- Unimodal: recent population expansion
- Multimodal: population at equilibrium

Args:
- network: HaplotypeNetwork object

Returns:
- Dictionary mapping number of differences -> frequency.
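A minimal sketch of the dictionary described above, computed from aligned sequences. The function name `mismatch_distribution` and the list-of-strings input are hypothetical. Two further assumptions: every pair of sampled sequences is counted, and "frequency" means a proportion of pairs. The library may instead work on the unique haplotypes in the network or return raw counts.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Map number of pairwise differences -> proportion of sequence pairs.

    Hypothetical helper for illustration; not part of pypopart.
    """
    counts = Counter(
        sum(a != b for a, b in zip(x, y)) for x, y in combinations(sequences, 2)
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # fewer than two sequences, so there are no pairs
    return {diff: count / total for diff, count in sorted(counts.items())}


distribution = mismatch_distribution(["AAAA", "AAAT", "AATT"])
```

Here two of the three pairs differ at one site and one pair differs at two sites, so the result maps 1 to two thirds and 2 to one third.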