Documentation for agricola Python API
agricola exports two functions which can be used in place of the recommended command line workflow:
agricola.step1.step1(datasets, Y, X, phenotypes, train_mask, test_mask, h2_prior, trait_type, loocv=False, B=1000, idx_sample=None, variants=None, level0_dir=None)
Perform agricola step 1
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `list[LancData]` | A list of LancData objects (either a single object or one per chromosome) | required |
| `Y` | `ArrayLike` | A (N, P) jax array of phenotypes | required |
| `X` | `Optional[ArrayLike]` | A (N, C) jax array of covariates (no intercept) | required |
| `phenotypes` | `list[str]` | A list of phenotype names, ordered as the columns of Y | required |
| `train_mask` | `ArrayLike` | A (N, K) jax array indicating training set status for each set k in 1, ..., K | required |
| `test_mask` | `ArrayLike` | A (N, K) jax array indicating test set status for each set k in 1, ..., K | required |
| `h2_prior` | `ArrayLike` | A 1D jax array of prior values for SNP heritability | required |
| `trait_type` | `str` | Either "qt" or "bt" | required |
| `loocv` | `bool` | A boolean indicating whether to perform LOOCV instead of standard cross validation. Ignored for trait_type="qt". | `False` |
| `B` | `int` | The number of variants per block | `1000` |
| `idx_sample` | `Optional[ArrayLike]` | An optional (N_sub,) jax array with indices of samples to include | `None` |
| `variants` | `Optional[list[str]]` | A list of variant IDs to include in the analysis. If not provided, all variants are used | `None` |
| `level0_dir` | `Optional[str]` | The directory where level 0 predictions are written | `None` |
Returns:
| Type | Description |
|---|---|
| `dict[str, DataFrame]` | A dict where keys are chromosomes and values are (N, P) pandas DataFrames of level 1 predictions |
Source code in src/agricola/step1.py
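The array arguments to step 1 can be prepared along the following lines. This is a minimal sketch, not part of agricola: the helper `kfold_masks`, the toy dimensions, the phenotype names, and the `h2_prior` grid are all hypothetical. It assumes that the masks are boolean, that each sample is held out in exactly one of the K sets, and that every sample not held out in set k is used for training in set k. NumPy is used for brevity; the arrays can be converted with `jax.numpy.asarray` if jax arrays are needed.

```python
import numpy as np


def kfold_masks(n_samples, n_folds, seed=0):
    """Return (train_mask, test_mask), each a boolean (N, K) array."""
    rng = np.random.default_rng(seed)
    # Assign each sample to exactly one of the K folds, as evenly as possible.
    fold_of_sample = rng.permutation(n_samples) % n_folds
    # Column k of test_mask marks the samples held out in fold k.
    test_mask = fold_of_sample[:, None] == np.arange(n_folds)[None, :]
    # Assumption: samples not held out in fold k are used for training in fold k.
    train_mask = ~test_mask
    return train_mask, test_mask


# Toy dimensions: N samples, P phenotypes, C covariates, K folds.
N, P, C, K = 12, 2, 3, 4
rng = np.random.default_rng(1)
Y = rng.normal(size=(N, P))                 # (N, P) phenotypes
X = rng.normal(size=(N, C))                 # (N, C) covariates, no intercept column
phenotypes = ["pheno_a", "pheno_b"]         # ordered as the columns of Y
train_mask, test_mask = kfold_masks(N, K)   # each of shape (N, K)
h2_prior = np.array([0.1, 0.3, 0.5, 0.7])   # illustrative values only

# With `datasets` (a list of LancData objects) already constructed, the call
# would follow the documented signature:
# from agricola.step1 import step1
# preds = step1(datasets, Y, X, phenotypes, train_mask, test_mask,
#               h2_prior, trait_type="qt")
```

Constructing the `LancData` objects is outside the scope of this sketch, which is why the call itself is left commented out.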
agricola.step2.step2(datasets, Y, X, step1_predictions, out_prefixes, phenotypes, trait_type='qt', test_type='score', chrom=None, B=1000, min_ac=1, idx_sample=None, variants=None, adjust_lanc=True, impute=False)
Perform agricola step 2
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `list[LancData]` | A list of LancData objects (either a single object or one per chromosome) | required |
| `Y` | `ArrayLike` | A (N, P) jax array of outcomes | required |
| `X` | `Optional[ArrayLike]` | A (N, C) jax array of covariates | required |
| `step1_predictions` | `dict[str, DataFrame]` | A dict with LOCO linear predictions from step 1. The values are (N, P) NumPy arrays | required |
| `out_prefixes` | `list[str]` | A list of prefixes for each dataset. Outputs will be written to {output_prefix}_{phenotype}.parquet | required |
| `phenotypes` | `list[str]` | A list of phenotype names | required |
| `trait_type` | `str` | Either "qt" or "bt" | `'qt'` |
| `test_type` | `str` | Either "score" or "wald" | `'score'` |
| `B` | `int` | The block size (max number of variants to read at once) | `1000` |
| `min_ac` | `int` | The minimum allele count threshold | `1` |
| `idx_sample` | `Optional[ArrayLike]` | An optional numpy array with ordered indices of samples (in the psam file) to retain | `None` |
| `variants` | `Optional[list[str]]` | An optional list of variant IDs to retain | `None` |
| `adjust_lanc` | `bool` | A boolean indicating whether to adjust tests for local ancestry | `True` |
| `impute` | `bool` | Whether to impute the phenotype. Much faster, but only available for qt traits. If all phenotypes are non-missing, this is ignored. | `False` |
Source code in src/agricola/step2.py
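Because step 2 writes its results to disk, it can help to know the output paths in advance. The sketch below applies the documented pattern `{output_prefix}_{phenotype}.parquet` to every prefix and phenotype. The helper `expected_outputs` and the example prefixes and phenotype names are hypothetical and not part of agricola, and the sketch assumes one output file per (prefix, phenotype) pair.

```python
def expected_outputs(out_prefixes, phenotypes):
    """List the paths implied by the {output_prefix}_{phenotype}.parquet pattern."""
    return [
        f"{prefix}_{phenotype}.parquet"
        for prefix in out_prefixes      # one prefix per dataset
        for phenotype in phenotypes     # one file per phenotype
    ]


# Hypothetical prefixes and phenotype names, for illustration only.
paths = expected_outputs(["results/chr1", "results/chr2"], ["height", "bmi"])
for path in paths:
    print(path)
```

A list like this can be checked for existing files before a run.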