fit
Train SNP effect models for genomic prediction using BayesAlphabet methods.
Use this command when you want to learn marker effects from training data, then reuse those effects in predict.
Basic Syntax
gelex fit -b train_data -p phenotypes.tsv -m RR -o model_rr
gelex fit --pheno <pheno_file> --bfile <genotype_prefix> --method <method> [OPTIONS]
The required inputs are the phenotype file (--pheno), the genotype prefix (--bfile),
and the model method (--method).
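Checking that the inputs exist before starting a run can save time. The sketch below assumes the genotype prefix resolves to the three PLINK files (.bed/.bim/.fam) described under Options; `check_fit_inputs` is a hypothetical helper written for illustration, not part of gelex.

```shell
# Hypothetical helper (not part of gelex): confirm the phenotype file and
# the three PLINK files behind the genotype prefix exist before fitting.
check_fit_inputs() {
  pheno=$1
  bfile=$2
  status=0
  for f in "$pheno" "$bfile.bed" "$bfile.bim" "$bfile.fam"; do
    if [ ! -f "$f" ]; then
      echo "missing input: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (file names are the placeholders from the syntax example):
#   check_fit_inputs phenotypes.tsv train_data && \
#     gelex fit -b train_data -p phenotypes.tsv -m RR -o model_rr
```

The helper returns non-zero and names every missing file, so it can gate the `gelex fit` call with `&&`.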
Method Selection
Choose a method based on your goal before tuning other parameters.
| Method | Use when | Trade-off |
|---|---|---|
|  | All SNPs are assumed to have non-zero effects; use as a baseline. | Stable and simple, but weak variable selection. |
|  | You expect a mixture of effect sizes and want flexible shrinkage. | Better accuracy in many traits, with moderate runtime. |
|  | You expect many near-zero SNP effects and want explicit variable selection. | Stronger sparsity, but more sensitive to prior settings. |
|  | You want all SNPs included with SNP-specific shrinkage. | More MCMC sampling cost than |
|  | You want to model dominance effects alongside additive effects. | More parameters and longer runtime. |
|  | You want the model to estimate mixture proportions from the data. | More adaptive, but may require longer chains for stable estimates. |
If you are unsure, start with RR to establish a baseline, then try
R as a stronger default for production runs.
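One way to follow this advice is to fit both methods on the same training data with distinct output prefixes. The sketch below only prints the commands (a dry run); the file names are the placeholders used elsewhere on this page, and `print_fit_commands` is a hypothetical helper.

```shell
# Hypothetical dry-run helper: print one fit command per method so the RR
# baseline and the R model use identical inputs. Nothing is executed.
print_fit_commands() {
  for method in RR R; do
    echo gelex fit -b train_data -p phenotypes.tsv -m "$method" -o "model_$method"
  done
}

print_fit_commands
```

Piping the output to `sh` would run the commands once they look right.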
Options
Quick Start Options
-p, --pheno (required): Phenotype TSV file (FID IID trait1 ...).
-b, --bfile (required): PLINK binary prefix (.bed/.bim/.fam).
-m, --method (default: RR): Modeling method. Start with RR (baseline) or R (accuracy-oriented).
-o, --out (default: gelex): Output prefix for generated files.
Input Options
-p, --pheno (required): Phenotype TSV file in the format FID IID trait1 ...
--pheno-col (default: 2): 0-based trait column index in the phenotype file.
-b, --bfile (required): PLINK binary prefix (.bed/.bim/.fam).
--qcovar: Quantitative covariate TSV in the format FID IID covar1 ...
--dcovar: Categorical covariate TSV in the format FID IID factor1 ...
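Because the trait column index is 0-based, an index of 2 points at the third column (the first trait after FID and IID). The sketch below shows one way to check which column an index selects. It assumes the phenotype TSV has a header row, which this page does not state explicitly; `pheno_col_name` is a hypothetical helper.

```shell
# Hypothetical helper: print the header name at a 0-based column index.
# Assumes a tab-separated file whose first row is a header (an assumption).
pheno_col_name() {
  file=$1
  idx=$2
  # awk fields are 1-based, so the 0-based index is shifted by one.
  awk -F '\t' -v i="$idx" 'NR == 1 { print $(i + 1); exit }' "$file"
}

# Usage (phenotypes.tsv is a placeholder name):
#   pheno_col_name phenotypes.tsv 2
```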
Model Options
-m, --method (default: RR): BayesAlphabet method. Supported: A/B/C/R/RR; add d for dominance (for example Rd); add pi to estimate mixture proportions (for example Cpi).
--geno-method (default: OrthStandardizeHWE): Genotype processing method. Available methods: StandardizeHWE (SH), CenterHWE (CH), OrthStandardizeHWE (OSH), OrthCenterHWE (OCH), Standardize (S), Center (C), OrthStandardize (OS), OrthCenter (OC). Abbreviations accepted. See Genotype Processing Methods.
--scale (default: 0 0.001 0.01 0.1 1): Additive variance scales, typically used in BayesR-style models.
--pi (default: 0.99 0.01): Additive mixture proportions. For BayesR, the default is 0.99 0.005 0.003 0.001 0.001.
--dscale (default: 0 0.001 0.01 0.1 1): Dominance variance scales for dominance-enabled models.
--dpi (default: 0.99 0.01): Dominance mixture proportions. For BayesR dominance models, the default is 0.99 0.005 0.003 0.001 0.001.
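Mixture proportions describe shares of a whole, so a custom `--pi` or `--dpi` list would normally sum to 1, as both listed defaults do. The sketch below checks that before a run. Whether gelex itself validates or renormalizes the values is not stated on this page, and `pi_sums_to_one` is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if the given proportions sum to 1
# (within a small floating-point tolerance).
pi_sums_to_one() {
  printf '%s\n' "$@" | awk '
    { s += $1 }
    END {
      d = s - 1
      if (d < 0) d = -d
      if (d > 1e-8) exit 1
    }'
}

# Usage:
#   pi_sums_to_one 0.99 0.005 0.003 0.001 0.001 && echo "proportions OK"
```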
MCMC Options
--iters (default: 3000): Total MCMC iterations.
--burnin (default: 2000): Initial iterations discarded before sampling.
--thin (default: 1): Keep one sample every thin iterations.
--seed (default: 42): Random seed for reproducible MCMC.
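These three settings together determine roughly how many posterior samples are kept. The sketch below assumes the usual convention that every thin-th iteration after burn-in is retained; the exact count in gelex may differ by one depending on how it indexes iterations. `retained_samples` is a hypothetical helper.

```shell
# Hypothetical helper: approximate number of retained posterior samples,
# assuming one sample is kept every `thin` iterations after burn-in.
retained_samples() {
  iters=$1
  burnin=$2
  thin=$3
  echo $(( (iters - burnin) / thin ))
}

retained_samples 3000 2000 1     # listed defaults: prints 1000
retained_samples 50000 10000 5   # longer chain: prints 8000
```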
Performance and Output
-c, --chunk-size (default: 10000): Number of SNPs per processing chunk. Lower values reduce peak memory.
-t, --threads (default: 12): Number of CPU threads to use.
--mmap (default: false): Enable memory-mapped I/O. Usually lowers RAM pressure and may reduce speed.
-o, --out (default: gelex): Output prefix for all generated files.
Output Files
After a successful run, check files with your output prefix first.
| File pattern | Contents | Typical next step |
|---|---|---|
|  | Estimated SNP effects | Use with |
|  | Estimated fixed/covariate effects and model parameters | Optional input for |
|  | Run logs and model-specific artifacts | Review convergence and configuration used |
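A quick post-run check can confirm that the main outputs were written. The sketch below relies only on the two file names listed as expected outputs in the Examples section (`<prefix>.snp.eff` and `<prefix>.param`); other artifacts are not checked, and `check_fit_outputs` is a hypothetical helper.

```shell
# Hypothetical helper: verify that <prefix>.snp.eff and <prefix>.param
# exist and are non-empty after a fit run.
check_fit_outputs() {
  prefix=$1
  status=0
  for ext in snp.eff param; do
    if [ ! -s "$prefix.$ext" ]; then
      echo "missing or empty: $prefix.$ext" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (model_rr is the prefix from the first example):
#   check_fit_outputs model_rr && echo "outputs present"
```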
Warnings and Notes
Note
For many datasets, a practical starting point is --burnin around
20%-50% of --iters. Increase --iters when posterior summaries are
unstable across runs.
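The burn-in heuristic is simple integer arithmetic. The sketch below derives a `--burnin` value from a chosen percentage of `--iters`; `burnin_for` is a hypothetical helper, and the percentages are the ones suggested in the note above.

```shell
# Hypothetical helper: burn-in as a whole-number percentage of iterations.
burnin_for() {
  iters=$1
  pct=$2
  echo $(( iters * pct / 100 ))
}

burnin_for 50000 20   # prints 10000
burnin_for 50000 50   # prints 25000
```

For comparison, the listed defaults (--iters 3000, --burnin 2000) correspond to roughly 67%, so the 20%-50% range reads as a starting heuristic rather than a description of the defaults.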
Note
If memory is limited, reduce --chunk-size first, then enable
--mmap. This usually lowers RAM usage with a possible runtime penalty.
Examples
# Baseline model with method RR
gelex fit \
  -b train_data \
  -p phenotypes.tsv \
  -m RR \
  -o model_rr
Expected outputs: model_rr.snp.eff, model_rr.param.
# BayesR model (method R)
gelex fit \
  -b train_data \
  -p phenotypes.tsv \
  -m R \
  -o model_bayesr
Expected outputs: model_bayesr.snp.eff, model_bayesr.param.
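# Method B with explicit mixture proportions
gelex fit \
  -b train_data \
  -p phenotypes.tsv \
  -m B \
  --pi 0.99 0.01 \
  -o model_bayesb

# Method R with categorical (--dcovar) and quantitative (--qcovar) covariates
gelex fit \
  -b train_data \
  -p phenotypes.tsv \
  -m R \
  --dcovar sex.tsv \
  --qcovar age.tsv \
  -o model_covar

# Longer chain: 50000 iterations, 10000 burn-in, keep every 5th sample
gelex fit \
  -b train_data \
  -p phenotypes.tsv \
  -m R \
  --iters 50000 \
  --burnin 10000 \
  --thin 5 \
  -o model_high_prec

# Method R with dominance (Rd), custom dominance scales and proportions
gelex fit \
  -b train_data \
  -p phenotypes.tsv \
  -m Rd \
  --dscale 0.0001 0.001 0.01 0.1 1.0 \
  --dpi 0.95 0.05 \
  -o model_dom

# Method Cpi (mixture proportions estimated) with custom --pi and --scale
gelex fit \
  -b train_data \
  -p phenotypes.tsv \
  -m Cpi \
  --pi 0.9 0.1 \
  --scale 0.01 0.1 \
  -o model_cpi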