Genotype Processing Methods

This page explains how to choose a value for --geno-method from a user perspective. Use it as a quick decision guide when building genomic relationship matrices (GRMs).

What --geno-method Changes

  • The way genotype values are transformed before GRM calculation.

  • Whether values are only centered or also standardized.

  • Whether the moments (mean and variance) come from the sample or from Hardy-Weinberg equilibrium (HWE) expectations.

In practice, this affects:

  • Interpretability of additive/dominance effects

  • Numerical scale of the GRM

  • Sensitivity to finite-sample noise

HWE vs Sample: Practical Difference

  • HWE methods use population-genetics expectations, so they are usually easier to interpret from a biological perspective.

  • Sample methods use moments estimated from your data, so standardized variants are better aligned with sample-level properties (mean close to 0, standard deviation close to 1).

  • In iterative model fitting (for example, gelex fit), sample methods can sometimes improve numerical precision and speed up convergence.

  • Use one family consistently within the same analysis workflow to avoid method-induced scale differences.
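The scale difference between the two families can be seen on a toy example. The sketch below is illustrative Python, not Gelex's implementation. It assumes additive allele-count coding (0, 1, 2), no missing values, an allele frequency estimated as half the sample mean, and a population (divide-by-n) standard deviation for the sample variant. The function names are hypothetical.

```python
from statistics import fmean, pstdev


def standardize_hwe(genotypes):
    """Standardize with HWE-expected moments: mean 2p, variance 2p(1-p)."""
    p = fmean(genotypes) / 2.0            # allele frequency estimated from the sample
    sd = (2.0 * p * (1.0 - p)) ** 0.5     # standard deviation expected under HWE
    return [(g - 2.0 * p) / sd for g in genotypes]


def standardize_sample(genotypes):
    """Standardize with moments computed directly from the sample."""
    mean = fmean(genotypes)
    sd = pstdev(genotypes)                # assumption: divide-by-n standard deviation
    return [(g - mean) / sd for g in genotypes]


# Toy variant with more homozygotes than HWE predicts at p = 0.5.
genotypes = [0, 0, 0, 1, 1, 2, 2, 2]

hwe = standardize_hwe(genotypes)
sample = standardize_sample(genotypes)

print("HWE-scaled sd:   ", pstdev(hwe))
print("sample-scaled sd:", pstdev(sample))
```

Because the frequency is estimated from the same sample, both versions are centered identically here and only the scale differs. With this excess of homozygotes, the HWE-scaled values have a standard deviation of about 1.22, while the sample-scaled values have a standard deviation of 1.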

Quick Selection Guide

If you are unsure, use OrthStandardizeHWE (default for GRM and fit). You can also use the short alias OSH.

  • Use OrthStandardizeHWE (alias: OSH) for the default GRM/fit baseline.

  • Use StandardizeHWE (alias: SH) when orthogonal dominance coding is not needed.

  • Use center methods (CenterHWE, OrthCenterHWE, Center, OrthCenter) when only centering is needed.

  • Use orthogonal methods (OrthStandardizeHWE, OrthCenterHWE, OrthStandardize, OrthCenter) when orthogonal dominance coding is required.

  • Use HWE methods (StandardizeHWE, CenterHWE, OrthStandardizeHWE, OrthCenterHWE) when you prefer population-genetics expectations.

  • Use sample methods (Standardize, Center, OrthStandardize, OrthCenter) for data-driven moments.

Method Families

Scaling family:

  • standardize*: center and scale to approximately unit variance

  • center*: center only

Encoding family:

  • non-orth: dominance coding [0, 1, 0]

  • orth: dominance coding [0, 2p, 4p-2], where p is the allele frequency

Orthogonal vs Non-orthogonal Dominance

  • orth ([0, 2p, 4p-2]) is designed so additive and dominance parts are orthogonal under the model assumptions.

  • With orth coding, the additive estimate keeps its breeding-value interpretation whether or not dominance is included in the model.

  • non-orth ([0, 1, 0]) gives the additive effect a more direct biological interpretation.

  • Choose one coding and keep it fixed across comparable analyses.
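The orthogonality claim can be checked numerically. The sketch below is illustrative Python, not Gelex code. It assumes the additive covariate is the allele count [0, 1, 2], that the three coding entries correspond to genotypes carrying 0, 1 and 2 copies of the allele with frequency p, and that genotype frequencies follow HWE. All function names are hypothetical.

```python
ADDITIVE = [0.0, 1.0, 2.0]  # assumption: additive covariate is the allele count


def hwe_genotype_freqs(p):
    """Genotype frequencies for 0, 1, 2 copies of an allele with frequency p."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def dominance_coding(p, orth):
    """Dominance covariate for genotypes 0, 1, 2 under either coding."""
    return [0.0, 2.0 * p, 4.0 * p - 2.0] if orth else [0.0, 1.0, 0.0]


def hwe_covariance(x, y, p):
    """Covariance of two genotype codings, weighted by HWE genotype frequencies."""
    w = hwe_genotype_freqs(p)
    mean_x = sum(wi * xi for wi, xi in zip(w, x))
    mean_y = sum(wi * yi for wi, yi in zip(w, y))
    return sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))


p = 0.2
cov_orth = hwe_covariance(ADDITIVE, dominance_coding(p, orth=True), p)
cov_non_orth = hwe_covariance(ADDITIVE, dominance_coding(p, orth=False), p)

print("orth covariance:    ", cov_orth)
print("non-orth covariance:", cov_non_orth)
```

Under these assumptions the orth covariance is zero up to floating-point rounding for any p, while the non-orth covariance equals 2p(1-p)(1-2p), which is 0.192 at p = 0.2 and vanishes only at p = 0.5.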

Moment family:

  • HWE methods use HWE-based expected moments: StandardizeHWE, CenterHWE, OrthStandardizeHWE, OrthCenterHWE (aliases: SH, CH, OSH, OCH)

  • Sample methods use moments estimated directly from your sample: Standardize, Center, OrthStandardize, OrthCenter (aliases: S, C, OS, OC)
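The three families combine into the eight method names, and each name can be read off mechanically. The sketch below is a hypothetical Python helper, not part of Gelex, that decomposes a method name into its scaling, encoding and moment family and rebuilds the short alias listed above.

```python
from typing import NamedTuple


class GenoMethod(NamedTuple):
    name: str
    alias: str
    scaling: str    # "standardize" or "center"
    encoding: str   # "orth" or "non-orth"
    moments: str    # "HWE" or "sample"


def describe(name: str) -> GenoMethod:
    """Split a method name into its three families (hypothetical helper)."""
    orth = name.startswith("Orth")
    hwe = name.endswith("HWE")
    core = name[4:] if orth else name       # drop the "Orth" prefix
    core = core[:-3] if hwe else core       # drop the "HWE" suffix
    if core not in ("Standardize", "Center"):
        raise ValueError(f"unknown genotype method: {name}")
    alias = ("O" if orth else "") + core[0] + ("H" if hwe else "")
    return GenoMethod(
        name=name,
        alias=alias,
        scaling=core.lower(),
        encoding="orth" if orth else "non-orth",
        moments="HWE" if hwe else "sample",
    )


METHODS = [
    "StandardizeHWE", "CenterHWE", "OrthStandardizeHWE", "OrthCenterHWE",
    "Standardize", "Center", "OrthStandardize", "OrthCenter",
]

for method in METHODS:
    print(describe(method))
```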

Method Matrix (User View)

Method (alias)            Best for                                             Moments  Scaling
------------------------  ---------------------------------------------------  -------  -----------
StandardizeHWE (SH)       HWE standardization, no orthogonal dominance         HWE      Standardize
CenterHWE (CH)            HWE centering, no variance scaling                   HWE      Center
OrthStandardizeHWE (OSH)  Default GRM/fit: orthogonal dominance + HWE          HWE      Standardize
OrthCenterHWE (OCH)       Default assoc: orthogonal dominance + HWE centering  HWE      Center
Standardize (S)           Sample-based standardization                         Sample   Standardize
Center (C)                Sample-based centering, no scaling                   Sample   Center
OrthStandardize (OS)      Orthogonal dominance + sample standardization        Sample   Standardize
OrthCenter (OC)           Orthogonal dominance + sample centering              Sample   Center

Practical Recommendations

  • Start with OrthStandardizeHWE (alias: OSH) for most production runs.

  • If biological interpretability is your top priority, prefer HWE methods (StandardizeHWE, CenterHWE, OrthStandardizeHWE, OrthCenterHWE).

  • If optimizer stability and convergence speed are your top priority, test sample methods (Standardize, Center, OrthStandardize, OrthCenter) first.

  • If comparing with older centered pipelines, use CenterHWE (alias: CH).

  • Use sample methods (Standardize, Center, OrthStandardize, OrthCenter) only when you intentionally want sample-dependent centering and variance.

  • Keep method choice fixed across comparable runs to avoid scale mismatch.

Minimal Technical Notes

  • Missing genotypes are handled automatically.

  • Variants with near-zero variance are treated as monomorphic and excluded from scaling, which would otherwise be numerically unstable.

  • For frequency-based calculations, Gelex clamps the estimated allele frequency to [0, 1] for numerical stability.
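The notes above can be pictured with a small sketch. This is illustrative Python under stated assumptions, not Gelex's actual code: missing genotypes are represented as None and simply ignored when estimating moments, and the variance threshold eps is a hypothetical value. The source does not say how Gelex encodes missing values or which threshold it uses.

```python
def allele_frequency(genotypes):
    """Estimate allele frequency from non-missing genotypes, clamped to [0, 1]."""
    observed = [g for g in genotypes if g is not None]  # assumption: None marks missing
    if not observed:
        return None
    p = sum(observed) / (2.0 * len(observed))
    return min(1.0, max(0.0, p))                        # clamp for numerical stability


def is_monomorphic(genotypes, eps=1e-10):
    """Flag variants whose observed variance is near zero (eps is hypothetical)."""
    observed = [g for g in genotypes if g is not None]
    if not observed:
        return True
    mean = sum(observed) / len(observed)
    variance = sum((g - mean) ** 2 for g in observed) / len(observed)
    return variance < eps


print(allele_frequency([0, 1, 2, None]))
print(is_monomorphic([2, 2, None, 2]))
```

Clamping matters mostly for non-integer inputs such as dosages, where rounding can push the estimate slightly outside the valid range.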

Example Commands

# Recommended default (OrthStandardizeHWE)
gelex grm -b genotypes --geno-method OSH -o grm_orth_hwe
# HWE centering, no scaling
gelex grm -b genotypes --geno-method CH -o grm_center_hwe
# Orthogonal dominance with sample moments
gelex grm -b genotypes --dom --geno-method OS -o grm_orth_sample

See Also