Documentation for SPASQR

SPASQR is a saddlepoint-approximated, smoothed quantile regression (SQR) framework for genome-wide association studies on quantitative traits. It performs association testing across multiple quantile levels $\tau$ simultaneously, combines them via the Cauchy combination test (CCT), accommodates leave-one-chromosome-out (LOCO) polygenic scores as offsets and an optional sparse GRM for variance calibration, and applies saddlepoint approximation in the tails so that rare variants are well-calibrated even under heavy-tailed or skewed phenotypes.

SPASQR is implemented as the SPAsqr method of the GRAB command-line binary — a single statically linked C++17 executable that runs unmodified on Linux, macOS, and Windows.

Why smoothed quantile regression?

Conventional linear GWAS targets the conditional mean of $Y$ and loses power when the genetic effect concentrates in the tails of the phenotype distribution, when $Y$ is non-Gaussian, or when the effect is quantile-dependent (e.g. heteroskedastic or dispersion effects). SPASQR instead targets the conditional quantiles $Q_\tau(Y \mid G, X)$ on a user-specified grid of $\tau$ levels (default $\{0.1, 0.3, 0.5, 0.7, 0.9\}$) and combines the per-$\tau$ results into a single $p$-value via CCT. Smoothing the check loss with a Gaussian kernel of bandwidth $h$ lowers the variance of the rank score, which gives a smaller denominator in the score statistic and hence higher power than non-smooth quantile regression.
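To make the effect of smoothing concrete, here is a minimal sketch of the rank score with and without smoothing. It assumes convolution-type smoothing, in which the check loss $\rho_\tau(u) = u\,(\tau - 1\{u < 0\})$ is convolved with a Gaussian kernel of bandwidth $h$; differentiating the smoothed loss then gives the score $\tau - \Phi(-u/h)$, where $\Phi$ here denotes the standard normal CDF. Whether SPASQR uses exactly this parameterization is not stated on this page, and the function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def check_score(u, tau):
    """Unsmoothed rank score of the check loss: tau - 1{u < 0}."""
    return tau - (1.0 if u < 0 else 0.0)


def smoothed_score(u, tau, h):
    """Gaussian-smoothed rank score (assumed form): tau - Phi(-u / h)."""
    return tau - norm_cdf(-u / h)


# Compare the two scores on a few residuals u = y - fitted quantile.
tau, h = 0.9, 0.5
for u in (-2.0, -0.1, 0.0, 0.1, 2.0):
    print(u, check_score(u, tau), round(smoothed_score(u, tau, h), 4))
```

The unsmoothed score jumps from $\tau - 1$ to $\tau$ at $u = 0$, whereas the smoothed score passes continuously through $\tau - 0.5$ there and approaches the unsmoothed score as $h \to 0$.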

What SPASQR does, in one paragraph

For each chromosome $c$, SPASQR fits a null SQR model $Q_\tau(Y \mid X, \hat Y_{-c}) = X^\top \beta_\tau + \hat Y_{-c}$ with the chromosome-specific LOCO PGS $\hat Y_{-c}$ as an offset, then computes a variance-stabilized score statistic $S_j = G_j^\top R$ for every variant $j$ on that chromosome, where $R$ denotes the rank-score residual vector from the null model. The variance of $S_j$ is $\widehat\sigma_g^{\,2}(G_j)\, R^\top \Phi\, R$, where $\Phi$ is the sparse GRM (or $I_n$ for unrelated samples). A saddlepoint approximation is applied whenever $|S_j|$, on the standardized ($z$-score) scale, exceeds a configurable threshold, so that tail $p$-values stay calibrated for rare variants or unbalanced traits. Per-$\tau$ $p$-values are combined into a single P_CCT via the Cauchy combination test.
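The two formulas in this paragraph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GRAB implementation: `score_test` and `cauchy_combination` are hypothetical names, the genotype vector is assumed to be mean-centered, the GRM is a dense nested list purely for readability (SPASQR uses a sparse one), the $p$-value uses only the normal approximation (no saddlepoint step), and the CCT uses equal weights unless others are supplied.

```python
import math


def score_test(g, r, sigma2_g, grm=None):
    """Score statistic S = g'r with Var(S) = sigma2_g * r' Phi r.

    g: centered genotypes; r: null-model residual scores;
    grm: dense n x n list of lists, or None for Phi = I_n.
    Returns (S, Var(S), z, two-sided normal p-value).
    """
    n = len(g)
    s = sum(gi * ri for gi, ri in zip(g, r))
    if grm is None:
        quad = sum(ri * ri for ri in r)  # r' I r
    else:
        quad = sum(r[i] * grm[i][k] * r[k] for i in range(n) for k in range(n))
    var = sigma2_g * quad
    z = s / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return s, var, z, p


def cauchy_combination(pvals, weights=None):
    """Cauchy combination test: T = sum_i w_i * tan((0.5 - p_i) * pi)."""
    k = len(pvals)
    if weights is None:
        weights = [1.0 / k] * k  # equal weights (an assumption)
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvals))
    return 0.5 - math.atan(t) / math.pi


# Toy inputs: four samples, one variant, five per-tau p-values.
g = [1.0, -1.0, 0.0, 0.0]
r = [0.5, -0.5, 0.2, -0.2]
s, var, z, p = score_test(g, r, sigma2_g=0.5)
p_cct = cauchy_combination([0.04, 0.01, 0.20, 0.50, 0.70])
print(s, var, z, p, p_cct)
```

With weights that sum to one, combining identical $p$-values returns that same value, which is a quick sanity check on any CCT implementation.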

Pipeline

  1. Workflow 1: LOCO PGS + SPASQR. Train chromosome-specific LOCO polygenic scores with LDAK-KVIK or REGENIE, then run grab --method SPAsqr with --pred-list. This is the recommended path for essentially unrelated cohorts.

  2. Workflow 2: LOCO PGS + GRM + SPASQR. In addition to the LOCO PGS, build a sparse genetic relationship matrix using PLINK 2 (preferred since late 2025) or GCTA, and pass it via --sp-grm-plink2 so that the score-statistic variance is GRM-aware. This is recommended whenever the cohort retains first- or second-degree relatives.

  3. (Optional) Effect-size estimation. Re-run with --spasqr-mode wald to obtain, for each marker and each $\tau$, the estimate $\hat\beta_G$ and its standard error from an M-estimation sandwich variance.

Citation

If you use SPASQR in your work, please cite the SPASQR manuscript (link to be added on publication).

