Documentation for SPASQR
SPASQR is a saddlepoint-approximated, smoothed quantile regression (SQR) framework for genome-wide association studies on quantitative traits. It tests association at multiple quantile levels $\tau$ simultaneously and combines the per-$\tau$ results via the Cauchy combination test (CCT). It accommodates leave-one-chromosome-out (LOCO) polygenic scores (PGS) as offsets and an optional sparse genetic relationship matrix (GRM) for variance calibration, and it applies a saddlepoint approximation in the tails so that $p$-values for rare variants remain well calibrated even under heavy-tailed or skewed phenotypes.
SPASQR is implemented as the SPAsqr method of the GRAB command-line binary — a single statically linked C++17 executable that runs unmodified on Linux, macOS, and Windows.
Why smoothed quantile regression?
Conventional linear GWAS targets the conditional mean of $Y$, which loses power when the genetic effect concentrates in the tails of the phenotype distribution, when $Y$ is non-Gaussian, or when the effect is quantile-dependent (e.g., heteroskedasticity or dispersion effects). SPASQR targets the conditional quantiles $Q_\tau(Y \mid G, X)$ at a user-specified grid of $\tau$ levels (default $\{0.1, 0.3, 0.5, 0.7, 0.9\}$) and combines them into a single $p$-value via CCT. Smoothing the check loss with a Gaussian kernel of bandwidth $h$ lowers the variance of the rank score, which translates to a smaller denominator in the score statistic and hence higher power than non-smooth quantile regression.
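One common way to smooth the check loss with a Gaussian kernel is convolution smoothing, which has a closed form. The sketch below assumes that form; this page does not state the exact loss SPASQR uses, and all function names are illustrative rather than part of GRAB.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def check_loss(u: float, tau: float) -> float:
    """Unsmoothed check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def smoothed_check_loss(u: float, tau: float, h: float) -> float:
    """Check loss convolved with a Gaussian kernel of bandwidth h (assumed form)."""
    return h * norm_pdf(u / h) + u * (tau - norm_cdf(-u / h))


def smoothed_score(u: float, tau: float, h: float) -> float:
    """Derivative of the smoothed loss in u: tau - Phi(-u / h)."""
    return tau - norm_cdf(-u / h)


if __name__ == "__main__":
    tau, h = 0.3, 0.5
    for u in (-1.0, -0.1, 0.0, 0.1, 1.0):
        print(u, check_loss(u, tau), smoothed_check_loss(u, tau, h),
              smoothed_score(u, tau, h))
```

Under this form the indicator $1\{u < 0\}$ in the unsmoothed score is replaced by the smooth function $\Phi(-u/h)$, so the score varies continuously with the residual and recovers the check-loss score as $h \to 0$.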
What SPASQR does, in one paragraph
For each chromosome $c$, SPASQR fits a null SQR model $Q_\tau(Y \mid X, \hat Y_{-c}) = X^\top \beta_\tau + \hat Y_{-c}$, with the chromosome-specific LOCO PGS $\hat Y_{-c}$ entering as an offset, and then computes a variance-stabilized score statistic $S_j = G_j^\top R$ for every variant $j$ on that chromosome. The variance of $S_j$ is $\widehat\sigma_g^{\,2}(G_j)\, R^\top \Phi\, R$, where $\Phi$ is the sparse GRM (or $I_n$ for unrelated samples). A saddlepoint approximation is applied whenever $|S_j|$ exceeds a configurable $z$-threshold, so that tail $p$-values stay calibrated for rare variants and for skewed or heavy-tailed phenotypes. The per-$\tau$ $p$-values are then combined into a single P_CCT via the Cauchy combination test.
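The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the GRAB implementation: it assumes a mean-centred genotype, uses the sample variance as the estimate of $\sigma_g^2$, stores the sparse GRM as a dictionary of upper-triangle entries, uses a normal approximation in place of the saddlepoint step, and gives equal CCT weights to all $\tau$ levels. All function names are hypothetical.

```python
import math


def score_z(g, r, grm=None):
    """Standardized score statistic S / sqrt(Var(S)) for one variant.

    g   : genotype dosages, one per sample
    r   : null-model score residuals R, one per sample
    grm : sparse GRM as {(i, k): value} with i <= k, diagonal included;
          None stands for the identity matrix (unrelated samples)
    """
    n = len(g)
    mean_g = sum(g) / n
    gc = [x - mean_g for x in g]                 # centred genotype (assumption)
    s = sum(a * b for a, b in zip(gc, r))        # S = G^T R
    var_g = sum(a * a for a in gc) / (n - 1)     # assumed estimate of sigma_g^2
    if grm is None:
        quad = sum(b * b for b in r)             # R^T I R
    else:
        quad = 0.0
        for (i, k), v in grm.items():
            # Off-diagonal entries are stored once, so they count twice.
            quad += v * r[i] * r[k] * (1.0 if i == k else 2.0)
    return s / math.sqrt(var_g * quad)


def normal_p(z):
    """Two-sided normal p-value (stand-in for the saddlepoint step)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def cauchy_combine(pvals, weights=None):
    """Cauchy combination of p-values in (0, 1]; weights must sum to 1."""
    k = len(pvals)
    if weights is None:
        weights = [1.0 / k] * k
    t = 0.0
    for p, w in zip(pvals, weights):
        if p < 1e-15:
            t += w / (p * math.pi)               # tan((0.5 - p) * pi) ~ 1 / (p * pi)
        else:
            t += w * math.tan((0.5 - p) * math.pi)
    if t > 1e15:
        return 1.0 / (t * math.pi)               # tail approximation for huge t
    return 0.5 - math.atan(t) / math.pi


if __name__ == "__main__":
    g = [0, 1, 2, 0, 1, 2]
    # Toy residual vectors, one per tau level (made-up numbers).
    residuals = {0.1: [-1, 0, 1, -1, 0, 1], 0.5: [1, -1, 0, 0, 1, -1]}
    per_tau_p = [normal_p(score_z(g, r)) for r in residuals.values()]
    print(per_tau_p, cauchy_combine(per_tau_p))
```

A useful sanity check on the combination step is that identical inputs are returned unchanged: combining five $p$-values that all equal 0.05 gives 0.05.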
Pipeline
- Workflow 1: LOCO PGS + SPASQR. Train chromosome-specific LOCO polygenic scores with LDAK-KVIK or REGENIE, then run `grab --method SPAsqr` with `--pred-list`. The recommended path for essentially unrelated cohorts.
- Workflow 2: LOCO PGS + GRM + SPASQR. In addition to the LOCO PGS, build a sparse genetic relationship matrix using PLINK 2 (preferred since late 2025) or GCTA, and pass it via `--sp-grm-plink2` so that the score-statistic variance is GRM-aware. Recommended whenever the cohort retains first- or second-degree relatives.
- (Optional) Effect-size estimation. Re-run with `--spasqr-mode wald` to obtain per-marker, per-$\tau$ $\hat\beta_G$ and SE via M-estimation sandwich variance.
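The three paths differ only in which options are appended to the base command. The helper below is a hypothetical sketch that assembles the argument list from the flags named above; it assumes that `--pred-list` and `--sp-grm-plink2` each take a file path, uses placeholder file names, and leaves out the genotype, phenotype, and output options, which this page does not list. It only builds and prints the command and does not run GRAB.

```python
import shlex
from typing import List, Optional


def spasqr_args(pred_list: str,
                sparse_grm: Optional[str] = None,
                wald: bool = False) -> List[str]:
    """Assemble a partial `grab` command line for the SPAsqr method."""
    # Workflow 1: LOCO PGS only.
    args = ["grab", "--method", "SPAsqr", "--pred-list", pred_list]
    # Workflow 2: add the sparse GRM so the variance is GRM-aware.
    if sparse_grm is not None:
        args += ["--sp-grm-plink2", sparse_grm]
    # Optional effect-size estimation in Wald mode.
    if wald:
        args += ["--spasqr-mode", "wald"]
    return args


if __name__ == "__main__":
    # File names are placeholders, not files shipped with GRAB.
    print(shlex.join(spasqr_args("loco_pred.list")))
    print(shlex.join(spasqr_args("loco_pred.list", sparse_grm="sparse_grm.txt")))
    print(shlex.join(spasqr_args("loco_pred.list", wald=True)))
```

The Workflow pages and Running SPASQR remain the reference for the full set of required options.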
Where to go next
- Installation — building the GRAB binary.
- Workflow 1: LOCO PGS + SPASQR
- Workflow 2: LOCO PGS + GRM + SPASQR
- Running SPASQR — the main usage page.
- Effect-size estimation — Wald mode.
- Strategies for improving statistical power
- FAQ
Citation
If you use SPASQR in your work, please cite the SPASQR manuscript (link to be added on publication).