Documentation for SPA_SQR

SPA_SQR is a saddlepoint-approximated, smoothed quantile regression (SQR) framework for genome-wide association studies on quantitative traits. It performs association testing across multiple quantile levels $\tau$ simultaneously and combines them via the Cauchy combination test (CCT). It accommodates leave-one-chromosome-out (LOCO) polygenic scores (PGS) as offsets and an optional sparse genetic relationship matrix (GRM) for variance calibration. In the tails it applies a saddlepoint approximation, so that rare variants remain well calibrated even under heavy-tailed or skewed phenotypes.

SPA_SQR is implemented as the SPAsqr method of the GRAB command-line binary — a single statically linked C++17 executable that runs unmodified on Linux, macOS, and Windows.

Why smoothed quantile regression?

Conventional linear GWAS targets the conditional mean of $Y$. This loses power when the genetic effect concentrates in the tails of the phenotype distribution, when $Y$ is non-Gaussian, or when the effect is quantile-dependent (e.g. heteroskedastic or dispersion effects). SPA_SQR instead targets the conditional quantiles $Q_\tau(Y \mid G, X)$ at a user-specified grid of $\tau$ levels (default $\{0.1, 0.3, 0.5, 0.7, 0.9\}$) and combines the per-$\tau$ results into a single $p$-value via CCT. Smoothing the check loss with a Gaussian kernel of bandwidth $h$ lowers the variance of the rank score. This shrinks the denominator of the standardized score statistic and hence gives higher power than non-smooth quantile regression.
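
To make the smoothing and combination steps concrete, here is a minimal Python sketch, not GRAB's implementation. It assumes that the per-sample score is the derivative of the Gaussian-smoothed check loss, $\tau - \Phi(-r/h)$ for residual $r$, which replaces the indicator in the non-smooth score $\tau - \mathbf{1}\{r < 0\}$. It also assumes that CCT uses equal weights across the $\tau$ grid. The function names `smoothed_score`, `cct` and `variance` are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def smoothed_score(r: float, tau: float, h: float) -> float:
    """Derivative of the Gaussian-smoothed check loss at residual r (assumed form).

    The non-smooth score is tau - 1{r < 0}; convolving the check loss with a
    N(0, h^2) kernel replaces the indicator by Phi(-r / h).
    """
    return tau - norm_cdf(-r / h)


def cct(pvalues, weights=None) -> float:
    """Cauchy combination of p-values in (0, 1); equal weights summing to 1 by default."""
    k = len(pvalues)
    if weights is None:
        weights = [1.0 / k] * k
    t = 0.0
    for w, p in zip(weights, pvalues):
        # tan((0.5 - p) * pi) ~ 1 / (p * pi) for tiny p; use the limit directly there.
        t += w * (1.0 / (p * math.pi) if p < 1e-15 else math.tan((0.5 - p) * math.pi))
    if t > 1e15:
        return 1.0 / (t * math.pi)  # standard Cauchy upper tail for very large t
    return 0.5 - math.atan(t) / math.pi


def variance(xs) -> float:
    """Population variance of a list of floats."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


if __name__ == "__main__":
    residuals = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
    raw = [0.5 - (1.0 if r < 0 else 0.0) for r in residuals]        # non-smooth score
    smooth = [smoothed_score(r, tau=0.5, h=0.5) for r in residuals]  # smoothed score
    print(variance(raw), variance(smooth))
    print(cct([0.02, 0.2, 0.5, 0.7, 0.9]))
```

On these symmetric toy residuals at $\tau = 0.5$, the smoothed scores vary continuously inside $(-0.5, 0.5)$ instead of jumping between $\pm 0.5$, and their variance is visibly smaller. The Cauchy weights must sum to 1 for the combined statistic to be approximately standard Cauchy under the null.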

What SPA_SQR does, in one paragraph

For each chromosome $c$, SPA_SQR fits a null SQR model $Q_\tau(Y \mid X, \hat Y_{-c}) = X^\top \beta_\tau + \hat Y_{-c}$, in which the chromosome-specific LOCO PGS $\hat Y_{-c}$ enters as an offset. It then computes a score statistic $S_j = G_j^\top R$ for every variant $j$ on that chromosome, where $R$ is the vector of null-model scores at the given $\tau$. The variance of $S_j$ is $\widehat\sigma_g^{\,2}(G_j)\, R^\top \Phi\, R$, where $\widehat\sigma_g^{\,2}(G_j)$ is the estimated genotype variance of variant $j$ and $\Phi$ is the sparse GRM (or $I_n$ for unrelated samples). Whenever the standardized statistic $|S_j| / \sqrt{\widehat{\mathrm{Var}}(S_j)}$ exceeds a configurable threshold, a saddlepoint approximation replaces the normal approximation, so that tail $p$-values stay calibrated for rare variants and other settings where the score distribution is skewed. Finally, the per-$\tau$ $p$-values are combined into a single $p$-value, P_CCT, via the Cauchy combination test.

Pipeline

Workflow 1: LOCO PGS + SPA_SQR. Train chromosome-specific LOCO polygenic scores with LDAK-KVIK or REGENIE, then run grab --method SPAsqr with --pred-list. This is the recommended path for essentially unrelated cohorts.
Workflow 2: LOCO PGS + GRM + SPA_SQR. In addition to the LOCO PGS, build a sparse genetic relationship matrix (GRM) with PLINK 2 (preferred since late 2025) or GCTA, and pass it via --sp-grm-plink2 so that the variance of the score statistic accounts for relatedness. This is recommended whenever the cohort retains first- or second-degree relatives.
(Optional) Effect-size estimation. Re-run with --spasqr-mode wald to obtain per-marker, per-$\tau$ effect estimates $\hat\beta_G$ and their standard errors (SE) from an M-estimation sandwich variance estimator.
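
Wald-mode effect estimates can be illustrated with the following hypothetical Python sketch. It makes several assumptions. The null-model design is augmented with the genotype column. The smoothed quantile regression is fitted by damped Newton steps on the Gaussian-smoothed check loss, with the LOCO PGS as an offset. The sandwich variance takes the usual M-estimation form $A^{-1} B A^{-1}$, with $A = \sum_i x_i x_i^\top \phi(r_i/h)/h$, $B = \sum_i x_i x_i^\top \psi_h(r_i)^2$ and $\psi_h(r) = \tau - \Phi(-r/h)$. GRAB's actual solver, bandwidth choice and handling of relatedness may differ. The function name `sqr_wald` and the bandwidth value used below are illustrative only.

```python
import math

import numpy as np

# Elementwise standard normal CDF and density (math.erfc is scalar-only).
_ncdf = np.vectorize(lambda u: 0.5 * math.erfc(-u / math.sqrt(2.0)))


def _npdf(u):
    return np.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def _smoothed_loss(r, tau, h):
    """Sum of the Gaussian-smoothed check loss r(tau - Phi(-r/h)) + h phi(r/h)."""
    return float(np.sum(r * (tau - _ncdf(-r / h)) + h * _npdf(r / h)))


def sqr_wald(y, x, offset, tau, h, max_iter=50, tol=1e-8):
    """Fit Q_tau(y | x) = x beta + offset by smoothed QR; return (beta, sandwich SE)."""
    beta = np.linalg.lstsq(x, y - offset, rcond=None)[0]  # least-squares start
    for _ in range(max_iter):
        r = y - offset - x @ beta
        psi = tau - _ncdf(-r / h)                      # smoothed score psi_h(r)
        a = x.T @ (x * (_npdf(r / h) / h)[:, None])    # Hessian of the smoothed loss
        step = np.linalg.solve(a, x.T @ psi)           # Newton direction
        old, lam = _smoothed_loss(r, tau, h), 1.0
        # Backtrack until the (convex) smoothed loss does not increase.
        while lam > 1e-4 and _smoothed_loss(y - offset - x @ (beta + lam * step), tau, h) > old:
            lam *= 0.5
        beta = beta + lam * step
        if np.max(np.abs(lam * step)) < tol:
            break
    r = y - offset - x @ beta
    psi = tau - _ncdf(-r / h)
    a = x.T @ (x * (_npdf(r / h) / h)[:, None])      # "bread" A
    b = x.T @ (x * (psi ** 2)[:, None])              # "meat" B
    a_inv = np.linalg.inv(a)
    cov = a_inv @ b @ a_inv                          # sandwich A^{-1} B A^{-1}
    return beta, np.sqrt(np.diag(cov))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 2000
    g = rng.binomial(2, 0.3, size=n).astype(float)
    pgs = 0.3 * rng.normal(size=n)                   # stand-in for a LOCO PGS
    y = 1.0 + 0.5 * g + pgs + rng.normal(size=n)
    x = np.column_stack([np.ones(n), g])
    for tau in (0.1, 0.5, 0.9):
        beta, se = sqr_wald(y, x, pgs, tau=tau, h=0.5)
        print(tau, beta[1], se[1])
```

In this simulated location-shift model, the genotype effect is 0.5 at every $\tau$, so each per-$\tau$ $\hat\beta_G$ should fall within a few SE of 0.5. The SE grows toward the extreme quantiles, where the residual density is lower.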

Where to go next

Installation — building the GRAB binary.
Workflow 1: LOCO PGS + SPA_SQR
Workflow 2: LOCO PGS + GRM + SPA_SQR
Running SPA_SQR — the main usage page.
Effect-size estimation — Wald mode.
Strategies for improving statistical power
FAQ

Citation

If you use SPA_SQR in your work, please cite the SPA_SQR manuscript (link to be added on publication).
