Effect-size estimation — --spasqr-mode wald
The default --spasqr-mode score is optimized for genome-wide screening: it tests $H_0\!: \beta_G = 0$ against $H_1\!: \beta_G \neq 0$ at every $\tau$ using a rank-score statistic computed from a single null-model fit per chromosome. Score mode returns calibrated $p$-values and signed $Z$-scores per marker, but not the effect-size estimate $\hat\beta_G$ itself.
For follow-up analyses — characterizing the magnitude of a top hit, plotting per-quantile effect curves, or feeding effects into a downstream meta-analysis — switch to Wald mode with --spasqr-mode wald. Wald mode fits the full model $Q_\tau(Y \mid X, G_j) = X^\top \beta_\tau + G_j\, \beta_{G,\tau}$ once per (marker, $\tau$) by smoothed M-estimation, and returns
- $\hat\beta_{G,\tau}$ — the per-quantile genetic effect on the transformed $Y$ scale,
- $\widehat{\mathrm{SE}}(\hat\beta_{G,\tau})$ — sandwich-form standard error using the bandwidth-aware Hessian and the empirical score meat,
- $Z_{j,\tau} = \hat\beta_{G,\tau} / \widehat{\mathrm{SE}}$ — Wald statistic,
- $P_{j,\tau}$ — two-sided $p$-value from the standard normal.
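As a sanity check on the last two quantities, the Wald statistic and its two-sided $p$-value can be recomputed from any reported estimate and standard error. The sketch below is not part of the tool; the function name `wald_z_and_p` and the input numbers are illustrative assumptions.

```python
import math


def wald_z_and_p(beta: float, se: float) -> tuple[float, float]:
    """Return (Z, two-sided P) for a Wald test of beta = 0."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = beta / se
    # For the standard normal, 2 * Phi(-|z|) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Illustrative numbers only, not output from a real run.
z, p = wald_z_and_p(beta=0.12, se=0.03)
print(f"Z = {z:.2f}, P = {p:.3g}")
```

Using `math.erfc` rather than `1 - Phi(...)` avoids losing precision for very small $p$-values.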
When to use Wald mode
| Use case | Recommended mode |
|---|---|
| Genome-wide screening (millions of variants) | score — much faster, gives calibrated $P_\mathrm{CCT}$. |
| Effect-size estimation on top hits (≤ a few hundred variants) | wald — gives $\hat\beta_G$ + SE per $\tau$. |
| Generating summary statistics for meta-analysis | wald restricted via --extract. |
| Sensitivity analyses (alternative bandwidth, tau-specific) | wald — finer control over per-marker fit. |
Wald mode refits the full model per (marker, $\tau$), so total cost scales as $\mathcal O(n_{\text{markers}} \times n_{\text{taus}})$ versus score mode’s $\mathcal O(n_{\text{markers}})$. For a genome-wide scan with 9 tau levels this is roughly 9× slower than score mode: fine for a candidate list, infeasible for the full genome.
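To make the scaling concrete, the number of full-model fits is simply the product of the marker count and the number of tau levels. The marker counts below (200 candidate hits, 10 million genome-wide variants) are hypothetical figures chosen for illustration, and `n_model_fits` is not a function of the tool.

```python
def n_model_fits(n_markers: int, n_taus: int) -> int:
    """Number of full-model fits in Wald mode: one per (marker, tau) pair."""
    return n_markers * n_taus


taus = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Hypothetical marker counts, for illustration only.
candidate_fits = n_model_fits(200, len(taus))
genome_fits = n_model_fits(10_000_000, len(taus))

print(f"candidate list: {candidate_fits:,} fits")
print(f"genome-wide:    {genome_fits:,} fits")
```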
Example
Restrict to a set of GWAS hits (one ID per line in hits.txt) and re-run in Wald mode:
./grab --method SPAsqr \
--spasqr-mode wald \
--bfile geno \
--pheno pheno_int.txt --pheno-name Y1 \
--covar covar.txt --covar-name covar1,covar2 \
--pred-list grab_predlist.txt \
--spasqr-taus 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9 \
--extract hits.txt \
--spasqr-h-scale 10 \
--threads 8 \
--out spasqr_wald
This produces spasqr_wald.Y1.SPAsqr with one row per (marker, $\tau$):
CHROM POS ID REF ALT MISS_RATE ALT_FREQ MAC HWE_P TAU BETA SE Z P
Column meanings:
- CHROM POS ID REF ALT MISS_RATE ALT_FREQ MAC HWE_P: same as score mode.
- TAU: the quantile level for this row.
- BETA: Wald estimate $\hat\beta_{G,\tau}$ on the pheno-transform scale (e.g. INT scale if --pheno-transform int).
- SE: sandwich-variance standard error.
- Z: Wald statistic $\hat\beta_{G,\tau} / \widehat{\mathrm{SE}}$.
- P: two-sided $p$-value, $P = 2 \cdot \Phi(-|Z|)$.
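Because the file has one row per (marker, $\tau$), a common first step is to group rows by marker ID. The following is a minimal sketch that assumes the output is whitespace-delimited with the header shown above and contains no missing-value tokens in the numeric columns; the delimiter, the helper name `read_wald_rows`, and the demo rows are assumptions, not documented behavior.

```python
import io
from collections import defaultdict

NUMERIC_COLUMNS = ("TAU", "BETA", "SE", "Z", "P")


def read_wald_rows(handle):
    """Group Wald-mode rows by marker ID, sorted by TAU within each marker."""
    header = handle.readline().split()
    by_marker = defaultdict(list)
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        for col in NUMERIC_COLUMNS:
            row[col] = float(row[col])
        by_marker[row["ID"]].append(row)
    for rows in by_marker.values():
        rows.sort(key=lambda r: r["TAU"])
    return dict(by_marker)


# Made-up rows standing in for spasqr_wald.Y1.SPAsqr; not real results.
demo = io.StringIO(
    "CHROM POS ID REF ALT MISS_RATE ALT_FREQ MAC HWE_P TAU BETA SE Z P\n"
    "1 12345 rs_demo A G 0.0 0.25 500 0.5 0.5 0.10 0.02 5.0 5.7e-07\n"
    "1 12345 rs_demo A G 0.0 0.25 500 0.5 0.1 0.04 0.03 1.33 0.18\n"
)
curves = read_wald_rows(demo)
print([(r["TAU"], r["BETA"]) for r in curves["rs_demo"]])
```

For a real run, replace the `io.StringIO` object with `open("spasqr_wald.Y1.SPAsqr")`.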
Interpreting per-quantile effects
For a single hit, you typically have 5–9 rows (one per $\tau$). The effect-quantile curve $\tau \mapsto \hat\beta_{G,\tau}$ tells you where in the conditional distribution of $Y$ the variant acts:
- Flat curve — mean-only effect; classical OLS / linear GWAS would have caught it. SPASQR still gives the same hit with slightly conservative SE.
- Monotone in $\tau$ — location-scale effect; the variant shifts both the centre and the tails.
- U-shaped or non-monotone — heteroskedastic / dispersion effect; the variant changes the spread of $Y$ without shifting the centre. These are the hits where SPASQR typically beats classical mean-based GWAS.
Plot $\hat\beta_{G,\tau}$ against $\tau$ with $\pm 1.96 \cdot$ SE error bars; cross-check the sign pattern against the per-$\tau$ $Z$-scores from score mode.
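One way to draw such a curve is sketched here. It assumes each row is a dict with numeric TAU, BETA and SE entries, and that matplotlib is installed if the plotting helper is called. The helper names and the demo values are illustrative, not part of the tool.

```python
def effect_curve(rows, z_crit=1.96):
    """Return (tau, beta, lower, upper) tuples sorted by tau."""
    curve = []
    for r in sorted(rows, key=lambda r: r["TAU"]):
        half_width = z_crit * r["SE"]
        curve.append(
            (r["TAU"], r["BETA"], r["BETA"] - half_width, r["BETA"] + half_width)
        )
    return curve


def plot_effect_curve(curve, out_path):
    """Save a beta-versus-tau plot with error bars (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to file without a display
    import matplotlib.pyplot as plt

    taus = [c[0] for c in curve]
    betas = [c[1] for c in curve]
    half_widths = [c[3] - c[1] for c in curve]
    fig, ax = plt.subplots()
    ax.errorbar(taus, betas, yerr=half_widths, fmt="o-", capsize=3)
    ax.axhline(0.0, linestyle="--", linewidth=1)  # zero-effect reference line
    ax.set_xlabel("tau")
    ax.set_ylabel("BETA")
    fig.savefig(out_path)
    plt.close(fig)


# Made-up values for a single marker; not real results.
demo_rows = [
    {"TAU": 0.9, "BETA": 0.20, "SE": 0.05},
    {"TAU": 0.1, "BETA": -0.10, "SE": 0.05},
]
demo_curve = effect_curve(demo_rows)
for tau, beta, lower, upper in demo_curve:
    print(f"tau={tau:.1f} beta={beta:+.3f} CI=[{lower:+.3f}, {upper:+.3f}]")
```

Calling `plot_effect_curve(demo_curve, "curve.png")` writes the figure to disk; the intervals are pointwise and, like the per-$\tau$ Z-scores, not adjusted for multiplicity.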
Bandwidth and solver notes
- Wald mode defaults to --spasqr-h-scale 10 (narrower bandwidth than score mode’s 3). This reduces smoothing bias in $\hat\beta_G$ at the cost of slower convergence; it is the right default when you care about the estimate, not just the test.
- The same --spasqr-solver choice applies (qmme default, conquer as alternative). qmme is the recommended solver for both modes.
- --spasqr-tol controls convergence tolerance for the M-estimation iterations. The default 1e-7 is usually fine; tighten to 1e-9 for very rare variants or extreme tail $\tau$.
When NOT to use Wald mode
- For genome-wide screening, always use score mode first. Score mode’s $P_\mathrm{CCT}$ is the calibrated GWAS $p$-value; Wald per-$\tau$ Z-scores are not multiplicity-adjusted.
- For rare-variant testing at extreme $\tau$ (e.g. $\tau \in \{0.05, 0.95\}$) with sparse data, Wald sandwich SEs can be unstable. Use score mode instead, which leverages SPA in the tails.
Note
- Wald mode honors the same --pred-list, --sp-grm-plink2, --pheno-transform and --spasqr-taus flags as score mode.
- The output schema differs: Wald is one row per (marker, $\tau$) with TAU BETA SE Z P columns; score is one row per marker with P_CCT P_tau{val}... Z_tau{val}... columns.