In some applications, particularly genetic applications,
it is desired to fit a regression model (\(Y = Xb + E\) say,
which we refer to as "the original regression model" or ORM)
without access to the actual values of \(Y\) and \(X\), but
given only some summary statistics. susie_rss assumes
availability of z-scores from standard univariate regression of
\(Y\) on each column of \(X\), and an estimate, \(R\), of the
correlation matrix for the columns of \(X\) (in genetic
applications \(R\) is sometimes called the “LD matrix”).
With the inputs z, R and sample size n, susie_rss computes
PVE-adjusted z-scores z_tilde, and calls susie_suff_stat with
XtX = (n-1)R, Xty = \(\sqrt{n-1}\) z_tilde, yty = n-1, n = n. The
output effect estimates are on the scale of \(b\) in the ORM with
standardized \(X\) and \(y\). When the LD matrix R and the z-scores
z are computed using the same matrix \(X\), the results from
susie_rss are the same as, or very similar to, those from susie
with standardized \(X\) and \(y\).
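
The mapping from (z, R, n) to sufficient statistics described above can be checked numerically. The following is a minimal base-R sketch, not the package's implementation. It assumes the z-scores come from simple regressions that include an intercept, and it assumes the PVE adjustment takes the form z_tilde = sqrt((n-1)/(z^2 + n-2)) z; under those assumptions the reconstructed XtX and Xty match the cross-products of the standardized data. The simulated data and variable names are illustrative only.

```r
set.seed(1)
n <- 200
p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(0.5, 0, 0, -0.3, 0) + rnorm(n))

# z-scores (t statistics) from univariate regressions with an intercept.
z <- sapply(seq_len(p), function(j)
  summary(lm(y ~ X[, j]))$coefficients[2, "t value"])
R <- cor(X)  # in-sample correlation ("LD") matrix

# Assumed form of the PVE adjustment (an assumption of this sketch).
z_tilde <- sqrt((n - 1) / (z^2 + n - 2)) * z

# Sufficient statistics as described in the text.
XtX <- (n - 1) * R
Xty <- sqrt(n - 1) * z_tilde
yty <- n - 1

# Compare with cross-products of the standardized X and y.
Xs <- scale(X)
ys <- as.numeric(scale(y))
print(all.equal(XtX, crossprod(Xs), check.attributes = FALSE))
print(all.equal(Xty, as.numeric(crossprod(Xs, ys))))
```

Both comparisons should agree up to floating-point error when R and z are computed from the same \(X\); with an LD matrix from a reference panel they would only agree approximately.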
Alternatively, if the user provides n, bhat (the univariate OLS
estimates from regressing \(y\) on each column of \(X\)), shat
(the standard errors from these OLS regressions), the in-sample
correlation matrix \(R\) = cov2cor(crossprod(X)), and the variance
of \(y\), the results from susie_rss are the same as those from
susie with \(X\) and \(y\). The effect estimates are on the same
scale as the coefficients \(b\) in the ORM with \(X\) and \(y\).
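
To see why bhat, shat, the variance of \(y\) and n are enough to work on the original scale, the base-R sketch below recovers the cross-products of the centered data from those summaries. It is an illustration under stated assumptions, not the package's code: the regressions are assumed to include an intercept, R is taken to be cor(X) (the correlation of the centered columns), and the identities used (column sum of squares = var_y * adj / shat^2, with adj = (n-1)/(z^2 + n-2)) are derived for this example.

```r
set.seed(2)
n <- 150
p <- 4
X <- matrix(rnorm(n * p), n, p) %*% diag(c(1, 2, 0.5, 3))  # unequal column scales
y <- drop(X %*% c(1, 0, -2, 0) + rnorm(n, sd = 2))

# Univariate OLS estimates and standard errors (with intercept).
fits <- sapply(seq_len(p), function(j)
  summary(lm(y ~ X[, j]))$coefficients[2, c("Estimate", "Std. Error")])
bhat <- fits["Estimate", ]
shat <- fits["Std. Error", ]
var_y <- var(y)
R <- cor(X)

z <- bhat / shat
adj <- (n - 1) / (z^2 + n - 2)

# Sum of squares of each centered column of X, recovered from summaries.
d <- var_y * adj / shat^2

XtX <- outer(sqrt(d), sqrt(d)) * R  # D^(1/2) R D^(1/2)
Xty <- bhat * d
yty <- (n - 1) * var_y

# Compare with cross-products of the centered (unscaled) data.
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y - mean(y)
print(all.equal(XtX, crossprod(Xc), check.attributes = FALSE))
print(all.equal(Xty, as.numeric(crossprod(Xc, yc))))
```

Because the column scales of \(X\) are recovered through d, effect estimates based on these statistics are on the scale of \(b\) in the ORM rather than on the standardized scale.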
In rare cases in which the sample size, \(n\), is unknown,
susie_rss calls susie_suff_stat with XtX = R and Xty = z, and with
residual_variance = 1. The underlying assumption of performing the
analysis in this way is that the sample size is large (effectively
infinite), and/or the effects are small. More formally, this
combines the log-likelihood for the noncentrality parameters,
\(\tilde{b} = \sqrt{n} b\),
$$L(\tilde{b}; z, R) = -(\tilde{b}'R\tilde{b} - 2z'\tilde{b})/2,$$
with the “susie prior” on \(\tilde{b}\); see susie and Wang et al.
(2020) for details. In this case, the effect estimates returned by
susie_rss are on the noncentrality parameter scale.
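
The log-likelihood above can be written directly as a function. The base-R sketch below does so and checks that, in the absence of any prior, it is maximized where \(R\tilde{b} = z\). The function name loglik_rss, the small matrix R and the z-scores are made up for illustration; the fit returned by susie_rss will generally differ from this unpenalized maximizer because it also incorporates the “susie prior”.

```r
# Log-likelihood for the noncentrality parameters, up to an additive constant.
loglik_rss <- function(b_tilde, z, R) {
  -(drop(t(b_tilde) %*% R %*% b_tilde) - 2 * sum(z * b_tilde)) / 2
}

# A small positive-definite correlation matrix and made-up z-scores.
R <- matrix(c(1.0, 0.6, 0.2,
              0.6, 1.0, 0.4,
              0.2, 0.4, 1.0), 3, 3)
z <- c(4.1, 2.9, 0.7)

# Without a prior, the maximizer solves R b = z.
b_mle <- solve(R, z)
grad <- z - drop(R %*% b_mle)  # gradient of L at b_mle

print(max(abs(grad)))
print(loglik_rss(b_mle, z, R) > loglik_rss(b_mle + c(0.1, 0, 0), z, R))
```

The gradient is zero (up to rounding) at b_mle, and moving away from it lowers the log-likelihood, as expected for a concave quadratic.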
The estimate_residual_variance setting is FALSE by default, which
is recommended when the LD matrix is estimated from a reference
panel. When the LD matrix R and the summary statistics z (or bhat,
shat) are computed using the same matrix \(X\), we recommend
setting estimate_residual_variance = TRUE.
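
The two recommendations can be contrasted in a short usage sketch. It uses simulated data and only the arguments named in this documentation (z, R, n, estimate_residual_variance); the variable names and the simulated "reference panel" are assumptions made for illustration. The calls are guarded so that the example still runs when the susieR package is not installed.

```r
set.seed(3)
n <- 300
p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- drop(0.6 * X[, 3] + rnorm(n))

# Summary statistics computed from the same X that defines the in-sample LD.
z <- sapply(seq_len(p), function(j)
  summary(lm(y ~ X[, j]))$coefficients[2, "t value"])
R_in <- cor(X)

# A stand-in for an external reference panel (independent draws).
X_ref <- matrix(rnorm(500 * p), 500, p)
R_ref <- cor(X_ref)

if (requireNamespace("susieR", quietly = TRUE)) {
  # In-sample LD: estimating the residual variance is recommended.
  fit_in <- susieR::susie_rss(z = z, R = R_in, n = n,
                              estimate_residual_variance = TRUE)
  # Reference-panel LD: keep the default (FALSE).
  fit_ref <- susieR::susie_rss(z = z, R = R_ref, n = n)
  print(round(fit_in$pip, 3))
  print(round(fit_ref$pip, 3))
}
```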