Estimates surrogate variables using either sva or an SVD on the
residuals of a regression of mat on design_obs.
est_sv(mat, n_sv, design_obs, use_sva = FALSE)

mat: A numeric matrix of RNA-seq counts. The rows index the genes and the columns index the samples.
n_sv: The number of surrogate variables.
design_obs: A numeric matrix of observed covariates that are NOT to
be a part of the signal generating process. Only used in estimating the
surrogate variables (if target_cor is not NULL).
Do not include an intercept column, since including one will sometimes
produce an error.
use_sva: A logical. Should we use surrogate variable analysis
(Leek and Storey, 2008) using design_obs
to estimate the hidden covariates (TRUE)
or should we just do an SVD on log2(mat + 0.5) after
regressing out design_obs (FALSE)? Setting this to
TRUE allows the surrogate variables to be correlated with the
observed covariates, while setting this to FALSE assumes that
the surrogate variables are orthogonal to the observed covariates. This
option only matters if design_obs is not NULL.
Defaults to FALSE.
A matrix of estimated surrogate variables. The columns index the surrogate variables and the rows index the individuals. The surrogate variables are centered and scaled to have mean 0 and variance 1.
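
The SVD route (use_sva = FALSE) described above can be sketched in base R as follows. This is only an illustrative sketch under stated assumptions, not the function's actual implementation: the helper name svd_sv_sketch is hypothetical, the intercept is assumed to be added internally (which is why design_obs should not carry one), and the leading left singular vectors of the samples-by-genes residual matrix are assumed to serve as the surrogate variables.

```r
# Hypothetical helper: a sketch of the SVD path, not the package's own code.
svd_sv_sketch <- function(mat, n_sv, design_obs = NULL) {
  # Put samples in rows and genes in columns, on the log2 scale.
  Y <- t(log2(mat + 0.5))
  # Assumption: the intercept is added here rather than supplied by the caller.
  X <- cbind(rep(1, nrow(Y)), design_obs)
  # Residuals after regressing every gene on the observed covariates.
  R <- stats::lm.fit(x = X, y = Y)$residuals
  # Leading left singular vectors have one entry per sample.
  U <- svd(R, nu = n_sv, nv = 0)$u
  # Center and scale each column to mean 0 and variance 1.
  scale(U)
}

# Simulated counts: 50 genes (rows) by 10 samples (columns).
set.seed(1)
mat <- matrix(stats::rpois(50 * 10, lambda = 20), nrow = 50)
# One observed covariate (two groups of five samples), with no intercept column.
design_obs <- matrix(rep(c(0, 1), each = 5), ncol = 1)
sv <- svd_sv_sketch(mat, n_sv = 2, design_obs = design_obs)
dim(sv)  # rows index the samples, columns index the surrogate variables
```

Because the singular vectors are computed from regression residuals, the resulting columns are orthogonal to design_obs, which matches the assumption stated for use_sva = FALSE.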