varFilter
removes features exhibiting
little variation across samples. Such non-specific filtering can be
advantageous for downstream data analysis.

varFilter(eset, var.func=IQR, var.cutoff=0.5, filterByQuantile=TRUE, ...)
eset: a MethyLumiSet or MethyLumiM object.
var.func: the per-feature statistic used for filtering (IQR by default).
var.cutoff: if filterByQuantile is TRUE, features whose var.func value is less than the var.cutoff-quantile of all var.func values will be removed. If FALSE, features whose var.func values are less than var.cutoff will be removed.
filterByQuantile: a logical indicating whether var.cutoff is to be interpreted as a quantile of all var.func values (the default) or as an absolute value.

varFilter
returns a list consisting of:
eset: the filtered MethyLumiSet or MethyLumiM object.
filter.log: a log of the filtering performed (see the example).

See also nsFilter and varFilter, available from the genefilter package.
Refer to R. Bourgon et al. (2010) and nsFilter
for details.

It has been shown (Bourgon et al. 2010) that non-specific filtering, whose criterion does
not depend on sample class, can increase the number of discoveries.
An inappropriate choice of test statistic, however, can have an adverse
effect. limma's moderated $t$-statistic, for example, is based on an
empirical Bayes approach that models the gene-level variances with a
conjugate scaled inverse $\chi^2$ prior whose scale is estimated from the
observed global variance. Because variance-based filtering removes
the genes with low variance, the scaled inverse $\chi^2$ distribution
no longer fits the data that pass the filter, and the limma
algorithm estimates the prior degrees of freedom, and therefore the
posterior degrees of freedom, to be infinite (Bourgon et al. 2010). This has two
consequences: (i) the gene-level variance estimates are ignored, and (ii)
the $p$-values are overly optimistic (Bourgon et al. 2010).
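Consequence (i) follows from the posterior variance used by limma's empirical Bayes model, $\tilde s_g^2 = (d_0 s_0^2 + d_g s_g^2)/(d_0 + d_g)$, where $s_g^2$ is the gene-level sample variance with $d_g$ residual degrees of freedom, and $s_0^2$ and $d_0$ are the prior scale and prior degrees of freedom. As $d_0 \to \infty$ the estimate collapses to $s_0^2$, so $s_g^2$ no longer matters. The base-R sketch below evaluates this formula directly; it does not call limma, and the function name moderated_var and the numbers used are hypothetical.

```r
## Hypothetical helper: posterior (moderated) variance as a weighted average
## of the prior scale s02 (d0 prior degrees of freedom) and the gene-level
## sample variances s2 (dg residual degrees of freedom).
moderated_var <- function(s2, dg, s02, d0) {
  ## limit as d0 -> Inf: every gene gets the prior scale s02
  if (is.infinite(d0)) return(rep(s02, length(s2)))
  (d0 * s02 + dg * s2) / (d0 + dg)
}

s2 <- c(0.5, 1, 4)                           # hypothetical gene-level variances
moderated_var(s2, dg = 4, s02 = 1, d0 = 4)   # shrunk toward 1: 0.75 1.00 2.50
moderated_var(s2, dg = 4, s02 = 1, d0 = Inf) # gene-level variances ignored: 1 1 1

## With infinite degrees of freedom the same t value is referred to a normal
## tail instead of a t tail, which gives a smaller p-value.
2 * pt(-3, df = 8)
2 * pt(-3, df = Inf)
```

The explicit infinite-$d_0$ branch is needed because evaluating the weighted average directly would compute Inf/Inf and return NaN.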
See also: nsFilter.
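data(mldat)
## keep the 75 percent of features with the largest var.func (IQR) values
filt <- varFilter(mldat, var.cutoff=0.25)
## inspect the filtering log and the dimensions of the filtered object
filt$filter.log
dim(filt$eset)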
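The rule described under var.cutoff and filterByQuantile can be illustrated in base R on a plain numeric matrix. This is a sketch under stated assumptions, not this package's implementation: the helper name var_filter_sketch is hypothetical, and the simulated matrix stands in for the data held in a MethyLumiSet or MethyLumiM object.

```r
## Hypothetical helper, not part of this package or genefilter: applies the
## var.cutoff / filterByQuantile rule to a numeric matrix with features in
## rows and samples in columns.
var_filter_sketch <- function(x, var.func = IQR, var.cutoff = 0.5,
                              filterByQuantile = TRUE) {
  ## per-feature statistic, e.g. the interquartile range of each row
  stat <- apply(x, 1, var.func)
  ## cutoff: a quantile of all per-feature statistics, or an absolute value
  threshold <- if (filterByQuantile) quantile(stat, var.cutoff) else var.cutoff
  ## remove features whose statistic is less than the cutoff
  x[stat >= threshold, , drop = FALSE]
}

set.seed(1)
## toy data: 100 features x 6 samples; the spread grows from row 1 to row 100
sds <- seq(0.1, 2, length.out = 100)
m <- matrix(rnorm(600, sd = rep(sds, times = 6)), nrow = 100)

## quantile mode: drop the 25 percent of features with the smallest IQR
keep <- var_filter_sketch(m, var.cutoff = 0.25)
dim(keep)  # 75 6

## absolute mode: drop features whose IQR is below 1
abs_keep <- var_filter_sketch(m, var.cutoff = 1, filterByQuantile = FALSE)
```

With untied statistics, the quantile mode keeps the top (1 - var.cutoff) share of features, which matches the "keep top 75 percent" comment in the example above; the absolute mode keeps however many features exceed the fixed cutoff.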