varFilter removes features exhibiting
little variation across samples. Such non-specific filtering can be
advantageous for downstream data analysis.

varFilter(eset, var.func=IQR, var.cutoff=0.5, filterByQuantile=TRUE, ...)

eset: A MethyLumiSet or MethyLumiM object.
var.cutoff: If filterByQuantile is TRUE, features whose value of var.func is
less than the var.cutoff-quantile of all var.func values will be removed.
If FALSE, features whose values are less than var.cutoff will be removed.
filterByQuantile: Whether var.cutoff is to be interpreted as a quantile of
all var.func values (the default), or as an absolute value.

featureFilter returns a list consisting of:
eset: The filtered MethyLumiSet or MethyLumiM object.

Related: nsFilter and varFilter, available from the genefilter package. See
R. Bourgon et al. (2010) and nsFilter for details.

It has been shown that non-specific filtering, for which the criteria do
not depend on sample class, can increase the number of discoveries.
An inappropriate choice of test statistic, however, might have an adverse
effect. limma's moderated $t$-statistic, for example, is based on an
empirical Bayes approach which models the conjugate prior of the
gene-level variance with an inverse $\chi^2$ distribution scaled
by the observed global variance. As variance-based filtering removes
the set of genes with low variance, the scaled inverse $\chi^2$
distribution no longer provides a good fit to the data passing the filter,
causing the limma algorithm to produce a posterior
degrees-of-freedom of infinity (Bourgon 2010). This leads to two
consequences: (i) the gene-level variance estimates will be ignored, and (ii)
the $p$-values will be overly optimistic (Bourgon 2010).
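
Consequence (i) can be made concrete with the usual notation of the limma model. The symbols below do not appear in the text above and are introduced here as assumptions: $s_g^2$ is the observed variance of gene $g$ on $d_g$ residual degrees of freedom, $s_0^2$ and $d_0$ are the prior variance and prior degrees of freedom estimated from all genes, $\hat{\beta}_g$ is the estimated coefficient, and $v_g$ is its unscaled variance factor.

$$
\frac{1}{\sigma_g^2} \sim \frac{1}{d_0 s_0^2}\,\chi^2_{d_0},
\qquad
\tilde{s}_g^2 = \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g},
\qquad
\tilde{t}_g = \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{v_g}}
$$

When the estimated $d_0$ grows without bound, the total degrees of freedom $d_0 + d_g$ are infinite as well and $\tilde{s}_g^2 \to s_0^2$ for every gene. The gene-level estimate $s_g^2$ then drops out of $\tilde{t}_g$, and the statistic is referred to a $t$ distribution with infinite degrees of freedom, that is, a normal distribution.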
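nsFilter

library(methylumi)
data(mldat)
## keep top 75 percent (drop features below the 0.25-quantile of IQR)
filt <- varFilter(mldat, var.cutoff=0.25)
## inspect the filtering log
filt$filter.log
## dimensions of the filtered object
dim(filt$eset)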
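
To illustrate the difference between the quantile and absolute interpretations of var.cutoff, here is a minimal base-R sketch on a toy matrix. It is not the package's implementation: the matrix x, the absolute cutoff of 0.5, and the choice to keep features whose statistic equals the threshold are assumptions made for illustration.

```r
## Hypothetical toy data: 8 features (rows) x 6 samples (columns).
set.seed(1)
x <- matrix(rnorm(48), nrow = 8,
            dimnames = list(paste0("f", 1:8), paste0("s", 1:6)))
## Shrink the first two features so that they vary very little.
x[1:2, ] <- x[1:2, ] * 0.01

## Per-feature statistic, playing the role of var.func (IQR by default).
stat <- apply(x, 1, IQR)

## Quantile mode: the threshold is the var.cutoff-quantile of all statistics.
var.cutoff <- 0.25
threshold <- quantile(stat, probs = var.cutoff)
keep_quantile <- stat >= threshold

## Absolute mode: the cutoff value itself is the threshold (0.5 is arbitrary).
keep_absolute <- stat >= 0.5

## Apply the quantile-mode filter and compare how many features survive.
x_filtered <- x[keep_quantile, , drop = FALSE]
dim(x_filtered)
sum(keep_absolute)
```

In quantile mode the number of retained features depends only on var.cutoff and the number of features, whereas in absolute mode it depends on the scale of the data.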