Learn R Programming

ENmix (version 1.8.0)

rm.outlier: Filtering out outlier and/or low quality values

Description

Setting outliers as missing value. Outlier was defined as value smaller than 3 times IQR from the lower quartile or larger than 3 times IQR from the upper quartile. If data quality information were provided, low quality data points will be set to missing first before looking for outliers. If specified, imputation will be performed using k-nearest neighbors method to impute all missing values.

Usage

rm.outlier(mat,byrow=TRUE,qcscore=NULL,detPthre=0.000001,nbthre=3, rmcr=FALSE,rthre=0.05,cthre=0.05,impute=FALSE, imputebyrow=TRUE,...)

Arguments

mat
An numeric matirx
byrow
TRUE: Looking for outliers row by row, or FALSE: column by column.
qcscore
If the data quality infomation (the output from function QCinfo) were provied, low quality data points as defined by detection p value threshold (detPthre) or number of bead threshold (nbthre) will be set to missing.
detPthre
Detection P value threshold to define low qualitye data points, detPthre=0.000001 in default.
nbthre
Number of beads threshold define low qualitye data points, nbthre=3 in default.
rmcr
TRUE: excluded rows and columns with too many missing values as defined by rthre and cthre. FALSE is in default
rthre
Minimum of percentage of missing values for a row to be excluded
cthre
Minimum of percentage of missing values for a column to be excluded
impute
Whether to impute missing values. If TRUE, k-nearest neighbors methods will used for imputation. FALSE is in default. Warning: imputed values for multimodal distributed CpGs may not be correct.
imputebyrow
TRUE: impute missing values using similar values in row, or FALSE: in column
...
Arguments to be passed to the function impute.knn in R package "impute"

Value

An numeric matrix of same dimention as the input matrix.

References

Zongli Xu, Liang Niu, Leping Li and Jack A. Taylor, ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Research 2015.

Examples

Run this code
if(FALSE){
if (require(minfiData)) {
sheet <- read.450k.sheet(file.path(find.package("minfiData"),"extdata"), pattern = "csv$")
rgSet <- read.450k.exp(targets = sheet,extended = TRUE)
qcscore<-QCinfo(rgSet)
mdat <- preprocessRaw(rgSet)
beta=getBeta(mdat, "Illumina")
#filter out outliers
b1=rm.outlier(beta)
#filter out low quality and outlier values
b2=rm.outlier(beta,qcscore=qcscore)
#filter out low quality and outlier values, remove rows and columns with too many missing values
b3=rm.outlier(beta,qcscore=qcscore,rmcr=TRUE)
#filter out low quality and outlier values, remove rows and columns with too many missing values, and then do imputation
b3=rm.outlier(beta,qcscore=qcscore,rmcr=TRUE,impute=TRUE)
}}

Run the code above in your browser using DataLab