Data_impute: Data_impute

Description

Data cleaning process: detects and removes outlier samples and imputes missing values. The process is as follows:
1. Remove genes whose proportion of missing values is larger than maxNAratio.
2. Detect outlier samples and remove them.
3. Repeat steps 1-2 until the iteration limit is reached or no further outlier sample is detected.
4. Impute the remaining missing values.
The function can also perform only the gene filter, only the outlier removal, or only the imputation.
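
Step 1 of the process can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's internal code: `filter_rows_by_na`, `max_na_ratio` and `miss_value` are hypothetical names chosen to mirror the `maxNAratio` and `miss.value` arguments.

```r
# Hypothetical helper (not part of the package): drop rows (genes/proteins)
# whose proportion of missing values exceeds max_na_ratio.
filter_rows_by_na <- function(mat, max_na_ratio = 0.5, miss_value = NA) {
  mat <- as.matrix(mat)
  # If missing values are coded with something other than NA (for example 0),
  # convert that code to NA first.
  if (!is.na(miss_value)) {
    mat[!is.na(mat) & mat == miss_value] <- NA
  }
  na_ratio <- rowMeans(is.na(mat))  # proportion of missing values per row
  mat[na_ratio <= max_na_ratio, , drop = FALSE]
}

# Toy data: g2 is missing in 3 of 4 samples and is therefore removed.
toy <- matrix(c( 1, NA,  3, 4,
                NA, NA, NA, 8,
                 5,  6,  7, 8),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3"), paste0("s", 1:4)))
filtered <- filter_rows_by_na(toy, max_na_ratio = 0.5)
print(rownames(filtered))
```

Because the proportion of missing values depends on which samples are present, removing outlier samples changes it for every row, which is presumably why the filter is repeated inside the loop of steps 1-2.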

Usage

Data_impute(data, inf = "inf", intensity = "LFQ", miss.value = NA,
            splNExt = TRUE, maxNAratio = 0.5,
            removeOutlier = TRUE,
            outlierdata = "intensity", iteration = NA, sdout = 2,
            distmethod = "manhattan", A.IAC = FALSE,
            dohclust = FALSE, treelabels = NA,
            plot = TRUE, filename = NULL,
            text.labels = NA, abline.col = "red", abline.lwd = 1,
            impute = TRUE, verbose = 1, ...)

Value

A list of proteomic data with the following elements.

inf: Protein information, including protein IDs and other annotation.
intensity: Quantification information.
relative_value: intensity divided by the geometric mean.
log2_value: log2 of relative_value.
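
The relationship between intensity, relative_value and log2_value can be illustrated with a small base R sketch. It assumes that the geometric mean is taken per row (per protein, across samples), which the text above does not state explicitly; the toy matrix is made up for illustration.

```r
# Toy intensity matrix: rows are proteins, columns are samples.
intensity <- matrix(c(2, 8, 4,
                      1, 1, 8),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("p1", "p2"), c("s1", "s2", "s3")))

# Assumption: row-wise geometric mean, computed on the log scale.
geo_mean <- exp(rowMeans(log(intensity), na.rm = TRUE))

# A matrix divided by a vector of length nrow() divides row i by geo_mean[i].
relative_value <- intensity / geo_mean
log2_value <- log2(relative_value)

print(relative_value)
print(log2_value)
```

Under this assumption each row of log2_value sums to zero when no value is missing, so the values read as log2 fold changes relative to the protein's typical level.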

Arguments

data: MaxQconvert data, or a list containing two data.frames: ID information and quantification data.
inf: the name of the data.frame that contains the protein ID information.
intensity: the name of the data.frame that contains only the quantification data.
miss.value: the value that marks missing data in the quantification data. The default is NA. Missing values are usually coded as NA or 0.
splNExt: a logical value indicating whether to extract sample names (suited to MaxQuant quantification data).
maxNAratio: the maximum proportion of missing data allowed in any row (default 0.5, i.e. 50%). Rows with a larger proportion of missing values are deleted.
removeOutlier: a logical value indicating whether to remove outlier samples.
outlierdata: deprecated. Which data are used for outlier sample detection. This must be (an abbreviation of) one of the strings "intensity", "relative_value", "log2_value".
iteration: a numeric value indicating how many times to run the outlier detect-and-remove loop. NA means looping until no outlier sample is detected.
sdout: a numeric value giving the threshold used to judge outlier samples. The default 2 corresponds to roughly a 0.95 confidence interval.
distmethod: the distance measure to be used. This must be (an abbreviation of) one of the strings "manhattan", "euclidean", "canberra", "correlation", "bicor".
A.IAC: a logical value indicating whether to decrease the correlation variance.
dohclust: a logical value indicating whether to perform hierarchical clustering and plot dendrograms.
treelabels: labels of the dendrograms.
plot: a logical value indicating whether to plot numbersd scatter diagrams.
filename: the file name of the plot. The number and plot type are added automatically. The default NULL means no file is saved. All plots are saved to the "plot" folder in PDF format.
text.labels: outlier sample annotation (scatter diagram parameter).
abline.col: the color of the threshold line (scatter diagram parameter).
abline.lwd: the width of the threshold line (scatter diagram parameter).
impute: a logical value indicating whether to perform kNN imputation.
verbose: integer level of verbosity. Zero means silent; 1 prints some diagnostic messages.
...: other arguments.
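
To make the roles of sdout and distmethod more concrete, here is a minimal base R sketch of distance-based outlier sample detection. It is an assumption about the general idea, not the package's implementation: `flag_outlier_samples` is a hypothetical helper, the score is simply the standardized mean distance of each sample to all others, and only the distances supported by `stats::dist()` ("manhattan", "euclidean", "canberra") are covered.

```r
# Hypothetical helper (not part of the package): flag samples whose mean
# distance to the other samples lies more than `sdout` standard deviations
# above the average. Rows are genes/proteins, columns are samples.
flag_outlier_samples <- function(mat, sdout = 2, distmethod = "manhattan") {
  d <- as.matrix(dist(t(mat), method = distmethod))  # sample-by-sample distances
  diag(d) <- NA                                      # ignore self-distances
  mean_dist <- colMeans(d, na.rm = TRUE)
  # Number of standard deviations each sample sits from the overall mean.
  numbersd <- (mean_dist - mean(mean_dist)) / sd(mean_dist)
  list(numbersd = numbersd, outliers = names(numbersd)[numbersd > sdout])
}

# Toy data: nine nearly identical samples and one shifted sample (s10).
toy <- sapply(1:10, function(j) 1:5 + 0.01 * j)
toy[, 10] <- 1:5 + 10
colnames(toy) <- paste0("s", 1:10)

res <- flag_outlier_samples(toy, sdout = 2, distmethod = "manhattan")
print(res$outliers)
```

With n samples a standardized score of this kind cannot exceed (n - 1) / sqrt(n), so a threshold of 2 can only flag a sample when there are at least six samples.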

Author

Kefu Liu

Details

Detects and removes outlier samples and imputes missing values.
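
The imputation step is described only as kNN imputation, so the following base R sketch shows the general technique rather than the routine the function actually calls. `knn_impute_rows` and `k` are hypothetical names, and the neighbour search is deliberately simplified: each missing value is replaced by the mean of that column over the k nearest rows that have the value observed.

```r
# Hypothetical helper (not part of the package): simplified row-wise kNN
# imputation. Rows are genes/proteins, columns are samples.
knn_impute_rows <- function(mat, k = 2) {
  # dist() computes Euclidean distances over the columns observed in both rows.
  d <- as.matrix(dist(mat))
  diag(d) <- Inf
  out <- mat
  for (i in which(rowSums(is.na(mat)) > 0)) {
    nn_order <- order(d[i, ])                        # nearest rows first
    for (j in which(is.na(mat[i, ]))) {
      # Keep only neighbours that have an observed value in column j.
      cand <- nn_order[!is.na(mat[nn_order, j])]
      out[i, j] <- mean(mat[head(cand, k), j])
    }
  }
  out
}

toy <- rbind(g1 = c(1,   2,   NA),
             g2 = c(1,   2,   3),
             g3 = c(1.1, 2.1, 5),
             g4 = c(10,  20,  30))
imputed <- knn_impute_rows(toy, k = 2)
print(imputed["g1", ])
```

Here g2 and g3 are the two nearest rows to g1, so the missing value becomes the mean of 3 and 5.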

Examples


data(Dforimpute)
data <- Data_impute(Dforimpute, distmethod = "manhattan")
