preprocess: preprocess

Description

Preprocess the raw single-cell data by log transformation, gene/feature filtering, and optional clustering.

Usage

preprocess(data, clusternum = NULL, takelog = TRUE, logbase = 2, pseudocount = 1, minexpr_value = 1, minexpr_percent = 0.5, cvcutoff = 1)

Arguments

data

The raw single-cell data, which is a numeric matrix or data.frame. Rows represent genes/features and columns represent single cells.

clusternum

The number of clusters to use for clustering, typically 5 percent of the total number of genes. Clustering is performed after all transformation and filtering steps. If NULL, no clustering is performed.

takelog

Logical value indicating whether to take the logarithm of the data.

logbase

Numeric value specifying the base of the logarithm.

pseudocount

Numeric value to be added to the raw data before taking the logarithm.

minexpr_value

Numeric value specifying the minimum expression cutoff. The cutoff is applied to the log-transformed values if takelog is TRUE.

minexpr_percent

Numeric value specifying the lowest fraction of highly expressed cells (expression value greater than minexpr_value) required for a gene/feature to be retained. The value is given as a proportion, for example 0.5 for half of the cells.

cvcutoff

Numeric value specifying the minimum coefficient of variation required for a gene/feature to be retained.

Value

Matrix or data frame with the same format as the input dataset.

Details

This function first takes the logarithm of the raw data (if takelog is TRUE) and then filters out genes/features that are lowly expressed in too many cells. It also filters out genes/features with a low coefficient of variation, which indicates that they carry little information. With the default settings, the function takes log2 of the raw data after adding a pseudocount of 1, then retains the genes/features for which at least half of the cells have expression values greater than 1 and the coefficient of variation across all cells is at least 1.
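
The transformation and filtering described above can be illustrated with a minimal sketch in base R. This is not the package's source code: the helper name sketch_filter and the toy matrix are hypothetical, and two details are assumptions made for illustration, namely that the coefficient of variation is computed as sd / mean on the transformed values and that both thresholds are applied as "at least" (>=). Clustering is omitted.

```r
# Hypothetical helper, NOT the package implementation: a sketch of the
# log transformation and the two gene/feature filters.
sketch_filter <- function(data, takelog = TRUE, logbase = 2, pseudocount = 1,
                          minexpr_value = 1, minexpr_percent = 0.5,
                          cvcutoff = 1) {
  x <- as.matrix(data)
  if (takelog) {
    x <- log(x + pseudocount, base = logbase)
  }
  # Fraction of cells in which each gene exceeds the expression cutoff
  frac_expressed <- rowMeans(x > minexpr_value)
  # Coefficient of variation per gene (assumed: sd / mean across cells)
  cv <- apply(x, 1, sd) / rowMeans(x)
  keep <- frac_expressed >= minexpr_percent & cv >= cvcutoff
  keep <- !is.na(keep) & keep  # genes with an undefined CV are dropped
  x[keep, , drop = FALSE]
}

# Toy data: 3 genes (rows) by 4 cells (columns)
toy <- matrix(c(0, 0,    0,  100,   # geneA: above the cutoff in 1 of 4 cells
                7, 7,    7,    7,   # geneB: expressed but constant (CV = 0)
                0, 0, 1023, 1023),  # geneC: expressed in half, highly variable
              nrow = 3, byrow = TRUE,
              dimnames = list(c("geneA", "geneB", "geneC"),
                              paste0("cell", 1:4)))

filtered <- sketch_filter(toy)
print(rownames(filtered))  # geneC is the only row that passes both filters
```

In this toy example, geneA fails the expression filter, geneB fails the variation filter, and only geneC is retained.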

Examples


data(lpsdata)
procdata <- preprocess(lpsdata)
