TSCAN (version 1.10.2)

preprocess: preprocess

Description

Preprocesses raw single-cell data by optional log transformation, filtering of genes/features, and optional clustering.

Usage

preprocess(data, clusternum = NULL, takelog = TRUE, logbase = 2, pseudocount = 1, minexpr_value = 1, minexpr_percent = 0.5, cvcutoff = 1)

Arguments

data
The raw single-cell data, given as a numeric matrix or data.frame. Rows represent genes/features and columns represent single cells.
clusternum
The number of clusters to form, typically 5 percent of the total number of genes. Clustering is performed after all transformation and filtering steps. If NULL, no clustering is performed.
takelog
Logical value indicating whether to take the logarithm of the data.
logbase
Numeric value specifying the base of the logarithm.
pseudocount
Numeric value added to the raw data before taking the logarithm.
minexpr_value
Numeric value specifying the minimum expression cutoff. The cutoff is applied to the log-transformed values if takelog is TRUE.
minexpr_percent
Numeric value specifying the minimum proportion of cells (a fraction between 0 and 1, for example 0.5 for half of the cells) in which a gene/feature must be expressed (expression value greater than minexpr_value) for it to be retained.
cvcutoff
Numeric value specifying the minimum coefficient of variation a gene/feature must have to be retained.

Value

Matrix or data frame with the same format as the input dataset.

Details

This function first takes the logarithm of the raw data (if takelog is TRUE) and then filters out genes/features that are expressed at a low level in too many cells. It also filters out genes/features with a low coefficient of variation, which indicates that the gene/feature carries little information. With the default settings, the function takes log2 of the raw data after adding a pseudocount of 1. It then retains the genes/features for which at least half of the cells have expression values greater than 1 and the coefficient of variation across all cells is at least 1.
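
The filtering steps described above can be illustrated with a minimal sketch in base R. This is not the TSCAN source code: `sketch_preprocess` is a hypothetical helper written only from the description on this page, it omits the optional clustering step, and its handling of values that fall exactly on a cutoff (`>=` here) is an assumption.

```r
# Hypothetical helper that mirrors the documented steps; not the TSCAN implementation.
sketch_preprocess <- function(data, takelog = TRUE, logbase = 2, pseudocount = 1,
                              minexpr_value = 1, minexpr_percent = 0.5, cvcutoff = 1) {
  data <- as.matrix(data)
  if (takelog) {
    # Add the pseudocount so that zero counts have a finite logarithm
    data <- log(data + pseudocount, base = logbase)
  }
  # Per gene: fraction of cells whose expression exceeds minexpr_value
  frac_expressed <- rowMeans(data > minexpr_value)
  # Per gene: coefficient of variation (standard deviation divided by mean)
  cv <- apply(data, 1, sd) / rowMeans(data)
  keep <- frac_expressed >= minexpr_percent & cv >= cvcutoff
  keep[is.na(keep)] <- FALSE  # drop genes whose CV is undefined (mean of 0)
  data[keep, , drop = FALSE]
}

# Toy matrix: 3 genes (rows) by 4 cells (columns)
toy <- rbind(
  geneA = c(0, 0, 0, 1),     # expressed above the cutoff in no cells
  geneB = c(7, 7, 7, 7),     # constant across cells, so its CV is 0
  geneC = c(0, 3, 3, 1023)   # expressed in 3 of 4 cells and highly variable
)
colnames(toy) <- paste0("cell", 1:4)

filtered <- sketch_preprocess(toy)
rownames(filtered)
```

In this toy example only geneC passes both filters: geneA fails the expression filter and geneB fails the coefficient of variation filter.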

Examples

library(TSCAN)
# Load the example dataset and preprocess it with the default settings
data(lpsdata)
procdata <- preprocess(lpsdata)