This function evaluates the quality of cells detected in single-cell RNA-seq data by calculating a zeta score for each cell. The zeta score is based on the distribution of gene expression across different expression thresholds. A cutoff value is automatically determined using a two-component Gaussian mixture model to separate high-quality cells from low-quality or damaged cells.
ZetaSuitSC(countMatSC, binNum = 10, filter = TRUE)
A list containing:
A data frame with two columns: 'Cell' (cell identifiers) and 'Zeta' (calculated zeta scores)
A ggplot object showing the distribution of log10-transformed zeta scores with fitted Gaussian mixture components and the determined cutoff threshold
countMatSC: A matrix of single-cell RNA-seq count data where rows represent cells and columns represent genes.
binNum: The number of bins used for the zeta score calculation. Default is 10. The function creates expression thresholds from 0 to the 80th percentile of non-zero expression values, divided into binNum intervals.
filter: Logical. Whether to filter out cells with a total read count below 100. Default is TRUE. This removes extremely low-quality cells before analysis.
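The effect of filter and binNum can be illustrated with a minimal base R sketch. This is not the package's own code: the toy matrix and the names toyCounts, keep, filtered, upper and thresholds are hypothetical, and the even spacing of the thresholds is an assumption, since the page only states the range and the number of intervals.

```r
# Toy count matrix: rows are cells, columns are genes (hypothetical data).
set.seed(1)
toyCounts <- matrix(rpois(6 * 50, lambda = 4), nrow = 6,
                    dimnames = list(paste0("cell", 1:6), paste0("gene", 1:50)))
toyCounts["cell6", ] <- 0L       # simulate a nearly empty cell
toyCounts["cell6", 1:5] <- 1L    # total read count of 5

# filter = TRUE: drop cells whose total read count is below 100.
keep <- rowSums(toyCounts) >= 100
filtered <- toyCounts[keep, , drop = FALSE]

# binNum: thresholds from 0 to the 80th percentile of non-zero values.
binNum <- 10
upper <- quantile(filtered[filtered > 0], probs = 0.8, names = FALSE)
thresholds <- seq(0, upper, length.out = binNum + 1)  # assumed even spacing
```

With binNum intervals there are binNum + 1 breakpoints, the first of which is 0. A larger binNum gives a finer grid of thresholds over the same range.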
Yajing Hao, Shuyang Zhang, Junhui Li, Guofeng Zhao, Xiang-Dong Fu
The function works as follows:
1. Filters cells based on total read count if filter=TRUE
2. Samples a subset of cells and genes for computational efficiency
3. Creates expression thresholds (bins) from 0 to the 80th percentile of non-zero expression values
4. For each cell, counts how many genes exceed each threshold
5. Calculates the zeta score as a weighted sum of these counts
6. Fits a two-component Gaussian mixture model to log10-transformed zeta scores
7. Determines an optimal cutoff to separate high-quality from low-quality cells
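The counting, scoring and cutoff steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the data are simulated, the subsampling step is skipped, the weights are assumed to be the bin widths (the page does not give them), the mixture is fitted with a hand-written EM loop, and the cutoff is assumed to be the point between the two component means where the weighted densities are equal.

```r
# Hypothetical toy data: 200 cells x 100 genes, 60 of them low quality.
set.seed(42)
lambda <- c(rep(0.3, 60), rep(5, 140))            # per-cell mean expression
counts <- t(sapply(lambda, function(l) rpois(100, l)))

# Thresholds from 0 to the 80th percentile of non-zero values.
binNum <- 10
upper <- quantile(counts[counts > 0], probs = 0.8, names = FALSE)
thresholds <- seq(0, upper, length.out = binNum + 1)

# For each cell, count the genes whose expression exceeds each threshold.
exceed <- sapply(thresholds, function(th) rowSums(counts > th))

# Assumed weights: bin widths, so the score is an area under the curve.
weights <- c(diff(thresholds), 0)
zeta <- as.vector(exceed %*% weights)
x <- log10(zeta[zeta > 0])

# Minimal EM for a two-component Gaussian mixture on log10 scores.
mu <- as.numeric(quantile(x, c(0.25, 0.75)))
sigma <- rep(sd(x), 2)
p <- c(0.5, 0.5)
for (i in 1:200) {
  d1 <- p[1] * dnorm(x, mu[1], sigma[1])
  d2 <- p[2] * dnorm(x, mu[2], sigma[2])
  r1 <- d1 / (d1 + d2)                            # responsibility of component 1
  p <- c(mean(r1), 1 - mean(r1))
  mu <- c(sum(r1 * x) / sum(r1), sum((1 - r1) * x) / sum(1 - r1))
  sigma <- c(sqrt(sum(r1 * (x - mu[1])^2) / sum(r1)),
             sqrt(sum((1 - r1) * (x - mu[2])^2) / sum(1 - r1)))
}

# Assumed cutoff rule: equal weighted log-densities between the two means.
o <- order(mu)
logDiff <- function(t) {
  (log(p[o[1]]) + dnorm(t, mu[o[1]], sigma[o[1]], log = TRUE)) -
    (log(p[o[2]]) + dnorm(t, mu[o[2]], sigma[o[2]], log = TRUE))
}
cutoffLog <- uniroot(logDiff, interval = mu[o])$root
cutoffZeta <- 10^cutoffLog                        # cutoff on the zeta scale
```

Cells with a log10 zeta score below cutoffLog would be treated as low quality in this sketch. Working with log densities avoids numerical underflow when the two components are far apart.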
data(countMatSC)
zetaDataSC <- ZetaSuitSC(countMatSC, binNum=50, filter=TRUE)