cellTypeAssignSCRNA
Assigns CDSeq-identified cell types using single-cell RNA-seq data.
cellTypeAssignSCRNA(
cdseq_gep = NULL,
cdseq_prop = NULL,
cdseq_gep_sample_specific = NULL,
sc_gep = NULL,
sc_annotation = NULL,
nb_size = NULL,
nb_mu = NULL,
seurat_count_threshold = 100,
seurat_scale_factor = 10000,
seurat_norm_method = "LogNormalize",
seurat_select_method = "vst",
seurat_nfeatures = 100,
seurat_npcs = 50,
seurat_dims = 1:10,
seurat_reduction = "pca",
seurat_resolution = 0.8,
seurat_find_marker = FALSE,
seurat_DE_test = "wilcox",
seurat_DE_logfc = 0.25,
seurat_top_n_markers = 10,
sc_pt_size = 1,
cdseq_pt_size = 3,
plot_umap = 1,
plot_tsne = 1,
plot_per_sample = 0,
fig_save = 0,
fig_path = getwd(),
fig_name = "cellTypeAssignSCRNA",
fig_format = "pdf",
fig_dpi = 300,
verbose = FALSE
)
cdseq_gep: CDSeq-estimated gene expression profile matrix with G rows (genes) and T columns (cell types).
cdseq_prop: CDSeq-estimated sample-specific cell-type proportions, a matrix with T rows (cell types) and M columns (samples).
cdseq_gep_sample_specific: CDSeq-estimated sample-specific cell-type gene expression, in the form of read counts. It is a 3-dimensional array (gene by sample by cell type). The element cdseq_gep_sample_specific[i,j,k] is the number of reads mapped to gene i from cell type k in sample j.
sc_gep: a G (genes) by N (cells) matrix or data frame that contains the gene expression profiles of N single cells.
sc_annotation: a data frame with two columns, "cell_id" and "cell_type". cell_id needs to match the cell ids in sc_gep, but sc_annotation is not required to have the same size as sc_gep. cell_type is the cell type annotation for the single cells.
nb_size: size parameter of the negative binomial distribution; see rnbinom for details.
nb_mu: mu parameter of the negative binomial distribution; see rnbinom for details.
seurat_count_threshold: passed to the Seurat subset function (subset = nCount_RNA > seurat_count_threshold) to filter out single cells whose total count does not exceed this threshold.
seurat_scale_factor: passed to scale.factor in the Seurat function NormalizeData.
seurat_norm_method: passed to normalization.method in the Seurat function NormalizeData.
seurat_select_method: passed to selection.method in the Seurat function FindVariableFeatures.
seurat_nfeatures: passed to nfeatures in the Seurat function FindVariableFeatures.
seurat_npcs: passed to npcs in the Seurat function RunPCA.
seurat_dims: passed to dims in the Seurat function FindNeighbors.
seurat_reduction: passed to reduction in the Seurat function FindNeighbors.
seurat_resolution: passed to resolution in the Seurat function FindClusters.
seurat_find_marker: controls whether the Seurat marker-finding step is run; default is FALSE.
seurat_DE_test: passed to test.use in the Seurat function FindAllMarkers.
seurat_DE_logfc: passed to logfc.threshold in the Seurat function FindAllMarkers.
seurat_top_n_markers: the number of top DE markers saved from the Seurat output.
sc_pt_size: point size of the single-cell data in the UMAP and t-SNE plots.
cdseq_pt_size: point size of the CDSeq-estimated cell types in the UMAP and t-SNE plots.
plot_umap: set to 1 to plot the UMAP figure of the scRNA-seq data and CDSeq-estimated cell types, 0 otherwise.
plot_tsne: set to 1 to plot the t-SNE figure of the scRNA-seq data and CDSeq-estimated cell types, 0 otherwise.
plot_per_sample: currently disabled for debugging.
fig_save: 1 or 0. 1 saves the figures to local disk; 0 does not.
fig_path: the location where the figures are saved.
fig_name: the name of the UMAP and t-SNE figures. The UMAP figure is named fig_name_umap_date and the t-SNE figure is named fig_name_tsne_date.
fig_format: "pdf", "jpeg", or "png".
fig_dpi: figure resolution in dpi.
verbose: if TRUE, some calculation information will be printed.
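The input shapes described above can be illustrated with a small base-R sketch. Everything here is simulated placeholder data: the dimensions, the gene, cell, and cell-type names, and the use of column names of sc_gep as cell ids are assumptions made for illustration. The final call is left commented out because it needs the package that provides cellTypeAssignSCRNA, Seurat, and real data.

```r
set.seed(1)

n_genes   <- 20  # G
n_types   <- 3   # T
n_samples <- 4   # M
n_cells   <- 30  # N

genes    <- paste0("gene", seq_len(n_genes))     # hypothetical gene names
cell_ids <- paste0("cell", seq_len(n_cells))     # hypothetical cell ids

# cdseq_gep: G x T matrix, one column per CDSeq-estimated cell type
cdseq_gep <- matrix(runif(n_genes * n_types), nrow = n_genes,
                    dimnames = list(genes, NULL))

# cdseq_prop: T x M matrix; each column holds one sample's proportions
cdseq_prop <- matrix(runif(n_types * n_samples), nrow = n_types)
cdseq_prop <- sweep(cdseq_prop, 2, colSums(cdseq_prop), "/")

# sc_gep: G x N count matrix (assumption: cell ids are the column names)
sc_gep <- matrix(rnbinom(n_genes * n_cells, size = 2, mu = 5),
                 nrow = n_genes, dimnames = list(genes, cell_ids))

# sc_annotation: may cover only a subset of the cells in sc_gep
annotated <- cell_ids[1:25]
sc_annotation <- data.frame(
  cell_id   = annotated,
  cell_type = sample(c("A", "B", "C"), length(annotated), replace = TRUE),
  stringsAsFactors = FALSE
)

# Consistency checks implied by the argument descriptions
stopifnot(
  nrow(cdseq_gep) == nrow(sc_gep),               # same G genes
  ncol(cdseq_gep) == nrow(cdseq_prop),           # same T cell types
  all(sc_annotation$cell_id %in% colnames(sc_gep))
)

# Not run: requires the package providing cellTypeAssignSCRNA and Seurat
# result <- cellTypeAssignSCRNA(cdseq_gep = cdseq_gep,
#                               cdseq_prop = cdseq_prop,
#                               sc_gep = sc_gep,
#                               sc_annotation = sc_annotation,
#                               plot_umap = 0, plot_tsne = 0)
```

Running checks like these before the call can catch mismatched gene or cell-type dimensions early, before the longer Seurat steps start.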
cellTypeAssignSCRNA returns a list containing the following fields:
fig_path: same as the input fig_path
fig_name: same as the input fig_name
cdseq_synth_scRNA: synthetic scRNAseq data generated using CDSeq-estimated GEPs
cdseq_scRNA_umap: ggplot figure of the umap outcome
cdseq_scRNA_tsne: ggplot figure of the tsne outcome
cdseq_synth_scRNA_seurat: Seurat object containing the scRNAseq data combined with CDSeq-estimated cell types. Cell ids for CDSeq-estimated cell types start with "CDSeq".
seurat_cluster_purity: for the cells in Seurat cluster i, the ith value in seurat_cluster_purity is the proportion that carry the most frequent cell annotation from sc_annotation. For example, if cluster 1 contains 100 cells after Seurat clustering and 90 of them are annotated as cell type A in sc_annotation, then the first value in seurat_cluster_purity is 0.9. This output can be used to assess the agreement between Seurat clustering and the given sc_annotation.
seurat_unique_clusters: Unique Seurat cluster numbering. This can be used together with seurat_cluster_gold_label to match the Seurat clusters with given annotations.
seurat_cluster_gold_label: The cell type annotations for each unique Seurat cluster based on sc_annotation.
seurat_markers: DE genes for each Seurat cluster.
seurat_top_markers: Top seurat_top_n_markers DE genes for each Seurat cluster.
CDSeq_cell_type_assignment_df: cell type assignment for CDSeq-estimated cell types.
cdseq_prop_merged: CDSeq-estimated cell type proportions with cell type annotations.
cdseq_gep_sample_specific_merged: sample-specific cell-type read counts. It is a 3d array with dimensions: gene, sample, cell type.
input_list: values for input parameters
cdseq_sc_comb_umap_df: dataframe for umap plot
cdseq_sc_comb_tsne_df: dataframe for tsne plot
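The definition of seurat_cluster_purity, together with the per-cluster labels in seurat_cluster_gold_label, can be illustrated with a short base-R sketch. It follows the description given above and is not the package's internal code; the cluster and annotation vectors are made-up toy data, and the variable names are hypothetical.

```r
# Toy data: Seurat-style cluster ids and given annotations for ten cells
cluster    <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
annotation <- c("A", "A", "A", "B", "B", "B", "B", "C", "A", "C")

# Cross-tabulate clusters (rows) against annotations (columns)
tab <- table(cluster, annotation)

# Purity: share of the most frequent annotation within each cluster
cluster_purity <- apply(tab, 1, max) / rowSums(tab)

# Majority annotation per cluster, analogous to a "gold label"
cluster_label <- colnames(tab)[apply(tab, 1, which.max)]

# Cluster ids in the same order as the two vectors above
unique_clusters <- rownames(tab)

print(data.frame(cluster = unique_clusters,
                 label   = cluster_label,
                 purity  = round(cluster_purity, 2)))
```

In this toy case cluster 1 has purity 0.75 (three of four cells are type A), cluster 2 has purity 1, and cluster 3 has purity 2/3. Low values flag clusters where the clustering and the supplied annotation disagree.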