cellTypeAssignSCRNA: `cellTypeAssignSCRNA` assigns CDSeq-identified cell types using single cell RNAseq data.

Description

cellTypeAssignSCRNA assigns CDSeq-identified cell types using single cell RNAseq data.

Usage

cellTypeAssignSCRNA(
  cdseq_gep = NULL,
  cdseq_prop = NULL,
  cdseq_gep_sample_specific = NULL,
  sc_gep = NULL,
  sc_annotation = NULL,
  nb_size = NULL,
  nb_mu = NULL,
  seurat_count_threshold = 100,
  seurat_scale_factor = 10000,
  seurat_norm_method = "LogNormalize",
  seurat_select_method = "vst",
  seurat_nfeatures = 100,
  seurat_npcs = 50,
  seurat_dims = 1:10,
  seurat_reduction = "pca",
  seurat_resolution = 0.8,
  seurat_find_marker = FALSE,
  seurat_DE_test = "wilcox",
  seurat_DE_logfc = 0.25,
  seurat_top_n_markers = 10,
  sc_pt_size = 1,
  cdseq_pt_size = 3,
  plot_umap = 1,
  plot_tsne = 1,
  plot_per_sample = 0,
  fig_save = 0,
  fig_path = getwd(),
  fig_name = "cellTypeAssignSCRNA",
  fig_format = "pdf",
  fig_dpi = 300,
  verbose = FALSE
)

Arguments

cdseq_gep

CDSeq-estimated gene expression profile matrix with G rows (genes) and T columns (cell types).

cdseq_prop

CDSeq-estimated sample-specific cell-type proportions, a matrix with T rows (cell types) and M columns (samples).

cdseq_gep_sample_specific

CDSeq-estimated sample-specific cell-type gene expression, in the form of read counts. It is a three-dimensional array, i.e. gene by sample by cell type. The element cdseq_gep_sample_specific[i,j,k] represents the reads mapped to gene i from cell type k in sample j.

sc_gep

a G (genes) by N (cells) matrix or data frame that contains the gene expression profiles of N single cells.

sc_annotation

a data frame containing two columns, "cell_id" and "cell_type". The values in cell_id must match the cell ids in sc_gep, but sc_annotation is not required to cover the same number of cells as sc_gep. cell_type is the cell type annotation for the single cells.
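The shapes described for the data arguments above can be sketched with toy objects. This is only an illustration: the dimensions, the random values, the assumption that each sample's proportions sum to 1, and the assumption that cell ids are stored as the column names of sc_gep are all hypothetical and not taken from the package.

```r
# Toy sizes (hypothetical): 5 genes, 3 cell types, 4 samples, 6 single cells.
set.seed(1)
n_genes <- 5; n_types <- 3; n_samples <- 4; n_cells <- 6
genes <- paste0("gene", seq_len(n_genes))

# cdseq_gep: genes in rows, cell types in columns.
cdseq_gep <- matrix(runif(n_genes * n_types), nrow = n_genes,
                    dimnames = list(genes, NULL))

# cdseq_prop: cell types in rows, samples in columns.
# Assumption: the proportions of each sample (column) sum to 1.
cdseq_prop <- matrix(runif(n_types * n_samples), nrow = n_types)
cdseq_prop <- sweep(cdseq_prop, 2, colSums(cdseq_prop), "/")

# cdseq_gep_sample_specific: gene x sample x cell type array of read counts.
cdseq_gep_sample_specific <- array(
  rpois(n_genes * n_samples * n_types, lambda = 20),
  dim = c(n_genes, n_samples, n_types)
)
# Reads mapped to gene 2 from cell type 1 in sample 3.
reads_g2_s3_t1 <- cdseq_gep_sample_specific[2, 3, 1]

# sc_gep: genes in rows, single cells in columns.
# Assumption: cell ids are the column names.
sc_gep <- matrix(rpois(n_genes * n_cells, lambda = 5), nrow = n_genes,
                 dimnames = list(genes, paste0("cell", seq_len(n_cells))))

# sc_annotation may cover only a subset of the cells in sc_gep.
sc_annotation <- data.frame(
  cell_id   = paste0("cell", c(1, 2, 3, 5)),
  cell_type = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Cells that have both expression data and an annotation.
annotated_cells <- intersect(colnames(sc_gep), sc_annotation$cell_id)
```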

nb_size

size parameter of the negative binomial distribution; see rnbinom for details.

nb_mu

mu parameter of the negative binomial distribution; see rnbinom for details.
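To show what the size and mu parameters mean, here is a minimal sketch using base R's rnbinom directly. The values chosen for nb_size and nb_mu are hypothetical, and the snippet does not show how cellTypeAssignSCRNA uses them internally. In this parameterization the mean is mu and the variance is mu + mu^2/size, so a smaller size gives more overdispersed counts.

```r
set.seed(42)
nb_size <- 2    # hypothetical value: dispersion (size) parameter
nb_mu   <- 10   # hypothetical value: mean parameter

# Draw counts from a negative binomial with the mean/size parameterization.
draws <- rnbinom(n = 10000, size = nb_size, mu = nb_mu)

# Empirical moments should be close to the theoretical ones.
empirical_mean <- mean(draws)
empirical_var  <- var(draws)
theoretical_var <- nb_mu + nb_mu^2 / nb_size   # 10 + 100 / 2 = 60
```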

seurat_count_threshold

this parameter will be passed to the Seurat subset function (subset = nCount_RNA > seurat_count_threshold) to filter out single cells whose total count does not exceed this threshold.
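The effect of this filter can be sketched in base R without Seurat. The count matrix below is a hypothetical toy example, and colSums stands in for the nCount_RNA value that Seurat computes per cell; this illustrates the filtering rule only and is not the package's code.

```r
# Toy count matrix: 2 genes (rows) by 3 cells (columns).
counts <- matrix(c(60, 50,     # cell1: total 110
                   10, 5,      # cell2: total 15
                   200, 300),  # cell3: total 500
                 nrow = 2,
                 dimnames = list(c("geneA", "geneB"),
                                 c("cell1", "cell2", "cell3")))

seurat_count_threshold <- 100

# Total counts per cell, playing the role of nCount_RNA.
total_counts <- colSums(counts)

# Keep only cells whose total count is strictly greater than the threshold.
kept <- counts[, total_counts > seurat_count_threshold, drop = FALSE]
colnames(kept)  # "cell1" "cell3"
```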

seurat_scale_factor

this parameter will be passed to scale.factor in Seurat function NormalizeData.

seurat_norm_method

this parameter will be passed to normalization.method in Seurat function NormalizeData.

seurat_select_method

this parameter will be passed to selection.method in Seurat function FindVariableFeatures.

seurat_nfeatures

this parameter will be passed to nfeatures in Seurat function FindVariableFeatures.

seurat_npcs

this parameter will be passed to npcs in Seurat function RunPCA.

seurat_dims

this parameter will be passed to dims in Seurat function FindNeighbors.

seurat_reduction

this parameter will be passed to reduction in Seurat function FindNeighbors.

seurat_resolution

this parameter will be passed to resolution in Seurat function FindClusters.

seurat_find_marker

this parameter controls whether the Seurat marker-finding step is run (see seurat_DE_test and seurat_DE_logfc); default is FALSE.

seurat_DE_test

this parameter will be passed to test.use in Seurat function FindAllMarkers.

seurat_DE_logfc

this parameter will be passed to logfc.threshold in Seurat function FindAllMarkers.

seurat_top_n_markers

the number of top DE markers saved from Seurat output.

sc_pt_size

point size of single cell data in the umap and tsne plots.

cdseq_pt_size

point size of CDSeq-estimated cell types in the umap and tsne plots.

plot_umap

set to 1 to plot the umap figure of scRNAseq and CDSeq-estimated cell types, 0 otherwise.

plot_tsne

set to 1 to plot the tsne figure of scRNAseq and CDSeq-estimated cell types, 0 otherwise.

plot_per_sample

currently disabled for debugging

fig_save

1 or 0. 1 means figures are saved to disk and 0 means they are not saved.

fig_path

the directory where the figures are saved.

fig_name

the base name of the umap and tsne figures. The umap figure will be named fig_name_umap_date and the tsne figure will be named fig_name_tsne_date.

fig_format

"pdf", "jpeg", or "png".

fig_dpi

figure resolution in dots per inch (dpi).

verbose

if TRUE, some calculation information will be printed.

Value

cellTypeAssignSCRNA returns a list containing the following fields:

fig_path: same as the input fig_path

fig_name: same as the input fig_name

cdseq_synth_scRNA: synthetic scRNAseq data generated using CDSeq-estimated GEPs

cdseq_scRNA_umap: ggplot figure of the umap outcome

cdseq_scRNA_tsne: ggplot figure of the tsne outcome

cdseq_synth_scRNA_seurat: Seurat object containing the scRNAseq data combined with CDSeq-estimated cell types. Cell ids for CDSeq-estimated cell types start with "CDSeq".

seurat_cluster_purity: for all cells in Seurat cluster i, the ith value in seurat_cluster_purity is the proportion of cells carrying the most frequent cell annotation from sc_annotation. For example, suppose that after Seurat clustering there are 100 cells in cluster 1 and 90 of them are annotated as cell type A in sc_annotation; then the first value in seurat_cluster_purity is 0.9. This output can be used to assess the agreement between Seurat clustering and the given sc_annotation.
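The purity definition above can be sketched in a few lines of base R. The cluster and annotation vectors are hypothetical toy data, and this is an illustration of the definition rather than the package's own implementation. The majority label per cluster is also computed, on the assumption that it is what one would compare against the given annotations.

```r
# Toy data: 15 cells, their cluster ids, and their annotated cell types.
cluster    <- c(rep(1, 10), rep(2, 5))
annotation <- c(rep("A", 9), "B",          # cluster 1: 9 x A, 1 x B
                rep("B", 3), rep("C", 2))  # cluster 2: 3 x B, 2 x C

# Purity: share of cells in each cluster carrying the most frequent annotation.
purity <- tapply(annotation, cluster,
                 function(labels) max(table(labels)) / length(labels))

# Most frequent annotation in each cluster.
majority_label <- tapply(annotation, cluster,
                         function(labels) names(which.max(table(labels))))

purity          # cluster 1: 0.9, cluster 2: 0.6
majority_label  # cluster 1: "A", cluster 2: "B"
```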

seurat_unique_clusters: Unique Seurat cluster numbering. This can be used together with seurat_cluster_gold_label to match the Seurat clusters with given annotations.

seurat_cluster_gold_label: The cell type annotations for each unique Seurat cluster based on sc_annotation.

seurat_markers: DE genes for each Seurat cluster.

seurat_top_markers: Top seurat_top_n_markers DE genes for each Seurat cluster.

CDSeq_cell_type_assignment_df: cell type assignment for CDSeq-estimated cell types.

cdseq_prop_merged: CDSeq-estimated cell type proportions with cell type annotations.

cdseq_gep_sample_specific_merged: sample-specific cell-type read counts. It is a 3d array with dimensions: gene, sample, cell type.

input_list: values for input parameters

cdseq_sc_comb_umap_df: dataframe for umap plot

cdseq_sc_comb_tsne_df: dataframe for tsne plot