RNASeq.Data: Standardize RNASeq Data for Clustering

Description

RNASeq.Data is used to collect RNA-Seq data that need to be clustered.

Usage

RNASeq.Data(Count, Normalizer=NULL, Treatment,GeneID=NULL)

Arguments

Count

a GxP matrix storing the numbers of reads mapped to G genes in P samples. Non-integer values are allowed.

Normalizer

a vector of length P or a GxP matrix to normalize the gene expressions. When Normalizer=NULL, we use log(Q2) by default, where Q3 is the 75

Treatment

a vector of length P indicating the assignment of treatments for each column of the Count. For example, Treatment=c(1,1,2,2,3,3) means there are 3 treatments with each having 2 replicates

GeneID

the ID's of the genes, labeled by 1,2,...,G if not provided

Value

GeneID: ID's of genes provided by the user. Default is 1,2,...,G if not provided
Treatment: The same as the input, but is sorted in increasing order.
Count: The matrix of counts of reads as provided. The columns of the matrix is re-arranged to match the ordered labels of treatment
Normalizer: A matrix contains the input normalization factors as provided or from default setting. If the provided value is a vector, then each column of the matrix will have the same value
logFC: A matrix contains the log fold change (log-FC) of the normalized genes expressions across all the treatments. Each row of the log-FC matrix is standardized to has zero sum
Aver.Expr: the logarithm of the mean gene expression after normalization
logFC: a matrix storing the gene profiles, which is defined as the log fold changes relative to the mean gene expression
NB.Dispersion: the estimated gene-wise dispersion if assuming NB model

Examples

Run this code

###### run the following codes in order
#
# data("Count")     ## a sample data set with RNA-seq expressions 
#                   ## for 1000 genes, 4 treatment and 2 replicates
# head(Count)
# GeneID=1:nrow(Count)
# Normalizer=rep(1,ncol(Count))
# Treatment=rep(1:4,2)
# mydata=RNASeq.Data(Count,Normalize=NULL,Treatment,GeneID) 
#                   ## standardized RNA-seq data
# c0=KmeansPlus.RNASeq(mydata,nK=10)$centers
#                   ## choose 10 cluster centers to initialize the clustering 
# cls=Cluster.RNASeq(data=mydata,model="nbinom",centers=c0,method="EM")$cluster
#                   ## use EM algorithm to cluster genes
# tr=Hybrid.Tree(data=mydata,cluste=cls,model="nbinom")
#                   ## bulild a tree structure for the resulting 10 clusters
# plotHybrid.Tree(merge=tr,cluster=cls,logFC=mydata$logFC,tree.title=NULL)
#                   ## plot the tree structure

Run the code above in your browser using DataLab