gsMMD2.default(X, memSubjects, memIni, maxFlag = TRUE, thrshPostProb = 0.5, geneNames = NULL, alpha = 0.05, transformFlag = FALSE, transformMethod = "boxcox", scaleFlag = TRUE, criterion = c("cor", "skewness", "kurtosis"), minL = -10, maxL = 10, stepL = 0.1, eps = 0.001, ITMAX = 100, plotFlag = FALSE, quiet=TRUE)
memSubjects[i]=1
means the $i$-th subject belongs to diseased group, $0$ otherwise.
maxFlag
=TRUE means that a gene will be assigned
to a class in which the posterior probability of the gene belongs to this class is maximum. maxFlag
=FALSE means that a gene will be assigned to class 1 if the posterior probability of the gene belongs to class 1 is
greater than thrshPostProb
. Similarly, a gene will be assigned to class 1 if the posterior probability of the gene belongs to class 1 is
greater than thrshPostProb
. If the posterior probability is less than thrshPostProb
, the gene
will be assigned to class 2 (non-differentially expressed gene group).thrshPostProb
, then this gene will be assigned to cluster 1.1-conf.level
,
conf.level
is the argument for the function t.test
.
transformFlag=TRUE
and scaleFlag=TRUE
, then scaling is performed after transformation.
To avoid linear dependence of tissue samples after scaling
gene profiles, we delete one tissue sample after scaling (c.f. details).
transformFlag=TRUE
, criterion
indicates what criterion to determine if data looks like normal. cor means using Pearson's correlation. The idea is that the observed quantiles after transformation should be close to theoretical normal quantiles. So we can use Pearson's correlation to check if the scatter plot of theoretical normal quantiles versus observed quantiles is a straightline. skewness means using skewness measure to check if the distribution of the transformed data are close to normal distribution; kurtosis means using kurtosis measure to check normality.lambda
parameter used in Box-Cox transformationlambda
parameter used in Box-Cox transformationlambda
parameter used in Box-Cox transformationeps
, this value is regarded as zero. dat
will be different from the input
microarray data matrix.memSubjects
.Note that genes in cluster 2 are non-differentially expressed across abnormal and normal tissue samples. Hence there are only 3 parameters for cluster 2.
To make sure the identifiability, we set the following contraints: $\mu_{c1}>\mu_{n1}$ and $\mu_{c3}<\mu_{n3}$.< p="">
To make sure the marginal covariance matrices are poisitive definite, we set the following contraints: $-1/(n_c-1)<\rho_{c1}<1$, $-1="" (n_n-1)<\rho_{n1}<1$,="" (n-1)<\rho_{2}<1$,="" (n_c-1)<\rho_{c3}<1$,="" (n_n-1)<\rho_{n3}<1$.<="" p="">
We also has the following constraints for the mixing proportion: $\pi_3=1-\pi_1-\pi_2$, $\pi_k>0$, $k=1,2,3$.
We apply the EM algorithm to estimate the model parameters. We regard the cluster membership of genes as missing values.
To facilitate the estimation of the parameters, we reparametrize the parameter vector as $\theta^*=(\pi_1$, $\pi_2$, $\mu_{c1}$, $\sigma^2_{c1}$, $r_{c1}$, $\delta_{n1}$, $\sigma^2_{n1}$, $r_{n1}$, $\mu_2$, $\sigma^2_2$, $r_2$, $\mu_{c3}$, $\sigma^2_{c3}$, $r_{c3}$, $\delta_{n3}$, $\sigma^2_{n3}$, $r_{n3})$, where $\mu_{n1}=\mu_{c1}-\exp(\delta_{n1})$, $\mu_{n3}=\mu_{c3}+\exp(\delta_{n3})$, $\rho_{c1}=(\exp(r_{c1})-1/(n_c-1))/(1+\exp(r_{c1}))$, $\rho_{n1}=(\exp(r_{n1})-1/(n_n-1))/(1+\exp(r_{n1}))$, $\rho_{2}=(\exp(r_{2})-1/(n-1))/(1+\exp(r_{2}))$, $\rho_{c3}=(\exp(r_{c3})-1/(n_c-1))/(1+\exp(r_{c3}))$, $\rho_{n3}=(\exp(r_{n3})-1/(n_n-1))/(1+\exp(r_{n3}))$.
Given a gene, the expression levels of the gene are assumed independent. However, after scaling, the scaled expression levels of the gene are no longer independent and the rank $r^*=r-1$ of the covariance matrix for the scaled gene profile will be one less than the rank $r$ for the un-scaled gene profile Hence the covariance matrix of the gene profile will no longer be positive-definite. To avoid this problem, we delete a tissue sample after scaling since its information has been incorrporated by other scaled tissue samples. We arbitrarily select the tissue sample, which has the biggest label number, from the tissue sample group that has larger size than the other tissue sample group. For example, if there are 6 cancer tissue samples and 10 normal tissue samples, we delete the 10-th normal tissue sample after scaling.
\rho_{c1}<1$,>\mu_{n3}$.<>gsMMD
,
gsMMD.default
,
gsMMD2
## Not run:
# library(ALL)
# data(ALL)
# eSet1 <- ALL[1:100, ALL$BT == "B3" | ALL$BT == "T2"]
# mat <- exprs(eSet1)
#
# mem.str <- as.character(eSet1$BT)
# nSubjects <- length(mem.str)
# memSubjects <- rep(0, nSubjects)
# # B3 coded as 0, T2 coded as 1
# memSubjects[mem.str == "T2"] <- 1
#
# myWilcox <-
# function(x, memSubjects, alpha = 0.05)
# {
# xc <- x[memSubjects == 1]
# xn <- x[memSubjects == 0]
#
# m <- sum(memSubjects == 1)
# res <- wilcox.test(x = xc, y = xn, conf.level = 1 - alpha)
# res2 <- c(res$p.value, res$statistic - m * (m + 1) / 2)
# names(res2) <- c("p.value", "statistic")
#
# return(res2)
# }
#
# tmp <- t(apply(mat, 1, myWilcox, memSubjects = memSubjects))
# colnames(tmp) <- c("p.value", "statistic")
# memIni <- rep(2, nrow(mat))
# memIni[tmp[, 1] < 0.05 & tmp[, 2] > 0] <- 1
# memIni[tmp[, 1] < 0.05 & tmp[,2] < 0] <- 3
#
# cat("initial gene cluster size>>\n"); print(table(memIni)); cat("\n");
#
# obj.gsMMD <- gsMMD2.default(mat, memSubjects, memIni = memIni,
# transformFlag = TRUE, transformMethod = "boxcox", scaleFlag = TRUE)
# round(obj.gsMMD$para, 3)
# ## End(Not run)
Run the code above in your browser using DataLab