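pETM(x, y, cx=NULL, alpha=0.1, maxit=100000, thre=1e-6, group=NULL, lambda=NULL,
     type=c("ring","fcon"), etm=c("none","normal","beta"), psub=0.5, nlam=10,
     kb=10, K=100)

thre: The default value is 1E-6. For fast computation, use a larger
value than the default value.

group: The length of group should be equivalent to the total
number of genes or genetic regions, and the sum of group should
be the same as the total number of CpG sites (the number of columns of x).

lambda: If left as NULL (the default), the lambda sequence is computed
based on nlam and kb.

type: The type of network used within each group when group is
specified. "ring" and "fcon" represent a ring and a fully
connected network, respectively. Default is "ring". See details.

etm: The type of exponential tilt model. none does not perform
an exponential tilt model; instead, an ordinary penalized logistic
regression model is applied. normal performs a penalized
exponential tilt model based on the Gaussian distribution, and beta
performs one based on the Beta distribution.

psub: The proportion of samples drawn in each resampling,
psub$\in[0.5,1)$. The default is 0.5.

nlam: The number of lambda values used for resamplings,
and default is 10. For fast computation, use a smaller value than the
default value.

kb: A parameter used to compute the lambda values, and default is 10.

... lambda values used ...

... if etm is normal, and $h_1(x)=-\log(x)$ and $h_2(x)=-\log(1-x)$ if
etm is beta.
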
The penalty function of pETM is defined as
$$\alpha||\beta||_1+(1-\alpha)(\beta^{T}L\beta)/2,$$
where $L$ is a Laplacian matrix describing a group structure of
CpG sites. This penalty is equivalent to the Lasso penalty if
alpha=1. When group is not defined, $L$ is replaced by
an identity matrix. In this case, pETM performs an elastic-net
regularization procedure since the second term of the penalty simply
reduces to the squared $l_2$ norm of $\beta$.
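
The penalty above can be evaluated directly for a given coefficient vector. The following is a minimal sketch in base R; `netPenalty` is a hypothetical helper written for illustration and is not a function exported by pETM. With the default identity matrix for `L`, it shows the two special cases mentioned in the text.

```r
# Hypothetical helper (not part of pETM): evaluates
#   alpha * ||beta||_1 + (1 - alpha) * (beta' L beta) / 2
# L defaults to the identity, which corresponds to group = NULL.
netPenalty <- function(beta, alpha, L = diag(length(beta))) {
  l1   <- sum(abs(beta))                      # Lasso part
  quad <- drop(crossprod(beta, L %*% beta))   # quadratic (network) part
  alpha * l1 + (1 - alpha) * quad / 2
}

beta <- c(0.5, -1, 0, 2)
netPenalty(beta, alpha = 1)    # only the l1 term remains (Lasso penalty)
netPenalty(beta, alpha = 0.1)  # elastic-net form, since L is the identity
```
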
If the group sizes of CpG sites are listed in group, it is assumed
that CpG sites within the same gene are linked with each other as
a ring or a fully connected network. In this case, the Laplacian matrix
is block diagonal, with one block per gene. The ring network assumes that only
adjacent CpG sites within the same gene are linked with each other,
while in the fully connected network all CpG sites within the same gene
are linked with each other. For a large gene, the ring network is recommended
for computational speed.
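
To make the block-diagonal structure concrete, the sketch below builds a Laplacian from a vector of group sizes in base R. It assumes the unnormalized Laplacian (degree matrix minus adjacency matrix); the source does not say which form pETM uses internally, so the package may use a normalized variant. `groupLaplacian` and `blockLaplacian` are hypothetical names used only for illustration.

```r
# Hypothetical helper: unnormalized Laplacian (degree - adjacency) of one gene
# with k CpG sites. This is an assumption, not pETM's internal code.
groupLaplacian <- function(k, type = c("ring", "fcon")) {
  type <- match.arg(type)
  A <- matrix(0, k, k)
  if (k > 1) {
    if (type == "fcon") {
      A[] <- 1                       # every site linked to every other site
      diag(A) <- 0
    } else {
      idx <- cbind(1:k, c(2:k, 1))   # each site linked to its successor, wrapping around
      A[idx] <- 1
      A[idx[, 2:1, drop = FALSE]] <- 1
    }
  }
  diag(rowSums(A), nrow = k) - A
}

# Place one block per gene along the diagonal.
blockLaplacian <- function(group, type = "ring") {
  p <- sum(group)
  L <- matrix(0, p, p)
  end   <- cumsum(group)
  start <- end - group + 1
  for (g in seq_along(group)) {
    i <- start[g]:end[g]
    L[i, i] <- groupLaplacian(group[g], type)
  }
  L
}

Lring <- blockLaplacian(c(1, 2, 5), type = "ring")
Lfcon <- blockLaplacian(c(1, 2, 5), type = "fcon")
dim(Lring)
```

In this sketch a gene of size 5 contributes 5 links under the ring network and 10 under the fully connected network, which illustrates why the ring network is cheaper for large genes.
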
The selection result is summarized as the selection probability of
individual CpG sites. A proportion psub of the n samples is
randomly selected without replacement, and this is repeated K times. For each
subsample of (x,cx,y), pETM is applied to
find the non-zero coefficients of CpG sites along the nlam lambda
values. The selection probability of each CpG site is then computed
as the maximum, over the lambda values, of the proportion of the K
replications in which its regression coefficient is non-zero.

n <- 100
p <- 500
x <- matrix(rnorm(n*p), n, p)
y <- rep(0:1, c(50,50))
# a total of 200 genes each of which consists of 1, 2, or 5 CpG sites
gr <- rep(c(1,2,5), c(50,100,50))
# ordinary penalized logistic regression
g1 <- pETM(x, y, group=gr, K=10)
# penalized exponential tilt model based on Gaussian distribution
g2 <- pETM(x, y, group=gr, etm = "normal", K=10)
# penalized exponential tilt model based on Beta distribution
x2 <- matrix(runif(n*p), n, p)
g3 <- pETM(x2, y, group=gr, etm = "beta", K=10)
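
The resampling step described in the details (subsample a proportion psub of the samples K times, record which coefficients are non-zero at each lambda, then take the maximum proportion over lambda) can be sketched in base R as follows. This is only an illustration of the bookkeeping: `toySelect` is a hypothetical stand-in that thresholds marginal scores and is not the pETM estimator, and `selProb` is likewise a made-up name.

```r
# Hypothetical stand-in for a penalized fit (NOT the pETM estimator):
# returns a p x nlam logical matrix marking which variables count as
# "selected" at each lambda, by thresholding standardized marginal scores.
toySelect <- function(x, y, lambdas) {
  score <- abs(drop(crossprod(scale(x), y - mean(y)))) / nrow(x)
  outer(score, lambdas, ">")
}

# Selection probability by repeated subsampling without replacement.
selProb <- function(x, y, lambdas, psub = 0.5, K = 100) {
  n <- nrow(x)
  m <- floor(psub * n)
  counts <- matrix(0, ncol(x), length(lambdas))
  for (k in seq_len(K)) {
    idx <- sample(n, m)                         # subsample without replacement
    counts <- counts + toySelect(x[idx, , drop = FALSE], y[idx], lambdas)
  }
  apply(counts / K, 1, max)                     # maximum proportion over lambda
}

set.seed(1)
n <- 100; p <- 20
xs <- matrix(rnorm(n * p), n, p)
ys <- rep(0:1, each = n / 2)
xs[ys == 1, 1] <- xs[ys == 1, 1] + 2            # make the first column informative
sp <- selProb(xs, ys, lambdas = c(0.1, 0.2, 0.3), K = 20)
round(sp, 2)
```

Replacing `toySelect` with a real penalized fit on each subsample would give the kind of selection probabilities the function reports, under the assumptions stated above.
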