pETM(x,y,cx=NULL,alpha=0.1,maxit=100000,thre=1e-6,group=NULL,lambda=NULL,
type=c("ring","fcon"),etm=c("none","normal","beta"),psub=0.5,nlam=10,
kb=10,K=100)
... 1E-6. For fast computation, use a larger value than the default value.
... group should be equivalent to the total number of genes or genetic regions, and the sum of group should be the same as th...
... lambda sequence based on nlam and kb ...
... group is specified. "ring" and "fcon" represent a ring and fully connected network, respectively. Default is "ring". See details.
... "none" does not perform an exponential tilt model, instead an ordinary penalized logistic regression model is applied. "normal" performs a penalized exponential tilt mode...
... psub $\in[0.5,1)$. The default is 0.5.
... lambda values used for resamplings, and default is 10. For fast computation, use a smaller value than the default value.
... lambda values and default is 10.
... lambda values used ...
... etm is "normal" and $h_1(x)=-\log(x)$ and $h_2(x)=-\log(1-x)$ if etm is "beta".
The penalty function of pETM is defined as
$$\alpha||\beta||_1+(1-\alpha)(\beta^{T}L\beta)/2,$$
where $L$ is a Laplacian matrix describing a group structure of
CpG sites. This penalty is equivalent to the Lasso penalty if alpha=1.
When group is not defined, $L$ is replaced by an identity matrix. In this
case, pETM performs an elastic-net regularization procedure since the
second term of the penalty simply reduces to the squared $l_2$ norm of $\beta$.
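The penalty above can be evaluated directly for a given coefficient vector. The following is a minimal sketch in base R; the helper name petm_penalty and the toy values are assumptions made for illustration and are not part of the pETM package.

```r
# Hypothetical helper (not part of pETM): evaluates
#   alpha * ||beta||_1 + (1 - alpha) * (beta' L beta) / 2
petm_penalty <- function(beta, L, alpha = 0.1) {
  l1 <- sum(abs(beta))                        # L1 norm of beta
  quad <- drop(crossprod(beta, L %*% beta))   # quadratic form beta' L beta
  alpha * l1 + (1 - alpha) * quad / 2
}

beta <- c(0.5, -1, 0, 2)   # toy coefficients, chosen arbitrarily
I4 <- diag(4)              # identity matrix, as used when group is not defined

# alpha = 1 leaves only the L1 term (Lasso penalty)
lasso <- petm_penalty(beta, I4, alpha = 1)

# with L = identity the second term is half the squared l2 norm (elastic net)
enet <- petm_penalty(beta, I4, alpha = 0.1)
```

With an identity matrix the quadratic form is just sum(beta^2), which is why the penalty takes the elastic-net form in that case.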
If group sizes of CpG sites are listed in group, it is assumed
that CpG sites within the same gene are linked with each other like
a ring or a fully connected network. In this case, the Laplacian matrix
forms a block-diagonal matrix. The ring network assumes that only
adjacent CpG sites within the same gene are linked with each other,
while all CpG sites within the same gene are linked with each other
in the fully connected network. For a large gene, the ring network is
recommended for faster computation.
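To make the block-diagonal structure concrete, the sketch below builds a Laplacian from a vector of group sizes in base R. It assumes the unnormalized Laplacian (degree matrix minus adjacency matrix); the package may use a different normalization internally, and the helper names block_laplacian and group_laplacian are hypothetical.

```r
# Hypothetical helper: unnormalized Laplacian (degree - adjacency) for one
# gene with m CpG sites. This is an assumption, not pETM's own code.
block_laplacian <- function(m, type = c("ring", "fcon")) {
  type <- match.arg(type)
  A <- matrix(0, m, m)                 # adjacency matrix within the gene
  if (m >= 2) {
    if (type == "fcon") {
      A[, ] <- 1                       # every pair of sites is linked
      diag(A) <- 0
    } else {
      idx <- seq_len(m)
      nxt <- idx %% m + 1              # next site, wrapping around the ring
      A[cbind(idx, nxt)] <- 1
      A[cbind(nxt, idx)] <- 1
    }
  }
  diag(rowSums(A), nrow = m) - A
}

# Hypothetical helper: place one block per gene on the diagonal.
group_laplacian <- function(group, type = c("ring", "fcon")) {
  type <- match.arg(type)
  p <- sum(group)                      # total number of CpG sites
  L <- matrix(0, p, p)
  end <- cumsum(group)
  start <- end - group + 1
  for (g in seq_along(group)) {
    i <- start[g]:end[g]
    L[i, i] <- block_laplacian(group[g], type)
  }
  L
}

gr_small <- c(1, 2, 5)                 # three genes with 1, 2, and 5 sites
L_ring <- group_laplacian(gr_small, "ring")
L_fcon <- group_laplacian(gr_small, "fcon")
```

In this sketch the gene with 5 sites has 5 links under the ring network and 10 links under the fully connected network, and the gap grows quadratically with gene size, which is consistent with the recommendation to prefer the ring network for large genes.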
The selection result is summarized as the selection probability of
individual CpG sites. A proportion psub of the n samples is randomly
selected without replacement, and this is repeated K times. For each
subsample of (x, cx, y), pETM is applied to find non-zero coefficients
of CpG sites along nlam lambda values. The selection probability of each
CpG site is then computed based on the maximum proportion of non-zero
regression coefficients among the K replications.

n <- 100
p <- 500
x <- matrix(rnorm(n*p), n, p)
y <- rep(0:1, c(50,50))
# a total of 200 genes each of which consists of 1, 2, or 5 CpG sites
gr <- rep(c(1,2,5), c(50,100,50))
# ordinary penalized logistic regression
g1 <- pETM(x, y, group=gr, K=10)
# penalized exponential tilt model based on Gaussian distribution
g2 <- pETM(x, y, group=gr, etm = "normal", K=10)
# penalized exponential tilt model based on Beta distribution
x2 <- matrix(runif(n*p), n, p)
g3 <- pETM(x2, y, group=gr, etm = "beta", K=10)
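
The resampling scheme described in the details (subsample a proportion psub of the samples K times, record which coefficients are non-zero at each lambda value, then take the maximum proportion) can be sketched in base R. This is only an illustration under stated assumptions: the selector select_at is a stand-in based on marginal correlation thresholds, not the penalized fit that pETM performs, and reading "maximum proportion" as the maximum over lambda values is an interpretation. All helper names and the simulated data are hypothetical.

```r
set.seed(1)
n_demo <- 60
p_demo <- 20
x_demo <- matrix(rnorm(n_demo * p_demo), n_demo, p_demo)
y_demo <- rep(0:1, each = n_demo / 2)
# make the first feature informative (simulated signal, for illustration only)
x_demo[y_demo == 1, 1] <- x_demo[y_demo == 1, 1] + 2

# Stand-in selector (NOT pETM): feature j counts as "selected" at threshold
# lam when its absolute correlation with y exceeds lam.
select_at <- function(xs, ys, lams) {
  score <- abs(cor(xs, ys))[, 1]
  outer(score, lams, ">")              # features x thresholds, logical
}

# Hypothetical helper: selection probability by repeated subsampling.
selection_probability <- function(x, y, lams, psub = 0.5, K = 100) {
  n <- nrow(x)
  m <- floor(psub * n)                 # subsample size
  counts <- matrix(0, ncol(x), length(lams))
  for (k in seq_len(K)) {
    idx <- sample(n, m)                # drawn without replacement
    counts <- counts + select_at(x[idx, , drop = FALSE], y[idx], lams)
  }
  apply(counts / K, 1, max)            # maximum proportion over thresholds
}

sp <- selection_probability(x_demo, y_demo, lams = c(0.3, 0.5), K = 20)
```

Here sp holds one value in [0, 1] per feature; features that are selected in most subsamples for at least one threshold receive values close to 1.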