Applies POCRE to select variables and evaluates the significance of the selected variables using the multiple splitting method of Meinshausen et al. (2009). The tuning parameter may be selected by an information criterion or by k-fold cross-validation, or it can be fixed at a prespecified value.
sipocre(y, x, n.splits=10, delta=0.1, crit=c('ic','cv','fixed'),
ptype=c('ebtz','ebt','l1','scad','mcp'), maxvar=dim(x)[1]/2,
msc=NA, maxit=100, maxcmp=50, gamma=3.7, tol=1e-6,
n.folds=10, lambda=1, n.train=round(nrow(x)/2))
y: n*q matrix, values of q response variables (multiple response variables are allowed).
x: n*p matrix, values of p predictor variables (excluding the intercept).
n.splits: number of random splits (=10 by default).
delta: step size by which to increase or decrease the current tuning parameter.
crit: a character string indicating the criterion used to choose the tuning parameter: 'ic' (an information criterion such as AIC, AICc, BIC, or EBIC), 'cv' (k-fold cross-validation), or 'fixed' (a prespecified value).
ptype: a character string indicating the type of penalty: 'ebtz' (empirical Bayes thresholding after Fisher's z-transformation, the default), 'ebt' (empirical Bayes thresholding by Johnstone & Silverman (2004)), 'l1' (L_1 penalty), 'scad' (SCAD by Fan & Li (2001)), 'mcp' (MCP by Zhang (2010)).
maxvar: maximum number of selected variables.
msc: value(s) indicating the penalty used in the information criterion: 0~1 for (E)BIC, 2 for AIC, 3 for AICc; used when crit='ic'.
maxit: maximum number of iterations allowed.
maxcmp: maximum number of components to be constructed.
gamma: a parameter used by SCAD and MCP (=3.7 by default).
tol: tolerance of precision in iterations.
n.folds: number of folds in k-fold cross-validation, used when crit='cv'.
lambda: prespecified value for the tuning parameter, used when crit='fixed'.
n.train: sample size of the training data set.
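The 'l1', 'scad', and 'mcp' penalty types named for ptype have standard closed forms, which the following minimal sketch writes out in base R. The function names (l1_penalty, scad_penalty, mcp_penalty) are hypothetical helpers for illustration and are not part of the package; how sipocre applies these penalties internally is not shown here. The gamma argument plays the role of the shape parameter with the documented default of 3.7.

```r
# Textbook penalty functions, evaluated elementwise on |theta|.
# These helpers are illustrative assumptions, not functions from POCRE.

# L_1 penalty: lambda * |theta|
l1_penalty <- function(theta, lambda) lambda * abs(theta)

# SCAD penalty (Fan & Li, 2001); requires gamma > 2
scad_penalty <- function(theta, lambda, gamma = 3.7) {
  t <- abs(theta)
  ifelse(t <= lambda,
         lambda * t,
         ifelse(t <= gamma * lambda,
                (2 * gamma * lambda * t - t^2 - lambda^2) / (2 * (gamma - 1)),
                lambda^2 * (gamma + 1) / 2))
}

# MCP penalty (Zhang, 2010); requires gamma > 1
mcp_penalty <- function(theta, lambda, gamma = 3.7) {
  t <- abs(theta)
  ifelse(t <= gamma * lambda,
         lambda * t - t^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

# Compare the three penalties on a small grid of coefficient sizes
theta <- c(0.5, 2, 10)
penalties <- cbind(l1   = l1_penalty(theta, 1),
                   scad = scad_penalty(theta, 1),
                   mcp  = mcp_penalty(theta, 1))
```

Both SCAD and MCP level off for large coefficients, whereas the L_1 penalty keeps growing linearly; this is why the former two shrink large effects less.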
A list consisting of the following components:
component-based p-values, calculated by testing the constructed components; either a matrix (when crit='ic', with each column corresponding to one value in msc) or a vector (when crit='cv' or crit='fixed').
traditional p-values; either a matrix (when crit='ic', with each column corresponding to one value in msc) or a vector (when crit='cv' or crit='fixed').
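For orientation, the multiple splitting idea of Meinshausen et al. (2009) can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the selection step uses a stand-in marginal-correlation screen instead of POCRE, the helper names (aggregate_pvalues, multi_split, n.keep, gamma_min) are hypothetical, and whether sipocre aggregates p-values exactly this way is not confirmed by this page. The n.splits and n.train arguments mirror the documented ones.

```r
# Combine per-split p-values by quantile aggregation (Meinshausen et al., 2009).
# pmat: n.splits x p matrix of p-values already adjusted within each split,
#       with unselected variables set to 1.
aggregate_pvalues <- function(pmat, gamma_min = 0.05) {
  gammas <- seq(gamma_min, 1, by = 0.01)
  apply(pmat, 2, function(pj) {
    # Q(g) = min(1, g-quantile of pj / g), evaluated on a grid of g
    q <- sapply(gammas, function(g)
      min(1, quantile(pj / g, probs = g, names = FALSE)))
    min(1, (1 - log(gamma_min)) * min(q))
  })
}

multi_split <- function(y, x, n.splits = 10,
                        n.train = round(nrow(x) / 2), n.keep = 5) {
  n <- nrow(x)
  pmat <- matrix(1, n.splits, ncol(x))
  for (b in seq_len(n.splits)) {
    train <- sample(n, n.train)
    # Stand-in selector (NOT POCRE): keep the n.keep predictors with the
    # largest absolute correlation with y on the training half.
    score <- abs(as.vector(cor(x[train, , drop = FALSE], y[train])))
    sel <- order(score, decreasing = TRUE)[seq_len(n.keep)]
    # Ordinary least squares on the held-out half, selected predictors only
    fit <- lm(y[-train] ~ x[-train, sel, drop = FALSE])
    pv <- summary(fit)$coefficients[-1, 4]
    # Bonferroni adjustment by the number of selected variables
    pmat[b, sel] <- pmin(1, pv * length(sel))
  }
  aggregate_pvalues(pmat)
}

# Simulated data: only the first predictor has a real effect
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- 2 * x[, 1] + rnorm(n)
pvals <- multi_split(y, x)
```

Selecting on one half and testing on the other keeps the p-values valid despite the selection step, and repeating over several random splits reduces the dependence of the result on any single split.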
Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96: 1348-1360.
Johnstone IM and Silverman BW (2004). Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences. The Annals of Statistics, 32: 1594-1649.
Meinshausen N, Meier L, and Bühlmann P (2009). p-Values for high-dimensional regression. Journal of the American Statistical Association, 104: 1671-1681.
Zhang C-H (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38: 894-942.
Zhang D, Lin Y, and Zhang M (2009). Penalized orthogonal-components regression for large p small n data. Electronic Journal of Statistics, 3: 781-796.
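# NOT RUN {
library(POCRE)              # assumes the POCRE package is installed
data(simdata)               # example data set
xx <- simdata[,-1]          # predictors: every column except the first
yy <- simdata[,1]           # response: the first column
sipres <- sipocre(yy,xx)    # run with all arguments at their default values
# }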