cosso (version 2.1-0)

cosso: Estimate the mean regression function for Gaussian response using Smoothing Splines with COSSO penalty

Description

Fit COSSO and adaptive COSSO models for Gaussian response. COSSO is a regularization method for variable selection and function estimation in multivariate nonparametric regression models. By imposing a soft-thresholding type penalty onto function components, the COSSO solution is sparse and hence able to identify important variables. The method is developed in the framework of smoothing spline ANOVA.

Usage

cosso(x,y,wt=rep(1,ncol(x)),scale=FALSE,nbasis,basis.id,n.step=2*ncol(x))

Arguments

x
input matrix; the number of rows is the sample size, the number of columns is the data dimension. The range of each input variable should lie within [0,1]; see scale.
y
response vector
wt
weights for predictors. Default is rep(1,ncol(x))
scale
if TRUE, each predictor variable is rescaled to the [0,1] interval. Default is FALSE.
nbasis
number of "knots" to be selected. Ignored when basis.id is provided.
basis.id
indices of the observations designated as "knots".
n.step
maximum number of iterations used in finding the solution path.

Value

  • An object with S3 class "cosso", with components:
  • family: type of regression model.
  • x: the input matrix
  • y: the response vector
  • Kmat: an array containing the kernel matrix for each input variable
  • basis.id: indices of the observations used as "knots"
  • wt: weights for the predictors
  • tune: a list containing preliminary tuning results
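The components listed above can be accessed with the usual `$` operator. A minimal sketch (the ozone data and nbasis value mirror the Examples section below; the printed values depend on the fit):

```r
library(cosso)
data(ozone)

## Fit on the ozone data with 50 "knots", then inspect the returned object.
obj <- cosso(x = ozone[, 2:5], y = ozone[, 1], nbasis = 50)

class(obj)            # "cosso"
obj$family            # type of regression model
length(obj$basis.id)  # number of observations used as "knots" (here 50)
obj$wt                # predictor weights; defaults to rep(1, ncol(x))
str(obj$tune)         # preliminary tuning results
```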

Details

The mean regression function is first assumed to have an additive form $$\eta(x)=\sum_{j=1}^p\eta_j(x_j),$$ then estimated by minimizing the objective function: $$RSS/nobs+\lambda_0\sum_{j=1}^p\theta^{-1}_jw_j^2\|\eta_j\|^2,\quad \text{s.t.}~\sum_{j=1}^p\theta_j\leq M.$$ For large data sets, the computational load can be reduced by selecting a subset of nbasis observations as "knots", which reduces the dimension of the kernel matrices from nobs to nbasis. Unless specified via basis.id or nbasis, the default number of "knots" is the sample size (nobs). The weights can be specified at the user's own discretion or computed adaptively from initial function estimates. See Storlie et al. (2011) for more discussion. One possible choice is to specify the weights as the inverse $L_2$ norm of an initial function estimator; see SSANOVAwt.
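As a sketch of the knot-selection options described above, the two calls below (using the ozone data shipped with the package) show the two ways to work with 50 "knots": letting cosso() sample them via nbasis, or supplying explicit row indices via basis.id:

```r
library(cosso)
data(ozone)

## Option 1: let cosso() select 50 "knots" internally.
fit1 <- cosso(x = ozone[, 2:5], y = ozone[, 1], nbasis = 50)

## Option 2: supply explicit knot indices; nbasis is then ignored.
set.seed(1)                                # hypothetical seed, for reproducibility
my.knots <- sort(sample(nrow(ozone), 50))
fit2 <- cosso(x = ozone[, 2:5], y = ozone[, 1], basis.id = my.knots)
```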

References

Lin, Y. and Zhang, H. H. (2006) "Component Selection and Smoothing in Smoothing Spline Analysis of Variance Models", Annals of Statistics, 34, 2272--2297.

Storlie, C. B., Bondell, H. D., Reich, B. J. and Zhang, H. H. (2011) "Surface Estimation, Variable Selection, and the Nonparametric Oracle Property", Statistica Sinica, 21, 679--705.

See Also

plot.cosso, predict.cosso, tune.cosso

Examples

data(ozone)
## Fit cosso
## Use 50 observations as knots
t0=proc.time()
## Use half of the observations for demonstration
set.seed(27695)
train.id <- sort(sample(1:nrow(ozone),ceiling(nrow(ozone)/2)))
cossoObj <- cosso(x=ozone[train.id,2:5],y=ozone[train.id,1],nbasis=50)
print((proc.time()-t0)[3])

## Use all observations as knots
t0=proc.time()
## Use half of the observations for demonstration
set.seed(27695)
train.id <- sort(sample(1:nrow(ozone),ceiling(nrow(ozone)/2)))
cossoObj <- cosso(x=ozone[train.id,2:5],y=ozone[train.id,1])
print((proc.time()-t0)[3])


## Fit adaptive cosso
adaptive.wt <- SSANOVAwt(ozone[,-1],ozone[,1])
acossoObj <- cosso(x=ozone[,-1],y=ozone[,1],wt=adaptive.wt,nbasis=ceiling(nrow(ozone)/5))
