widenet: Extends the relaxnet Package with Polynomial Basis Expansions

Description

Expands the basis according to the order argument, then runs relaxnet in order to select a subset of the basis functions. Multiple values of order and alpha (the elastic net tuning parameter) may be specified, leading to selection of a specific value by cross-validation.

Usage

widenet(x, y, family = c("gaussian", "binomial"), order = 1:3, alpha = 1, nfolds = 10, foldid, screen.method = c("none", "cor", "ttest"), screen.num.vars = 50, multicore = FALSE, mc.cores, mc.seed = 123, ...)

Arguments

x

Input matrix; each row is an observation vector. Sparse matrices are not yet supported for the widenet function. Must have unique colnames.

y

Response variable. Quantitative for family = "gaussian". For family = "binomial", should be either a factor with two levels or a two-column matrix of counts or proportions.

family

Response type (see above).

order

The order of basis expansion. Elements must be in the set c(1, 2, 3). If there is more than one element, cross-validation is used to choose the order with the best cross-validated performance.
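
To make the idea of a polynomial basis expansion concrete, the following base R sketch builds an order-2 expansion by hand: the original columns, their squares, and all pairwise products. The helper expand_order2 and its column naming scheme are hypothetical and are not part of widenet; the package's own expansion may differ, for example in how it treats binary columns.

```r
# Hypothetical helper (not part of widenet): order-2 polynomial expansion.
expand_order2 <- function(x) {
  stopifnot(is.matrix(x), !is.null(colnames(x)), ncol(x) >= 2)
  nms <- colnames(x)

  # Squared terms, one per original column
  sq <- x^2
  colnames(sq) <- paste0(nms, "^2")

  # Pairwise products of distinct columns
  pairs <- combn(ncol(x), 2)
  inter <- x[, pairs[1, ], drop = FALSE] * x[, pairs[2, ], drop = FALSE]
  colnames(inter) <- paste(nms[pairs[1, ]], nms[pairs[2, ]], sep = ":")

  cbind(x, sq, inter)
}

set.seed(1)
x <- matrix(rnorm(20), 5, 4, dimnames = list(NULL, paste0("x", 1:4)))
expanded <- expand_order2(x)
ncol(expanded)  # 4 main effects + 4 squares + 6 pairwise products = 14
```

The number of columns grows quickly with the order, which is why a selection step over the expanded basis is useful.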

alpha

The elastic net mixing parameter; see glmnet. If there is more than one element, cross-validation is used to choose the value with the best cross-validated performance.

nfolds

Number of folds - default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), it is not recommended for large datasets. Smallest value allowable is nfolds=3.

foldid

An optional vector of values between 1 and nfolds identifying which fold each observation is in. If supplied, nfolds can be missing.
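
One common way to build such a vector in base R is to repeat the fold labels to the sample size and shuffle them. This is a minimal sketch; the sample size, number of folds, and seed below are illustrative values, not package defaults.

```r
set.seed(42)
n <- 300       # number of observations (illustrative)
nfolds <- 10   # number of folds (illustrative)

# Balanced fold labels in random order: one label per observation
foldid <- sample(rep(seq_len(nfolds), length.out = n))

table(foldid)  # each fold holds n / nfolds = 30 observations
```

The resulting vector could then be passed through the foldid argument shown in Usage, so that repeated runs share the same fold assignment.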

screen.method

The method used to screen variables before basis expansion is applied. Default is no screening. "cor" = correlation, i.e. bivariate correlation with the outcome. "ttest" is meant for binary outcomes (family = "binomial"). The screening methods are adapted from the SuperLearner package, written by Eric Polley.

screen.num.vars

The number of variables (columns of x) to screen in when using screening.
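
As a rough illustration of correlation screening, the sketch below ranks the columns of x by absolute bivariate correlation with y and keeps the top few. The helper screen_by_cor is hypothetical; whether widenet ranks by absolute correlation, by a p-value, or by some other criterion is an assumption here and is not stated on this page.

```r
# Hypothetical helper (not part of widenet): keep the num_vars columns of x
# with the largest absolute correlation with y.
screen_by_cor <- function(x, y, num_vars) {
  abs_cor <- abs(drop(cor(x, y)))
  keep <- order(abs_cor, decreasing = TRUE)[seq_len(min(num_vars, ncol(x)))]
  sort(keep)  # column indices, in their original order
}

set.seed(7)
x <- matrix(rnorm(200 * 20), 200, 20,
            dimnames = list(NULL, paste0("x", 1:20)))
# Only columns 3 and 7 carry signal in this simulated outcome
y <- 2 * x[, 3] - 2 * x[, 7] + rnorm(200, sd = 0.5)

screened_in <- screen_by_cor(x, y, num_vars = 2)
screened_in
```

With a signal this strong, the two informative columns should be the ones screened in. The returned index vector plays a role similar to the screened.in.index component described under Value.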

multicore

Should execution be parallelized over cv folds (for cv.relaxnet) or over alpha values (for cv.alpha.relaxnet) using multicore functionality from R's parallel package?

mc.cores

Number of cores/cpus to be used for multicore processing. Parallelization is over cross-validation folds.

mc.seed

Integer value with which to seed the RNG when using parallel processing (internally, RNGkind will be called to set the RNG to "L'Ecuyer-CMRG"). Ignored if multicore is FALSE. If multicore is FALSE, one should be able to get reproducible results by setting the seed normally (with set.seed) prior to running.

...

Further arguments passed to relaxnet or cv.relaxnet, which should also be passed on to glmnet. Use with caution as this has not been tested.

Value

call: A copy of the call which generated this object
order: The value of the order argument
alpha: The value of the alpha argument
screen.method: The value of the screen.method argument
screened.in.index: A vector which indexes the columns of x, indicating those variables which were screened in for the run on the full data
colsBinary: A vector of length ncol(x) representing which of the columns of x contained binary data. These columns will be represented by a 2. The other columns will have a 3.
cv.relaxnet.results: A list of lists containing "cv.relaxnet" objects, one for each combination of values of alpha and order.
min.cvm.mat: A matrix containing the minimum cross-validated risk for each combination of values of alpha and order
which.order.min: The order which "won" the cross-validation, i.e. resulted in minimum cross-validated risk.
which.alpha.min: The alpha value which "won" the cross-validation.
total.time: Total time in seconds to produce this result.
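
The relationship between min.cvm.mat, which.order.min, and which.alpha.min can be sketched as locating the smallest entry of a risk matrix. The numbers below are made up for illustration, and the layout (rows for alpha, columns for order) is an assumption; the orientation of the actual min.cvm.mat is not documented here.

```r
# Toy risk matrix with made-up values; rows = alpha, columns = order (assumed)
min_cvm <- matrix(c(1.90, 1.20, 1.30,
                    1.80, 1.10, 1.25),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("0.5", "1"), c("1", "2", "3")))

# Position of the smallest cross-validated risk (first one if there are ties)
best <- which(min_cvm == min(min_cvm), arr.ind = TRUE)[1, ]

best_alpha <- as.numeric(rownames(min_cvm)[best[[1]]])
best_order <- as.integer(colnames(min_cvm)[best[[2]]])
c(alpha = best_alpha, order = best_order)
```

In this toy matrix the minimum risk sits at alpha = 1 and order = 2, which is the combination that would be reported as the winner.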

Details

The type.measure argument has not yet been implemented. For family = "gaussian" models, mean squared error is used, and for family = "binomial", binomial deviance is used.

References

Stephan Ritter and Alan Hubbard, Tech report (forthcoming).

Examples

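library(widenet)

## Simulate a small data set
n <- 300
p <- 5

set.seed(23)
x <- matrix(rnorm(n*p), n, p)

## widenet requires unique column names
colnames(x) <- paste("x", 1:ncol(x), sep = "")

## The outcome depends on an interaction (x3 * x4) and a squared term (x5^2),
## both of which are second-order basis functions
y <- x[, 1] + x[, 2] + x[, 3] * x[, 4] + x[, 5]^2 + rnorm(n)

## Fit with a second-order expansion and a single alpha value
widenet.result <- widenet(x, y, family = "gaussian",
                          order = 2, alpha = 0.5)

summary(widenet.result)

## Extract the coefficients and show only the nonzero ones
coefs <- drop(predict(widenet.result, type = "coef"))
coefs[coefs != 0]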
