
easy.glmnet (version 1.1)

cv: Conduct cross-validation

Description

Function to easily cross-validate a prediction model (including fold assignment, merging of fold outputs, etc.).

Usage

cv(x, y, family = c("binomial", "cox", "gaussian"), fit_fun, predict_fun, site = NULL,
covar = NULL, nfolds = 10, pred.format = NA, verbose = TRUE, ...)

Value

A list with the predictions and the models used.

Arguments

x

predictors. A matrix or data.frame (rows are observations and columns are variables), or a vector or factor (if there is only one predictor).

y

response to be predicted. A binary vector for "binomial", a "Surv" object for "cox", or a numeric vector for "gaussian".

family

distribution of y: "binomial", "cox", or "gaussian".

fit_fun

function to create the prediction model using the training subsets. It can have between two and four arguments (the first two are compulsory): x_training (training X data.frame), y_training (training Y outcomes), site_training (training site names), and covar_training (training covariates). It must return the overall prediction model, which may be a list of the different submodels used in different steps and/or derived from different imputations.

predict_fun

function to apply the prediction model to the test sets. It can have between two and four arguments (the first two are compulsory): model (the overall prediction model), x_test (test X data.frame), site_test (test site names), and covar_test (test covariates). It must return the predictions.

site

vector or factor with the sites' names, or NULL for studies conducted in a single site.

covar

other covariates that can be passed to fit_fun and predict_fun. A matrix or data.frame (rows are observations and columns are variables), or a vector or factor (if there is only one covariate).

...

other arguments that can be passed to fit_fun and predict_fun.

nfolds

number of folds, only used if folds is NULL.

pred.format

format of the predictions returned by each fold. E.g., if the prediction is an array, use NA.

verbose

(optional) logical, whether to print some messages during execution.
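
As an illustration of the four-argument form of fit_fun and predict_fun described above, the following is a minimal sketch, not code from the package. It assumes base R glm() as a stand-in model, and the simulated data and the column names x1, x2 and age are hypothetical. The callbacks are called directly rather than through cv, only to show the expected inputs and outputs.

```r
# Hypothetical four-argument callbacks; base R glm() stands in for the model
fit_fun <- function(x_training, y_training, site_training, covar_training) {
  # Combine predictors and covariates into a single training data.frame
  d <- data.frame(y = y_training, x_training, covar_training)
  list(
    logistic = glm(y ~ ., data = d, family = binomial()),
    sites = unique(site_training)  # stored only to show that site names are available
  )
}
predict_fun <- function(model, x_test, site_test, covar_test) {
  d <- data.frame(x_test, covar_test)
  # Return the predicted probabilities for the test observations
  as.numeric(predict(model$logistic, newdata = d, type = "response"))
}

# Direct call on simulated data (outside cv) to check the callbacks
set.seed(1)
x <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
covar <- data.frame(age = rnorm(100, 50, 10))
site <- rep(c("A", "B"), each = 50)
y <- rbinom(100, 1, plogis(x$x1))
train <- 1:80
test <- 81:100
m <- fit_fun(x[train, ], y[train], site[train], covar[train, , drop = FALSE])
p <- predict_fun(m, x[test, ], site[test], covar[test, , drop = FALSE])
```

Returning a named list from fit_fun keeps room for several submodels, as the description of fit_fun allows.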

Author

Joaquim Radua

Details

This function iteratively divides the dataset into a training dataset, with which it fits the model using the function fit_fun, and a test dataset, to which it applies the model using the function predict_fun. It saves the models fitted with the training datasets and the predictions obtained in the test datasets. The folds are assigned automatically using assign.folds, accounting for the site if this is not NULL.
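
The procedure described above can be sketched in base R as follows. This is an illustrative approximation, not the package's implementation: glm() replaces the user-supplied fit_fun, the per-site fold assignment is an assumed stand-in for assign.folds, and the fold column in the output is hypothetical.

```r
set.seed(42)
n <- 200
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- rbinom(n, 1, plogis(2 * x$x1))
site <- sample(c("A", "B", "C"), n, replace = TRUE)
nfolds <- 5

# Assign folds within each site so that sites are spread roughly evenly across folds
folds <- integer(n)
for (s in unique(site)) {
  idx <- which(site == s)
  folds[idx] <- rep_len(seq_len(nfolds), length(idx))[sample.int(length(idx))]
}

models <- vector("list", nfolds)
y.pred <- numeric(n)
for (k in seq_len(nfolds)) {
  training <- folds != k
  test <- folds == k
  # Fit on the training subset (the role of fit_fun)
  d <- data.frame(y = y[training], x[training, ])
  models[[k]] <- glm(y ~ ., data = d, family = binomial())
  # Apply the model to the held-out subset (the role of predict_fun)
  y.pred[test] <- predict(models[[k]], newdata = x[test, ], type = "response")
}

# Merge the fold outputs: one prediction per observation, one model per fold
res <- list(
  predictions = data.frame(fold = folds, y = y, y.pred = y.pred),
  models = models
)
```

Each observation is predicted exactly once, by a model that did not see it during fitting.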

See Also

glmnet_predict for obtaining predictions.

Examples

library(easy.glmnet)

# Create random x (predictors) and y (binary)
x = matrix(rnorm(25000), ncol = 50)
y = 1 * (plogis(apply(x[,1:5], 1, sum) + rnorm(500, 0, 0.1)) > 0.5)

# Predict y via cross-validation
fit_fun = function (x_training, y_training) {
  list(
    lasso = glmnet_fit(x_training, y_training, family = "binomial")
  )
}
predict_fun = function (m, x_test) {
  glmnet_predict(m$lasso, x_test)
}
# Only 2 folds to ensure the example runs quickly
res = cv(x, y, family = "binomial", fit_fun = fit_fun, predict_fun = predict_fun, nfolds = 2)

# Show accuracy
se = mean(res$predictions$y.pred[res$predictions$y == 1] > 0.5)
sp = mean(res$predictions$y.pred[res$predictions$y == 0] < 0.5)
bac = (se + sp) / 2
cat("Sensitivity:", round(se, 2), "\n")
cat("Specificity:", round(sp, 2), "\n")
cat("Balanced accuracy:", round(bac, 2), "\n")
