RMCLab (version 0.1.0)

create_splits: Create splits of observed data cells for hyperparameter tuning

Description

Split the observed cells of a data matrix into training and validation sets for hyperparameter tuning. Methods are available for repeated holdout validation and \(K\)-fold cross-validation.

Usage

create_splits(indices, control)

holdout(indices, pct = 0.1, R = 10L)

cv_folds(indices, K = 5L)

Value

A list of index vectors, one per replication or cross-validation fold, each giving the indices of the cells in the corresponding validation set.
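To make the shape of this return value concrete, the following base R sketch builds a comparable list for repeated holdout validation. It is not the package's implementation: the helper name `sketch_holdout` is hypothetical, and rounding the validation set size with `ceiling()` is an assumption.

```r
# Hypothetical helper (not part of RMCLab): repeated holdout on an index vector.
sketch_holdout <- function(indices, pct = 0.1, R = 10L) {
  n <- length(indices)
  # Assumption: the validation set size is rounded up, with at least one cell.
  n_valid <- max(1L, as.integer(ceiling(pct * n)))
  # One validation index vector per replication, drawn without replacement.
  replicate(R, indices[sample.int(n, n_valid)], simplify = FALSE)
}

set.seed(1)
observed <- c(2L, 3L, 5L, 8L, 13L, 21L, 34L, 55L, 89L, 144L)
splits <- sketch_holdout(observed, pct = 0.2, R = 3L)
length(splits)                             # one list element per replication
train_1 <- setdiff(observed, splits[[1]])  # training cells of the first split
```

Only the validation indices are stored; the training set of each split is the remaining observed cells, as recovered with `setdiff()` above.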

Arguments

indices

an integer vector giving the indices of observed cells in a data matrix.

control

a control object inheriting from class "split_control" as generated by holdout_control() for repeated holdout validation or cv_folds_control() for \(K\)-fold cross-validation.

pct

numeric in the interval (0, 1); the proportion of observed cells in the data matrix to be randomly selected into the validation set (defaults to 0.1, i.e., 10 percent).

R

an integer giving the number of random splits into training and validation sets (defaults to 10).

K

an integer giving the number of cross-validation folds (defaults to 5).
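As an illustration of what \(K\) controls, the base R sketch below partitions an index vector into \(K\) folds of nearly equal size. The helper name `sketch_cv_folds` is hypothetical, and the way cells are assigned to folds (shuffle, then deal out in turn) is an assumption rather than the package's documented behavior.

```r
# Hypothetical helper (not part of RMCLab): K-fold partition of an index vector.
sketch_cv_folds <- function(indices, K = 5L) {
  n <- length(indices)
  # Shuffle the cells, then deal them out to the K folds in turn.
  shuffled <- indices[sample.int(n)]
  fold_id <- rep_len(seq_len(K), n)
  unname(split(shuffled, fold_id))
}

set.seed(1)
observed <- c(2L, 3L, 5L, 8L, 13L, 21L, 34L, 55L, 89L, 144L, 233L)
folds <- sketch_cv_folds(observed, K = 3L)
lengths(folds)       # fold sizes differ by at most one
sort(unlist(folds))  # every observed cell appears in exactly one fold
```

Unlike repeated holdout, where validation sets from different replications may overlap, the folds here are disjoint and together cover all observed cells.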

Author

Andreas Alfons

Details

The functions holdout() and cv_folds() are convenience wrappers: they first call holdout_control() and cv_folds_control(), respectively, and then pass the resulting control object to create_splits().
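The wrapper pattern described here can be sketched in base R as follows. All names prefixed with `sketch_` are hypothetical, and the use of S3 dispatch on the class of the control object is an assumption suggested by the class "split_control"; the actual internals of RMCLab may differ.

```r
# Hypothetical constructor for a control object (not the RMCLab function).
sketch_holdout_control <- function(pct = 0.1, R = 10L) {
  structure(list(pct = pct, R = R),
            class = c("sketch_holdout_control", "sketch_split_control"))
}

# Generic that dispatches on the class of the control object (assumed design).
sketch_create_splits <- function(indices, control) {
  UseMethod("sketch_create_splits", control)
}

# Method for repeated holdout; the rounding via ceiling() is an assumption.
sketch_create_splits.sketch_holdout_control <- function(indices, control) {
  n <- length(indices)
  n_valid <- max(1L, as.integer(ceiling(control$pct * n)))
  replicate(control$R, indices[sample.int(n, n_valid)], simplify = FALSE)
}

# Wrapper: build the control object first, then delegate to the generic.
sketch_holdout_wrapper <- function(indices, pct = 0.1, R = 10L) {
  sketch_create_splits(indices, sketch_holdout_control(pct = pct, R = R))
}

indices <- 1:50
set.seed(42)
via_wrapper <- sketch_holdout_wrapper(indices, pct = 0.1, R = 4L)
set.seed(42)
via_control <- sketch_create_splits(indices,
                                    sketch_holdout_control(pct = 0.1, R = 4L))
identical(via_wrapper, via_control)  # both routes draw the same splits
```

With this design, calling the wrapper and calling the generic with an explicit control object are interchangeable, so the wrapper only saves the step of constructing the control object by hand.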

See Also

holdout_control(), cv_folds_control(), rdmc_tune(), soft_impute_tune()

Examples

# toy example derived from MovieLens 100K dataset
data("MovieLensToy")
# set up validation sets so that methods use same data splits
set.seed(20250723)
observed <- which(!is.na(MovieLensToy))
holdout_splits <- holdout(observed, R = 5)
# robust discrete matrix completion with hyperparameter tuning
fit_RDMC <- rdmc_tune(
  MovieLensToy, 
  lambda = fraction_grid(nb_lambda = 6),
  splits = holdout_splits
)
# Soft-Impute with discretization step and hyperparameter tuning
fit_SI <- soft_impute_tune(
  MovieLensToy, 
  lambda = fraction_grid(nb_lambda = 6, reverse = TRUE),
  splits = holdout_splits
)
# extract optimal values of regularization parameter
get_lambda(fit_RDMC)
get_lambda(fit_SI)
