Structures to define and control sampling methods for estimating predictive performance of models in the MachineShop package.
BootControl(samples = 25, ...)
BootOptimismControl(samples = 25, ...)
CVControl(folds = 10, repeats = 1, ...)
CVOptimismControl(folds = 10, repeats = 1, ...)
OOBControl(samples = 25, ...)
SplitControl(prop = 2/3, ...)
TrainControl(...)
MLControl(
  strata_breaks = 4,
  strata_nunique = 5,
  strata_prop = 0.1,
  strata_size = 20,
  times = NULL,
  distr = NULL,
  method = NULL,
  seed = sample(.Machine$integer.max, 1),
  ...
)
samples: number of bootstrap samples.
...: arguments passed to MLControl.
folds: number of cross-validation folds (K).
repeats: number of repeats of the K-fold partitioning.
prop: proportion of cases to include in the training set (0 < prop < 1).
strata_breaks: number of quantile bins desired for numeric data used in stratified resample estimation of model predictive performance.
strata_nunique: number of unique values at or below which numeric data are stratified as categorical.
strata_prop: minimum proportion of data in each stratum.
strata_size: minimum number of values in each stratum.
times, distr, method: arguments passed to predict.
seed: integer to set the seed at the start of resampling.
MLControl class object.
BootControl constructs an MLControl object for simple bootstrap resampling in which models are fit with bootstrap resampled training sets and used to predict the full data set (Efron and Tibshirani 1993).
BootOptimismControl constructs an MLControl object for optimism-corrected bootstrap resampling (Efron and Gong 1983, Harrell et al. 1996).
CVControl constructs an MLControl object for repeated K-fold cross-validation (Kohavi 1995). In this procedure, the full data set is repeatedly partitioned into K folds. Within a partitioning, prediction is performed on each of the K folds with models fit on all remaining folds.
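The procedure just described can be sketched in base R. This is only an illustration of repeated K-fold partitioning under stated assumptions, not the code MachineShop runs internally: the data set (mtcars), the linear model, and the mean squared error metric are all arbitrary choices made for the example.

```r
set.seed(123)                     # arbitrary seed, for reproducibility only
n <- nrow(mtcars)
folds <- 10
repeats <- 2

cv_mse <- numeric(0)
for (r in seq_len(repeats)) {
  # A fresh random partition per repeat: fold labels 1..K in shuffled order
  fold_id <- sample(rep(seq_len(folds), length.out = n))
  for (k in seq_len(folds)) {
    test <- fold_id == k
    # Fit on all remaining folds, then predict the held-out fold
    fit <- lm(mpg ~ wt, data = mtcars[!test, ])
    pred <- predict(fit, newdata = mtcars[test, ])
    cv_mse <- c(cv_mse, mean((mtcars$mpg[test] - pred)^2))
  }
}

length(cv_mse)   # one estimate per fold per repeat
mean(cv_mse)     # overall cross-validated error
```

Each case is predicted exactly once per repeat, by a model that never saw it during fitting.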
CVOptimismControl constructs an MLControl object for optimism-corrected cross-validation resampling (Davison and Hinkley 1997, eq. 6.48).
OOBControl constructs an MLControl object for out-of-bootstrap resampling in which models are fit with bootstrap resampled training sets and used to predict the unsampled cases.
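The difference between simple bootstrap and out-of-bootstrap resampling lies only in which cases are predicted. A minimal base R sketch of a single resample follows; the data set and model are assumptions chosen for illustration, and the package's own implementation may differ in detail.

```r
set.seed(123)                     # arbitrary seed, for reproducibility only
n <- nrow(mtcars)

# One bootstrap resample: draw n row indices with replacement
boot_idx <- sample(n, n, replace = TRUE)
fit <- lm(mpg ~ wt, data = mtcars[boot_idx, ])

# Simple bootstrap: predict every case in the full data set
pred_full <- predict(fit, newdata = mtcars)

# Out-of-bootstrap: predict only the cases that were never drawn
oob_idx <- setdiff(seq_len(n), boot_idx)
pred_oob <- predict(fit, newdata = mtcars[oob_idx, ])

length(pred_full)   # equals n
length(pred_oob)    # equals length(oob_idx)
```

Because the out-of-bootstrap cases play no part in fitting, their predictions are not affected by overlap between training and test data.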
SplitControl constructs an MLControl object for splitting data into separate training and test sets (Hastie et al. 2009).
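A split-sample evaluation can be sketched in base R as follows. The rounding of the training set size, the data set, and the model are assumptions made for this example; how MachineShop rounds `prop * n` is not stated here.

```r
set.seed(123)                     # arbitrary seed, for reproducibility only
n <- nrow(mtcars)
prop <- 2/3

# Randomly assign a proportion prop of the cases to the training set
# (rounding to the nearest integer is an assumption of this sketch)
train_idx <- sample(n, size = round(prop * n))
train <- mtcars[train_idx, ]
test <- mtcars[-train_idx, ]

# Fit on the training set and evaluate once on the held-out test set
fit <- lm(mpg ~ wt, data = train)
test_mse <- mean((test$mpg - predict(fit, newdata = test))^2)

c(train = nrow(train), test = nrow(test))
```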
TrainControl constructs an MLControl object for training and performance evaluation to be performed on the same training set (Efron 1986).
The base MLControl constructor initializes a set of control parameters that are common to all resampling methods.
Parameters are available to control resampling strata, which are constructed from numeric proportions for BinomialVariate; original values for character, factor, logical, and ordered; first columns of values for matrix; original values for numeric; and numeric times within event statuses for Surv. Stratification of survival data by event status only can be achieved by setting strata_breaks = 1. Numeric values are stratified into quantile bins and categorical values into factor levels. The number of bins will be the largest integer less than or equal to strata_breaks that satisfies the strata_prop and strata_size thresholds. Categorical levels below the thresholds will be pooled iteratively by reassigning values in the smallest nominal level to the remaining ones at random and by combining the smallest adjacent ordinal levels. Missing values are replaced with non-missing values sampled at random with replacement.
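The rule for choosing the number of quantile bins can be illustrated with a small base R sketch. This is a hedged reading of the description above rather than the package's actual code: the simulated data, the loop, and the use of quantile() and cut() are all assumptions made for the example.

```r
set.seed(123)                     # arbitrary seed, for reproducibility only
x <- rnorm(70)                    # simulated numeric response (illustrative)

strata_breaks <- 4                # desired number of quantile bins
strata_prop <- 0.1                # minimum proportion of data per stratum
strata_size <- 20                 # minimum number of values per stratum

# Start at strata_breaks and step down until both thresholds are met
n_bins <- strata_breaks
repeat {
  probs <- seq(0, 1, length.out = n_bins + 1)
  cuts <- unique(quantile(x, probs = probs))   # unique() guards against ties
  strata <- cut(x, breaks = cuts, include.lowest = TRUE)
  counts <- table(strata)
  ok <- min(counts) >= strata_size &&
    min(counts) / length(x) >= strata_prop
  if (ok || n_bins == 1) break
  n_bins <- n_bins - 1
}

n_bins
counts
```

With 70 values, four bins cannot each hold at least 20 cases, so the sketch falls back to a smaller number of bins that can.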
Efron B and Tibshirani RJ (1993). An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability 57. Boca Raton, Florida, USA: Chapman & Hall/CRC.
Efron B and Gong G (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. The American Statistician, 37 (1): 36-48.
Harrell FE, Lee KL, and Mark DB (1996). Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine, 15 (4): 361-387.
Kohavi R (1995). A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2, 1137-43. IJCAI'95. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Davison AC and Hinkley DV (1997). Bootstrap Methods and Their Application. New York, NY, USA: Cambridge University Press.
Hastie T, Tibshirani R, and Friedman J (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer Series in Statistics. New York, NY, USA: Springer.
Efron B (1986). How biased is the apparent error rate of a prediction rule? Journal of the American Statistical Association, 81 (394): 461-70.
resample, SelectedInput, SelectedModel, TunedInput, TunedModel
# NOT RUN {
## Bootstrapping with 100 samples
BootControl(samples = 100)
## Optimism-corrected bootstrapping with 100 samples
BootOptimismControl(samples = 100)
## Cross-validation with 5 repeats of 10 folds
CVControl(folds = 10, repeats = 5)
## Optimism-corrected cross-validation with 5 repeats of 10 folds
CVOptimismControl(folds = 10, repeats = 5)
## Out-of-bootstrap validation with 100 samples
OOBControl(samples = 100)
## Split sample validation with 2/3 training and 1/3 testing
SplitControl(prop = 2/3)
## Training set evaluation
TrainControl()
# }