learing_curve_dat
Create Data to Plot a Learning Curve
For a given model, this function fits several versions on differently sized subsets of the total training set and returns the results.
- Keywords
- models
Usage
learing_curve_dat(dat, outcome = NULL, proportion = (1:10)/10,
test_prop = 0, verbose = TRUE, ...)
Arguments
- dat
the training data
- outcome
a character string identifying the outcome column name
- proportion
the incremental proportions of the training set that are used to fit the model
- test_prop
an optional proportion of the data to be used to measure performance.
- verbose
a logical to print logs to the screen as models are fit
- …
options to pass to train to specify the model. These should not include x, y, formula, or data. If trainControl is used here, do not use method = "none".
Details
This function creates a data set that can be used to plot how well the model
performs over different sized versions of the training set. For each data
set size, the performance metrics are determined and saved. If test_prop == 0,
the apparent measure of performance (i.e., re-predicting the training set) and
the resampled estimate of performance are available. Otherwise, the test set
results are also added.
If the model being fit has tuning parameters, the results are based on the
optimal settings determined by train.
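The procedure described in this section can be sketched in base R. This is only an illustration of the idea, not the code used by learing_curve_dat: it assumes a plain lm fit on the built-in mtcars data and a hypothetical helper named rmse, and it reports only the apparent (training) and test set error, leaving out the resampled estimate that train would supply.

```r
# Illustrative sketch only; this is NOT caret's implementation.
set.seed(1)

# Hypothetical helper: root mean squared error
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

dat        <- mtcars
test_prop  <- 1/4
proportion <- (4:10) / 10

# Hold out a test set, mirroring the case test_prop > 0
n_test     <- floor(nrow(dat) * test_prop)
test_idx   <- sample(nrow(dat), n_test)
train_pool <- dat[-test_idx, ]
test_set   <- dat[test_idx, ]

# Fit one model per training-set size and record its performance
results <- lapply(proportion, function(p) {
  n_sub <- floor(nrow(train_pool) * p)
  sub   <- train_pool[sample(nrow(train_pool), n_sub), ]
  fit   <- lm(mpg ~ wt + hp, data = sub)
  data.frame(
    Training_Size = n_sub,
    RMSE = c(rmse(sub$mpg, predict(fit, sub)),            # apparent error
             rmse(test_set$mpg, predict(fit, test_set))), # test set error
    Data = c("Training", "Testing")
  )
})

curve_dat <- do.call(rbind, results)
print(curve_dat)
```

The resulting data frame has one row per training-set size and data source, which is the same long layout that the plotting example in the Examples section relies on.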
Value
a data frame with columns for each performance metric calculated by train, as well as the columns:
- Training_Size
the number of data points used in the current model fit
- Data
which data were used to calculate performance. Values are "Resampling", "Training", and (optionally) "Testing"
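As a rough sketch of how such a data frame might be summarised before plotting, the snippet below averages a metric within each training-set size and data source. The toy data frame and its ROC values are placeholders made up for illustration, not output from learing_curve_dat, and the sketch assumes there may be several rows per training-set size (for example, one per resample).

```r
# Placeholder values standing in for learing_curve_dat() output; not real results.
toy <- data.frame(
  Training_Size = rep(c(100, 200), each = 4),
  ROC  = c(0.80, 0.82, 0.90, 0.78,
           0.84, 0.86, 0.89, 0.83),
  Data = rep(c("Resampling", "Resampling", "Training", "Testing"), times = 2)
)

# Average the metric within every combination of training-set size and data source
summary_dat <- aggregate(ROC ~ Training_Size + Data, data = toy, FUN = mean)
print(summary_dat)
```

Averaging first gives one point per size and data source, which can be handy when a smoothed curve is not wanted.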
Examples
# NOT RUN {
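library(caret)
library(ggplot2)

# Simulate a two-class data set with 1000 samples
set.seed(1412)
class_dat <- twoClassSim(1000)

# Fit LDA on increasing fractions of the training data, holding out 25% for testing
set.seed(29510)
lda_data <- learing_curve_dat(dat = class_dat,
                              outcome = "Class",
                              test_prop = 1/4,
                              ## `train` arguments:
                              method = "lda",
                              metric = "ROC",
                              trControl = trainControl(classProbs = TRUE,
                                                       summaryFunction = twoClassSummary))

# Plot the ROC learning curve for each data source
ggplot(lda_data, aes(x = Training_Size, y = ROC, color = Data)) +
  geom_smooth(method = loess, span = .8) +
  theme_bw()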
# }