Belkin and others have shown that some machine learning algorithms exhibit surprising behavior in overfitting settings. The classic U-shape of mean loss plotted against model complexity may be followed by a surprising second "mini-U," a phenomenon known as double descent.
Alternatively, one might keep the model complexity fixed while varying the number of data points n, including over a region in which n is smaller than the complexity value of the model. The surprise here is that mean loss may actually increase with n in the overfitting region.
The function doubleD facilitates easy exploration of this phenomenon.
doubleD(qeFtnCall,xPts,nReps,makeDummies=NULL,classif=FALSE)
Each value in xPts results in one row of the matrix returned by doubleD. The returned matrix can then be plotted, using the generic plot.doubleD; mean test (red) and training (blue) accuracy will be plotted against xPts.
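For instance (a minimal sketch, not actual output; it assumes hw is defined as in the Examples section below):

z <- doubleD('qePolyLin(hw,"Weight",deg=xPts[i])',1:20,250)  # as in the Examples below
plot(z)  # invokes plot.doubleD: mean test (red) and training (blue) accuracy vs. xPts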
Quoted string containing the function call to be run; the string must include 'xPts[i]' somewhere.
Range of values to be used in the experiments, e.g. a vector of degrees for polynomial models.
Number of repetitions for each experiment, i.e. the number of randomly generated holdout sets per value in xPts.
If non-NULL, call regtools::factorsToDummies on the dataset of this name. This avoids the problem of some levels of a factor appearing in the holdout set but not the training set; see the sketch following this argument list.
Set TRUE if this is a classification problem.
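The following hedged sketch (not from the package's examples) illustrates qeFtnCall, makeDummies and classif; it assumes that mlb1 contains a factor column Position along with numeric Height, Weight and Age, and that qeKNN accepts a 'k' argument:

# regression with a factor predictor: pre-expanding Position into dummies via
# makeDummies='mlb1' keeps rare levels from appearing only in a holdout set
data(mlb1)
doubleD('qeKNN(mlb1,"Weight",k=xPts[i])',seq(5,50,5),50,makeDummies='mlb1')
# classification: predict Position itself
doubleD('qeKNN(mlb1,"Position",k=xPts[i])',seq(5,50,5),50,classif=TRUE)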
Norm Matloff
The function will run the code in qeFtnCall nReps times for each level specified in xPts, recording the test and training error in each case. Thus, for each level, we will have a mean test and training error.
## Not run:
data(mlb1)              # Major League Baseball player data
hw <- mlb1[,2:3]        # Height and Weight columns
# vary the polynomial degree from 1 to 20, with 250 repetitions per degree
doubleD('qePolyLin(hw,"Weight",deg=xPts[i])',1:20,250)
## End(Not run)
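A second experiment, sketched here only to show the mechanics (the degree, training-set sizes and repetition count are illustrative, not tuned to actually exhibit double descent), holds model complexity fixed and varies the number of data points, as described at the top of this page:

# not run: fixed polynomial degree, training set grown in steps of 50 rows
data(mlb1)
hw <- mlb1[,2:3]
doubleD('qePolyLin(hw[1:xPts[i],],"Weight",deg=10)',seq(50,nrow(hw),50),100)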