Belkin and others have shown that some machine learning algorithms exhibit surprising behavior in overfitting settings. The classic U-shape of mean loss plotted against model complexity may be followed by a surprising second "mini-U."
Alternatively, one might keep the model complexity fixed while varying the number of data points n, including over a region in which n is smaller than the complexity value of the model. The surprise here is that mean loss may actually increase with n in the overfitting region.
The function doubleD facilitates easy exploration of this
phenomenon.
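For instance, here is a minimal sketch (illustrative assumptions only; the dataset, the fitting function and the parameter values are not taken from this page) of holding the model fixed and varying n by subsetting the data inside the quoted call:

library(qeML)      # assumed home package of doubleD
data(mlb1)
hw <- mlb1[,2:3]   # presumably the Height and Weight columns
# fixed model (degree-6 polynomial), training-set size n varied via xPts;
# the degree and the range of n would need tuning so that n passes near the
# number of model parameters, where the second descent may appear
doubleD('qePolyLin(hw[1:xPts[i],],"Weight",deg=6)',seq(30,300,30),100)

A random subsample, e.g. hw[sample(nrow(hw),xPts[i]),], might be preferable to taking the first xPts[i] rows.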
doubleD(qeFtnCall, xPts, nReps, makeDummies=NULL, classif=FALSE)

Each value in xPts (i.e. each call) results in one row of the matrix returned by doubleD. The return matrix can then be plotted, using the generic plot.doubleD. Mean test (red) and training (blue) accuracy will be plotted against xPts.
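For example (a sketch reusing the call from the Examples at the end of this page; it assumes the returned object carries the 'doubleD' class, so that plot dispatches to plot.doubleD):

dd <- doubleD('qePolyLin(hw,"Weight",deg=xPts[i])',1:20,250)
plot(dd)  # red curve: mean test accuracy; blue curve: mean training accuracy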
qeFtnCall: Quoted string; it should include 'xPts[i]' somewhere.
xPts: Range of values to be used in the experiments, e.g. a vector of degrees for polynomial models.
nReps: Number of repetitions for each experiment, typically the number in the holdout set.
makeDummies: If non-NULL, call regtools::factorsToDummies on the dataset of this name. This avoids the problem of some levels of a factor appearing in the holdout set but not in the training set. (See the sketch following this list.)
classif: Set TRUE if this is a classification problem.
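As an illustration of makeDummies, here is a hypothetical sketch; the pef dataset, qeKNN, and the choice of k values are assumptions made for illustration, not taken from this page:

library(qeML)   # assumed home package of doubleD
data(pef)       # census data whose columns include factors such as occ and educ
# The quoted call refers to pef by name; makeDummies='pef' asks doubleD to run
# regtools::factorsToDummies on pef first, so a repeated holdout split cannot
# contain a factor level that is absent from the training set.
doubleD('qeKNN(pef,"wageinc",k=xPts[i])',seq(5,50,5),100,makeDummies='pef')

For a classification target (say, predicting occ), one would also set classif=TRUE.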
Author: Norm Matloff
The function will run the code in qeFtnCall nReps
times for each level specified in xPts, recording the test and
training error in each case. So, for each level, we will have a mean
test and training error.
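Conceptually, the computation is something like the sketch below (not the actual source; the testAcc and trainAcc fields on the fitted object are assumptions made for illustration):

# conceptual outline of doubleD's averaging loop
sketchDoubleD <- function(qeFtnCall,xPts,nReps) {
   out <- matrix(NA,length(xPts),2,dimnames=list(NULL,c('testAcc','trainAcc')))
   for (i in seq_along(xPts)) {
      accs <- replicate(nReps, {
         fit <- eval(parse(text=qeFtnCall))  # the quoted call uses xPts[i]
         c(fit$testAcc,fit$trainAcc)         # assumed accuracy fields
      })
      out[i,] <- rowMeans(accs)              # mean over the nReps runs
   }
   cbind(xPts,out)
}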
if (FALSE) {
  library(qeML)     # assumed home package of doubleD, mlb1 and qePolyLin
  data(mlb1)
  hw <- mlb1[,2:3]  # presumably the Height and Weight columns
  # mean test/training accuracy for polynomial degrees 1 through 20,
  # 250 repetitions per degree
  doubleD('qePolyLin(hw,"Weight",deg=xPts[i])',1:20,250)
}