Belkin and others have shown that some machine learning algorithms exhibit surprising behavior in overfitting settings. The classic U-shape of mean loss plotted against model complexity may be followed by a surprising second "mini-U," a phenomenon known as double descent.
Alternatively, one might keep the model complexity fixed while varying the number of data points n, including over a region in which n is smaller than the complexity value of the model. The surprise here is that mean loss may actually increase with n in the overfitting region.
The function doubleD facilitates easy exploration of this phenomenon.
doubleD(qeFtnCall,xPts,nReps,makeDummies=NULL,classif=FALSE)
Each value in xPts results in one row of the matrix returned by doubleD. The returned matrix can then be plotted, using the generic plot.doubleD; mean test (red) and training (blue) accuracy will be plotted against xPts.
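For instance (a minimal sketch, not actual output; it assumes hw is defined as in the Examples section below):

z <- doubleD('qePolyLin(hw,"Weight",deg=xPts[i])',1:20,250)  # as in the Examples below
plot(z)  # invokes plot.doubleD: mean test (red) and training (blue) accuracy vs. xPts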
Quoted string containing the function call to be run; the string must include 'xPts[i]' somewhere.
Range of values to be used in the experiments, e.g. a vector of degrees for polynomial models.
Number of repetitions for each experiment, i.e. the number of randomly generated holdout sets per value in xPts.
If non-NULL, call regtools::factorsToDummies on the dataset of this name. This avoids the problem of some levels of a factor appearing in the holdout set but not the training set; see the sketch following this argument list.
Set TRUE if this is a classification problem.
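The following hedged sketch (not from the package's examples) illustrates qeFtnCall, makeDummies and classif; it assumes that mlb1 contains a factor column Position along with numeric Height, Weight and Age, and that qeKNN accepts a 'k' argument:

# regression with a factor predictor: pre-expanding Position into dummies via
# makeDummies='mlb1' keeps rare levels from appearing only in a holdout set
data(mlb1)
doubleD('qeKNN(mlb1,"Weight",k=xPts[i])',seq(5,50,5),50,makeDummies='mlb1')
# classification: predict Position itself
doubleD('qeKNN(mlb1,"Position",k=xPts[i])',seq(5,50,5),50,classif=TRUE)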
Norm Matloff
The function will run the code in qeFtnCall nReps times for each level specified in xPts, recording the test and training error in each case. Thus, for each level, we will have a mean test and training error.
## Not run:
data(mlb1)              # Major League Baseball player data
hw <- mlb1[,2:3]        # Height and Weight columns
# vary the polynomial degree from 1 to 20, with 250 repetitions per degree
doubleD('qePolyLin(hw,"Weight",deg=xPts[i])',1:20,250)
## End(Not run)
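A second experiment, sketched here only to show the mechanics (the degree, training-set sizes and repetition count are illustrative, not tuned to actually exhibit double descent), holds model complexity fixed and varies the number of data points, as described at the top of this page:

# not run: fixed polynomial degree, training set grown in steps of 50 rows
data(mlb1)
hw <- mlb1[,2:3]
doubleD('qePolyLin(hw[1:xPts[i],],"Weight",deg=10)',seq(50,nrow(hw),50),100)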