
scR (version 0.4.0)

simvcd: Estimate the Vapnik-Chervonenkis (VC) dimension of an arbitrary binary classification algorithm.

Description

Estimate the Vapnik-Chervonenkis (VC) dimension of an arbitrary binary classification algorithm.

Usage

simvcd(
  model,
  dim,
  packages = list(),
  m = 1000,
  k = 1000,
  maxn = 5000,
  parallel = TRUE,
  coreoffset = 0,
  predictfn = NULL,
  a = 0.16,
  a1 = 1.2,
  a11 = 0.14927,
  minn = (dim + 1),
  ...
)

Value

A real number giving the estimated value of the VC dimension of the supplied model.

Arguments

model

A binary classification model function supplied by the user. It must take the arguments formula and data.

dim

A positive integer giving the dimension (number of input features) of the model.

packages

A list of strings giving the names of packages to be loaded in order to estimate the model.

m

A positive integer giving the number of simulations to be performed at each design point (sample size value). Higher values give more accurate results but increase computation time.
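As a rough illustration of what one design point involves, the sketch below computes the deviation measure used by Vapnik, Levin and Le Cun (1994) for a logistic regression on random data and averages it over m replications: 2n random observations are drawn, the labels of the second half are flipped before training, and the absolute difference between the error rates on the two halves (against the original labels) is recorded. The helper sim_deviation() is hypothetical, and that scR generates its data and measures the deviation exactly this way is an assumption.

```r
# Hypothetical sketch of one design point (sample size n), after the
# procedure of Vapnik, Levin and Le Cun (1994); not scR's internal code.
sim_deviation <- function(n, dim, m = 10) {
  xis <- replicate(m, {
    # 2n random observations with random 0/1 labels
    x <- matrix(rnorm(2 * n * dim), ncol = dim)
    y <- rbinom(2 * n, 1, 0.5)
    first <- seq_len(n)
    # flip the labels of the second half before training
    y_train <- y
    y_train[-first] <- 1 - y_train[-first]
    dat <- data.frame(y = y_train, x)
    fit <- suppressWarnings(glm(y ~ ., data = dat, family = binomial))
    pred <- as.integer(fitted(fit) > 0.5)
    # error rates on each half, measured against the original labels
    err1 <- mean(pred[first] != y[first])
    err2 <- mean(pred[-first] != y[-first])
    abs(err1 - err2)
  })
  mean(xis)  # average deviation over the m replications
}

set.seed(1)
sim_deviation(n = 50, dim = 7, m = 10)
```

Each replication refits the model, so the cost of a design point grows linearly in m.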

k

A positive integer giving the number of design points (sample size values) for which the bounding function is to be estimated. Higher values give more accurate results but increase computation time.
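Given one averaged deviation per design point, the VC dimension h can be estimated by choosing the value whose bounding curve Phi(n/h) best fits the observed deviations in the least-squares sense, as proposed by Vapnik, Levin and Le Cun (1994). The sketch below assumes the bounding-function form from that paper and uses noise-free synthetic deviations generated with h = 8; fit_vcd() and phi_bound() are hypothetical names, not part of scR.

```r
# Bounding function Phi(tau), tau = n / h (form from Vapnik, Levin and Le Cun 1994)
phi_bound <- function(tau, a = 0.16, a1 = 1.2, a11 = 0.14927) {
  out <- rep(1, length(tau))   # Phi = 1 for tau < 0.5
  big <- tau >= 0.5
  t <- tau[big]
  lg <- log(2 * t) + 1         # natural logarithm
  out[big] <- a * (lg / (t - a11)) * (sqrt(1 + a1 * (t - a11) / lg) + 1)
  out
}

# Hypothetical helper: least-squares estimate of h from mean deviations xi
# observed at sample sizes n (one pair per design point)
fit_vcd <- function(n, xi, lower = 1, upper = 100) {
  sse <- function(h) sum((xi - phi_bound(n / h))^2)
  optimize(sse, interval = c(lower, upper))$minimum
}

n <- round(seq(20, 400, length.out = 10))  # k = 10 design points
xi <- phi_bound(n / 8)                      # synthetic, noise-free, h = 8
fit_vcd(n, xi)
```

Because Phi is decreasing in tau, the squared error is unimodal in h, which is what makes a one-dimensional search such as optimize() suitable here. With real, noisy deviations the estimate will not be exact; larger k and m make it more stable.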

maxn

A positive integer giving the vertical dimension of the data to be generated, i.e. the number of observations (rows).

parallel

Boolean indicating whether or not to use parallel processing.

coreoffset

If parallel is TRUE, a non-negative integer giving the number of CPU cores to keep free. It should be smaller than the number of CPU cores, so that at least one core remains in use.
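The Examples set coreoffset to detectCores() - 2, which suggests that the number of workers is the detected core count minus coreoffset. The hypothetical helper below spells out that arithmetic under this assumption; it is not a function exported by scR.

```r
library(parallel)

# Hypothetical helper: number of workers left after keeping `coreoffset`
# cores free. Assumes the count comes from detectCores(), as in the Examples.
n_workers <- function(coreoffset = 0) {
  total <- detectCores()
  if (is.na(total)) total <- 1L    # detectCores() can return NA
  if (coreoffset < 0 || coreoffset >= total)
    stop("coreoffset must be between 0 and detectCores() - 1")
  total - coreoffset
}

n_workers(0)   # all detected cores
```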

predictfn

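An optional user-defined function providing a custom predict method. In the Examples it takes a fitted model and new data and returns the predicted classes as a factor with levels "0" and "1". If also using a user-defined model, the model should return an object of class "svrclass" to avoid errors.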

a

Scaling coefficient for the bounding function. Defaults to the value given by Vapnik, Levin and Le Cun 1994.

a1

Scaling coefficient for the bounding function. Defaults to the value given by Vapnik, Levin and Le Cun 1994.

a11

Scaling coefficient for the bounding function. Defaults to the value given by Vapnik, Levin and Le Cun 1994.
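For reference, Vapnik, Levin and Le Cun (1994) give the bounding function of tau = n/h as Phi(tau) = 1 for tau < 0.5 and otherwise Phi(tau) = a (ln(2 tau) + 1) / (tau - a11) * (sqrt(1 + a1 (tau - a11) / (ln(2 tau) + 1)) + 1). The sketch below evaluates that form with simvcd()'s default coefficients; the name phi_bound() is hypothetical, and that scR evaluates exactly this expression is an assumption.

```r
# Bounding function Phi(tau), tau = n / h, using simvcd()'s default coefficients.
# Hypothetical helper; the formula follows Vapnik, Levin and Le Cun (1994).
phi_bound <- function(tau, a = 0.16, a1 = 1.2, a11 = 0.14927) {
  out <- rep(1, length(tau))   # Phi = 1 for tau < 0.5
  big <- tau >= 0.5
  t <- tau[big]
  lg <- log(2 * t) + 1         # natural logarithm
  out[big] <- a * (lg / (t - a11)) * (sqrt(1 + a1 * (t - a11) / lg) + 1)
  out
}

phi_bound(c(0.25, 0.5, 1, 10, 100))
```

With these defaults the second branch evaluates to approximately 1 at tau = 0.5, so the two pieces join continuously, and the bound then decreases as n grows relative to h.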

minn

Optional minimum sample size n, to use instead of the default of dim + 1. Useful, e.g., with regularized regression models such as the elastic net, which can be estimated with fewer observations than input features.
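A plausible reason for the default minn = dim + 1 (our reading, not stated in this documentation) is that an unregularized logistic regression with an intercept has dim + 1 coefficients, so with fewer observations glm() cannot estimate all of them. The sketch below shows this with 3 features and 3 observations.

```r
# With p features, an unregularized logistic regression with an intercept has
# p + 1 coefficients; with fewer than p + 1 observations some are not estimable.
set.seed(3)
p <- 3
x <- matrix(rnorm(p * p), ncol = p)   # n = p = 3 observations
d <- data.frame(y = c(0, 1, 0), x)
fit <- suppressWarnings(glm(y ~ ., data = d, family = binomial))
coef(fit)   # at least one coefficient is NA (aliased)
```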

...

Additional arguments to be passed to model.
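To receive such arguments, the model function itself has to accept ... and hand it on. The sketch below is a hypothetical variant of the logistic model from the Examples (mylogit2 is an illustrative name) that forwards extra arguments to glm().

```r
# Hypothetical model function that forwards extra arguments to glm()
mylogit2 <- function(formula, data, ...) {
  structure(
    glm(formula = formula, data = data, family = binomial(link = "logit"), ...),
    class = c("svrclass", "glm")   # class required by simvcd()
  )
}

set.seed(2)
d <- data.frame(y = rbinom(40, 1, 0.5), x1 = rnorm(40))
fit <- mylogit2(y ~ x1, d, control = glm.control(maxit = 50))
fit$control$maxit
```

Assuming simvcd() passes its ... on to the model call, as the argument description states, supplying control = glm.control(maxit = 50) to simvcd() would reach glm() through mylogit2().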

See Also

scb(), to calculate sample complexity bounds given an estimated VC dimension.

Examples

library(scR)
library(parallel)

mylogit <- function(formula, data) {
  m <- structure(
    glm(formula = formula, data = data, family = binomial(link = "logit")),
    class = c("svrclass", "glm")  # IMPORTANT: must use the class svrclass to work correctly
  )
  return(m)
}

mypred <- function(m, newdata) {
  out <- predict.glm(m, newdata, type = "response")
  # Important: must specify levels to account for the possibility of all
  # observations being classified into the same class in smaller samples
  out <- factor(ifelse(out > 0.5, 1, 0), levels = c("0", "1"))
  return(out)
}

# Small m, k and maxn keep this example fast; larger values give more accurate results.
# coreoffset = detectCores() - 2 keeps all but two cores free.
vcd <- simvcd(model = mylogit, dim = 7, m = 10, k = 10, maxn = 50,
              predictfn = mypred, coreoffset = (detectCores() - 2))
