scR (version 0.4.0)

gendata: Simulate data with appropriate structure to be used in estimating sample complexity bounds

Description

Simulates data with the appropriate structure for use in estimating sample complexity bounds.

Usage

gendata(model, dim, maxn, predictfn = NULL, varnames = NULL, ...)

Value

A data.frame containing the simulated data.

Arguments

model

A binary classification model supplied by the user. It must accept the arguments formula and data.

dim

The horizontal dimension of the data to be generated, i.e. the number of predictor variables.

maxn

The vertical dimension of the data to be generated, i.e. the number of observations.

predictfn

An optional user-defined function providing a custom predict method. If a user-defined model is also supplied, the model should return an object of class "svrclass" to avoid errors.

varnames

An optional character vector giving the names of the variables to be used for the generated data.

...

Additional arguments to be passed to model.
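Per the description of ..., extra arguments are forwarded to the user-supplied model. As a minimal sketch, assuming a hypothetical wrapper mymodel that exposes the glm link function as an extra argument called link, the code below follows the same formula/data contract and "svrclass" convention as the model in the Examples. It is called directly on a small made-up data set so that it runs without gendata(); with gendata(), the extra argument would presumably be supplied as, for example, link = "probit" after the other arguments.

```r
# Hypothetical wrapper: a user-defined model whose extra argument `link`
# would arrive through gendata's ... argument
mymodel <- function(formula, data, link = "logit") {
  m <- structure(
    glm(formula = formula, data = data, family = binomial(link = link)),
    class = c("svrclass", "glm")  # same class convention as the example model
  )
  return(m)
}

# Small synthetic data set, made up purely for illustration
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 * x))
d <- data.frame(x = x, y = y)

m_default <- mymodel(y ~ x, d)                   # falls back to the logit link
m_probit  <- mymodel(y ~ x, d, link = "probit")  # extra argument overrides it
m_probit$family$link
```

Keeping the extra argument optional, with a default, means the same wrapper still works when gendata() is called without any additional arguments.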

See Also

estimate_accuracy(), to estimate sample complexity bounds given the generated data

Examples

library(scR)
# User-defined logistic regression model; it must accept formula and data
mylogit <- function(formula, data) {
  m <- structure(
    glm(formula = formula, data = data, family = binomial(link = "logit")),
    class = c("svrclass", "glm")  # IMPORTANT: must use the class svrclass to work correctly
  )
  return(m)
}
# User-defined predict method: thresholds predicted probabilities at 0.5
mypred <- function(m, newdata) {
  out <- predict.glm(m, newdata, type = "response")
  # Important: specify levels to account for the possibility of all
  # observations being classified into the same class in smaller samples
  out <- factor(ifelse(out > 0.5, 1, 0), levels = c("0", "1"))
  return(out)
}
# The formula names the response and 7 predictors
formula <- two_year_recid ~
  race + sex + age + juv_fel_count +
  juv_misd_count + priors_count + charge_degree..misd.fel.
dat <- gendata(mylogit, dim = 7, maxn = 7214, predictfn = mypred,
               varnames = all.vars(formula))
# \donttest{
library(parallel)  # for detectCores()
results <- estimate_accuracy(formula, mylogit, dat, predictfn = mypred,
                             nsample = 10,
                             steps = 10,
                             coreoffset = (detectCores() - 2))
# }
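The comment in mypred about specifying factor levels can be illustrated with a small base R sketch. The predicted probabilities and true labels below are made up. When every probability falls below 0.5, a factor built without explicit levels keeps only the class "0", so a confusion table against the true labels loses a row; fixing the levels to c("0", "1") keeps it 2 x 2. Whether estimate_accuracy() compares predictions in exactly this way is an assumption; the sketch only demonstrates the base R behavior.

```r
# Made-up predicted probabilities: all fall below the 0.5 threshold
probs <- c(0.10, 0.20, 0.30)
# Made-up true labels containing both classes
truth <- factor(c(0, 1, 1), levels = c("0", "1"))

# Without explicit levels, only the observed class "0" survives
pred_no_levels <- factor(ifelse(probs > 0.5, 1, 0))
levels(pred_no_levels)              # "0"
dim(table(pred_no_levels, truth))   # 1 2

# With explicit levels (as in mypred), both classes are kept
pred_levels <- factor(ifelse(probs > 0.5, 1, 0), levels = c("0", "1"))
levels(pred_levels)                 # "0" "1"
dim(table(pred_levels, truth))      # 2 2
```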

Run the code above in your browser using DataLab