Generate a descriptive summary for objects returned by functions in EHRsampling.
learningcurve_data(
  x,
  y,
  method = "log",
  metric = "MCC",
  batchsize = 60,
  class.prob,
  pct.train = 0.8,
  nfold = 5,
  nrepeat = 10
)
learningcurve_data() returns a data frame of sample sizes and the corresponding performance measurements.
x: a matrix of predictor variables
y: a vector of binary outcomes, encoded as a factor with 1 for events and 0 for non-events
method: the training method used to obtain performance measurements. Available options are "log" (logistic regression, default), "regul.log" (regularized logistic regression), "svm" (support vector machine), "rf" (random forest) and "lda" (linear discriminant analysis)
metric: the target performance metric to optimize. Default is "MCC"; the other option is "AUC"
batchsize: the sample size of each training batch. Default is 60
class.prob: the probability of the event
pct.train: the proportion of the data used for training. Default is 0.8
nfold: the number of folds in cross-validation. Default is 5
nrepeat: the number of repeats of cross-validation. Default is 10
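The call below is a minimal sketch of how the arguments fit together, not an example taken from the package. The simulated predictors and outcome are made up for illustration, and passing the observed event proportion as class.prob is an assumption about how that argument is meant to be supplied. The call is guarded so the data-preparation part runs even when EHRsampling is not installed.

```r
# Simulated data for illustration only (hypothetical, not shipped with the package)
set.seed(1)
n <- 600
x <- matrix(rnorm(n * 5), nrow = n, ncol = 5,
            dimnames = list(NULL, paste0("v", 1:5)))

# Binary outcome driven by the first two predictors, encoded as a factor
# with levels 0 (non-event) and 1 (event), as the y argument requires
p <- plogis(-1 + x[, 1] - 0.5 * x[, 2])
y <- factor(rbinom(n, size = 1, prob = p), levels = c(0, 1))

if (requireNamespace("EHRsampling", quietly = TRUE)) {
  lc <- EHRsampling::learningcurve_data(
    x, y,
    method = "log",               # logistic regression (the default)
    metric = "MCC",               # the default metric
    batchsize = 60,
    class.prob = mean(y == "1"),  # assumption: observed event proportion
    pct.train = 0.8,
    nfold = 5,
    nrepeat = 10
  )
  # Inspect the first rows of the returned data frame
  print(head(lc))
}
```

With nrepeat = 10 and nfold = 5 the model is refit many times per sample size, so lowering nrepeat is a reasonable way to shorten a first exploratory run.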