Train the phenotyping model on the training dataset, and evaluate its performance via random splits of the training dataset.
phecap_train_phenotyping_model(
  data, surrogates, feature_selected,
  method = "lasso_bic",
  train_percent = 0.7, num_splits = 200L,
  start_seed = 78900L, verbose = 0L)
data: an object of class PhecapData, obtained by calling PhecapData(...).
surrogates: a list of objects of class PhecapSurrogate, obtained by something like
list(PhecapSurrogate(...), PhecapSurrogate(...)).
The surrogates used here may differ from
those used in feature extraction.
feature_selected: a character vector of the features that should be included in the model,
typically the value returned by phecap_run_feature_extraction
(though this is not required).
The features listed here may differ from
those returned by feature extraction.
method: either a character vector or a list of two components. If a character vector is used, the possible entries are listed below. When two or more methods are specified, the predicted probability is the simple average of the predicted probabilities from each method.
'plain' (logistic regression without penalty)
'ridge_cv' (logistic regression with ridge penalty and CV tuning)
'lasso_cv' (logistic regression with lasso penalty and CV tuning)
'lasso_bic' (logistic regression with lasso penalty and BIC tuning)
'alasso_cv' (logistic regression with adaptive lasso penalty and CV tuning)
'alasso_bic' (logistic regression with adaptive lasso penalty and BIC tuning)
'svm' (support vector machine with CV tuning, package e1071 needed, subject_weight not supported)
'rf' (random forest with default parameters, package randomForestSRC needed)
'xgb' (extreme gradient boosting with default parameters, package xgboost needed)
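The averaging rule for multiple methods can be illustrated in base R. This is only a sketch of the idea, not the package's internal code: the two glm fits are stand-ins for any two of the methods above, and the simulated data and the names fit_a, fit_b and prob_avg are hypothetical.

```r
# Simulated data (hypothetical): three features, binary outcome.
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("f1", "f2", "f3")))
y <- rbinom(n, 1, plogis(x[, 1] - 0.5 * x[, 2]))
df <- data.frame(y = y, x)

# Two stand-in "methods": a full and a reduced logistic regression.
fit_a <- glm(y ~ ., data = df, family = binomial())
fit_b <- glm(y ~ f1, data = df, family = binomial())

# Predicted probabilities from each method.
prob_a <- predict(fit_a, newdata = df, type = "response")
prob_b <- predict(fit_b, newdata = df, type = "response")

# Simple (unweighted) average of the per-method probabilities.
prob_avg <- (prob_a + prob_b) / 2
```

Because each input is a probability, the average always stays within [0, 1].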
If a list is used, it should contain two named components as follows.
fit (a function for model fitting, with arguments x, y, subject_weight, penalty_weight)
predict (a function for prediction, with arguments object, the value returned by fit, and x, the new data to predict on)
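A list of this shape might look like the following minimal sketch. Only the argument names come from the description above. The rest is assumed for illustration: that x is a numeric feature matrix without an intercept column, that y is coded 0/1, that predict should return probabilities, and the name custom_method itself.

```r
# Hypothetical custom method: unpenalized logistic regression.
custom_method <- list(
  fit = function(x, y, subject_weight = NULL, penalty_weight = NULL, ...) {
    x <- as.matrix(x)
    if (is.null(subject_weight)) subject_weight <- rep(1, nrow(x))
    # penalty_weight is accepted but ignored: this fit has no penalty.
    fitted <- stats::glm.fit(
      x = cbind(1, x), y = y, weights = subject_weight,
      family = stats::binomial())
    fitted$coefficients  # intercept first, then one coefficient per feature
  },
  predict = function(object, x, ...) {
    x <- as.matrix(x)
    # Linear predictor mapped to a probability with the logistic function.
    drop(stats::plogis(cbind(1, x) %*% object))
  }
)

# Stand-alone check on simulated data, outside of PheCAP.
set.seed(2)
x_demo <- matrix(rnorm(200), 100, 2)
y_demo <- rbinom(100, 1, plogis(x_demo[, 1]))
obj <- custom_method$fit(x_demo, y_demo, rep(1, 100), NULL)
prob <- custom_method$predict(obj, x_demo)
```

Such a list would then be supplied as method = custom_method in place of a character vector.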
train_percent: the proportion (between 0 and 1) of labels used for model training in each random split.
num_splits: the number of random splits.
start_seed: in the i-th split, the random seed is set to start_seed + i.
verbose: print progress every verbose splits if verbose is positive, or remain quiet if verbose is zero.
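The interplay of train_percent, num_splits, start_seed and verbose can be sketched in base R as follows. This is an illustration of the random-split idea under stated assumptions, not the package's implementation: the helper names rank_auc and split_auc are hypothetical, plain logistic regression stands in for the chosen method, and every row is assumed to be labeled.

```r
# Rank-based (Mann-Whitney) AUC for scores against 0/1 labels.
rank_auc <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Average held-out AUC over repeated random splits.
# Note: set.seed() changes the global RNG state of the session.
split_auc <- function(x, y, train_percent = 0.7, num_splits = 200L,
                      start_seed = 78900L, verbose = 0L) {
  df <- data.frame(y = y, x)
  n <- nrow(df)
  aucs <- numeric(num_splits)
  for (i in seq_len(num_splits)) {
    set.seed(start_seed + i)  # seed for the i-th split
    train <- sample.int(n, size = floor(train_percent * n))
    fit <- glm(y ~ ., data = df[train, ], family = binomial())
    prob <- predict(fit, newdata = df[-train, ], type = "response")
    aucs[i] <- rank_auc(prob, df$y[-train])
    if (verbose > 0L && i %% verbose == 0L) message("finished split ", i)
  }
  mean(aucs)
}

# Simulated data (hypothetical) and a short run with 20 splits.
set.seed(1)
x_sim <- matrix(rnorm(600), 200, 3, dimnames = list(NULL, c("f1", "f2", "f3")))
y_sim <- rbinom(200, 1, plogis(x_sim[, 1] - 0.5 * x_sim[, 2]))
avg_auc <- split_auc(x_sim, y_sim, num_splits = 20L)
```

Because the seed of each split depends only on start_seed and i, rerunning with the same arguments reproduces the same splits.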
An object of class PhecapModel, with components
the fitted object
the method used for model training
the features selected by SAFE
ROC on training dataset
AUC on training dataset
average ROC on random splits of training dataset
average AUC on random splits of training dataset
the function used for fitting
the function used for prediction
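For readers unfamiliar with the ROC and AUC components, the following base R sketch shows one common way to compute them from predicted probabilities and 0/1 labels. It is not taken from the package: the helper names roc_points and trapezoid_auc and the toy data are hypothetical, and the layout of the ROC component stored in the returned object may differ.

```r
# ROC points: sweep the cutoff from high to low and record FPR and TPR.
roc_points <- function(score, label) {
  cutoffs <- sort(unique(score), decreasing = TRUE)
  tpr <- vapply(cutoffs, function(cut) mean(score[label == 1] >= cut), numeric(1))
  fpr <- vapply(cutoffs, function(cut) mean(score[label == 0] >= cut), numeric(1))
  # Prepend the (0, 0) corner, reached when no sample is called positive.
  data.frame(cutoff = c(Inf, cutoffs), fpr = c(0, fpr), tpr = c(0, tpr))
}

# Area under the ROC curve by the trapezoidal rule.
trapezoid_auc <- function(roc) {
  k <- nrow(roc)
  sum(diff(roc$fpr) * (roc$tpr[-k] + roc$tpr[-1]) / 2)
}

# Toy example (hypothetical scores and labels).
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4)
label <- c(1, 1, 0, 1, 0, 0)
roc <- roc_points(score, label)
auc <- trapezoid_auc(roc)
```

In the toy data, 8 of the 9 positive-negative pairs are ranked correctly, so the area is 8/9.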
See PheCAP-package for code examples.