The emil package implements a framework for working with predictive modeling problems without information leakage. For an overview of its functionality please read the original publication included as the package's vignette (to be added).
resample
Functions for generating resampling schemes, and information on how to implement custom resampling methods.
pre_process
Data pre-processing functions.
modeling_procedure
Manages algorithms used for fitting models, making predictions, and extracting feature importance scores.
error_fun
Performance estimation functions used to tune parameters and evaluate performance of modeling procedures.
fit
Fit a model (according to a procedure).
tune
Tune parameters of a procedure.
predict
Use a fitted model to predict the response of observations.
evaluate
Evaluate the performance of a procedure using resampling.
learning_curve
Learning curve analysis.
get_prediction
Extract predictions from resampled modeling results.
get_tuning
Extract parameter tuning statistics from resampled modeling results.
get_importance
Extract feature importance scores of a fitted model or resampled modeling results.
subtree
Extracts results from the output of evaluate. It is essentially a recursive version of lapply and sapply.
select
Interface between emil and the dplyr package for data manipulation. Can be used to subset, reorganize, or summarize modeling results to aid interpretation or prepare for plotting.
See resample for information on usage and implementation of custom methods.
resample_holdout
Repeated holdout.
resample_crossvalidation
Cross validation.
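The difference between the two schemes can be sketched in base R. This is an illustration only, not emil's implementation: the helper names `sketch_holdout` and `sketch_crossvalidation`, their arguments, and the logical-matrix layout are assumptions made for this example.

```r
# Hypothetical helpers, not part of emil. Each returns a logical matrix with
# one row per observation and one column per fold
# (TRUE = used for fitting, FALSE = held out for testing).

sketch_holdout <- function(n, test_fraction = 0.5, nfold = 5) {
  n_test <- round(n * test_fraction)
  # Every fold draws its own random test set, independently of the others.
  folds <- replicate(nfold, {
    in_fit <- rep(TRUE, n)
    in_fit[sample.int(n, n_test)] <- FALSE
    in_fit
  })
  colnames(folds) <- paste0("fold", seq_len(nfold))
  folds
}

sketch_crossvalidation <- function(n, nfold = 5) {
  # Every observation is assigned to exactly one test fold, in random order.
  test_fold <- sample(rep(seq_len(nfold), length.out = n))
  folds <- sapply(seq_len(nfold), function(k) test_fold != k)
  colnames(folds) <- paste0("fold", seq_len(nfold))
  folds
}

set.seed(1)
ho <- sketch_holdout(n = 20, test_fraction = 0.25, nfold = 3)
cv <- sketch_crossvalidation(n = 20, nfold = 4)

colSums(!ho)  # test-set size of each holdout fold
rowSums(!cv)  # number of times each observation is used for testing
```

In repeated holdout an observation may land in the test set of several folds or of none, whereas in cross validation every observation is tested exactly once.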
See pre_process for information on usage and implementation of custom methods. The imputation functions can also be used outside of the resampling scheme, see impute.
pre_split
Only split, no transformation.
pre_center
Center each feature to have mean 0.
pre_scale
Center and scale data to have mean 0 and standard deviation 1.
pre_impute_median
Impute missing values with feature medians.
pre_impute_knn
Impute missing values with k-NN, see pre_impute_knn for details on how to set parameters.
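To illustrate what pre-processing without information leakage means, here is a minimal base R sketch. It is not emil's code: the helper `sketch_pre_process` and the `in_fit` argument are hypothetical names for this example. The key point is that medians, means, and standard deviations are estimated from the fitting observations only and then applied unchanged to the test observations.

```r
# Hypothetical helper, not part of emil.
# x:      numeric matrix, observations in rows and features in columns.
# in_fit: logical vector, TRUE for observations used to fit the model.
sketch_pre_process <- function(x, in_fit) {
  x <- as.matrix(x)
  fit_part <- x[in_fit, , drop = FALSE]
  test_part <- x[!in_fit, , drop = FALSE]

  # Median imputation, with medians taken from the fitting set only.
  med <- apply(fit_part, 2, median, na.rm = TRUE)
  impute <- function(m) {
    for (j in seq_len(ncol(m))) m[is.na(m[, j]), j] <- med[j]
    m
  }
  fit_part <- impute(fit_part)
  test_part <- impute(test_part)

  # Centering and scaling, again estimated on the fitting set only.
  center <- colMeans(fit_part)
  spread <- apply(fit_part, 2, sd)

  list(
    fit = scale(fit_part, center = center, scale = spread),
    test = scale(test_part, center = center, scale = spread)
  )
}

x <- as.matrix(iris[1:4])
x[3, 2] <- NA                                   # introduce a missing value
in_fit <- rep(c(TRUE, FALSE), length.out = nrow(x))
pp <- sketch_pre_process(x, in_fit)

colMeans(pp$fit)   # zero up to rounding error
colMeans(pp$test)  # close to zero, but not exactly
```

The test set does not end up with mean exactly 0, and that is intended: it was transformed with parameters it had no part in estimating.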
The following modeling methods are included in the emil package. For a complete list of available methods in both the emil package and other loaded packages, please use list_method.
See modeling_procedure for information on usage, and extension for information on implementation of custom methods.
cforest
Conditional inference forest.
coxph
Cox proportional hazards model.
glmnet
Elastic net.
lasso
LASSO.
lda
Linear discriminant.
lm
Linear model.
pamr
Nearest shrunken centroids.
qda
Quadratic discriminant.
randomForest
Random forest.
ridge_regression
Ridge regression.
rpart
Decision trees.
fit_caret
To search for emil-compatible methods in all attached packages, use the list_method function.
See error_fun for information on usage and implementation of custom methods. Since the framework is designed to minimize the error when tuning parameters, some measures are negated, e.g. neg_auc.
For classification problems:
error_rate
Fraction of predictions that were incorrect.
weighted_error_rate
See its own documentation.
neg_auc
Negative area under ROC curve. To plot the ROC curves see roc_curve.
neg_gmpa
Negative geometric mean of class-specific prediction accuracy. Good for problems with imbalanced class sizes.
neg_harrell_c
Negative Harrell's concordance index.
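The reason for negating some measures, and the effect of class imbalance, can be shown with a small base R sketch. The functions `sketch_error_rate` and `sketch_neg_gmpa` are hypothetical stand-ins written for this example from the descriptions above; they are not emil's implementations and their signatures are assumptions.

```r
# Hypothetical stand-ins, not part of emil.

# Fraction of predictions that were incorrect.
sketch_error_rate <- function(truth, prediction) {
  mean(truth != prediction)
}

# Negative geometric mean of the class-specific prediction accuracies.
# The sign is flipped so that lower values are better, like an error.
sketch_neg_gmpa <- function(truth, prediction) {
  class_accuracy <- tapply(prediction == truth, truth, mean)
  -exp(mean(log(class_accuracy)))
}

# Imbalanced problem: 8 observations of class "a", 2 of class "b".
truth <- c(rep("a", 8), rep("b", 2))
all_a <- rep("a", 10)                        # always predicts the majority
mixed <- c(rep("a", 6), "b", "b", "b", "a")  # finds one of the two "b"

sketch_error_rate(truth, all_a)
sketch_error_rate(truth, mixed)
sketch_neg_gmpa(truth, all_a)
sketch_neg_gmpa(truth, mixed)
```

Minimizing the error rate prefers the trivial majority predictor (0.2 versus 0.3), while minimizing the negated geometric mean prefers the predictor that recognizes both classes, since an accuracy of 0 in any class drives the geometric mean to 0.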
Plotting is not one of the main aims of the package, and the methods that do exist mainly serve as examples of how to write your own. These exist for: