# Michel Ballings

#### 11 packages on CRAN

This package includes functions to compute the area under the curve of selected measures: the area under the sensitivity curve (AUSEC), the area under the specificity curve (AUSPC), the area under the accuracy curve (AUACC), and the area under the receiver operating characteristic curve (AUROC). The curves can also be visualized. Support for partial areas is provided.
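As an illustration of the AUROC measure described above, here is a minimal Python sketch that sweeps a threshold over the predicted scores and integrates the resulting curve with the trapezoidal rule. The function names `roc_points`, `trapezoid_area`, and `auroc` are hypothetical and are not the package's R interface; the other curves (AUSEC, AUSPC, AUACC) and partial areas are not covered.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points for a threshold sweep.

    Assumes labels are 0/1 and that both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort indices by descending score so each step lowers the threshold.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Handle tied scores as one step so ties produce a diagonal segment.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if labels[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def trapezoid_area(points):
    """Integrate a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def auroc(scores, labels):
    """Area under the ROC curve via the trapezoidal rule."""
    return trapezoid_area(roc_points(scores, labels))


if __name__ == "__main__":
    print(auroc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # 0.75
```

In the example, three of the four positive/negative pairs are ranked correctly, which matches the area of 0.75.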

Efficiently create dummies of all factors and character vectors in a data frame. Support is included for learning the categories on one data set (e.g., a training set) and deploying them on another (e.g., a test set).
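The idea of learning categories on one data set and deploying them on another can be sketched as follows. This is a plain Python illustration under stated assumptions, not the package's R interface: the helper names `learn_categories` and `make_dummies` are hypothetical, rows are represented as dictionaries, and a value unseen during learning is assumed to map to all zeros.

```python
def learn_categories(rows, columns):
    """Record the sorted distinct values of each categorical column."""
    return {col: sorted({row[col] for row in rows}) for col in columns}


def make_dummies(rows, categories):
    """Create one 0/1 indicator per learned (column, value) pair.

    Values that were not seen when the categories were learned
    produce zeros in every indicator of that column.
    """
    encoded_rows = []
    for row in rows:
        encoded = {}
        for col, values in categories.items():
            for value in values:
                encoded[f"{col}_{value}"] = 1 if row[col] == value else 0
        encoded_rows.append(encoded)
    return encoded_rows


if __name__ == "__main__":
    train = [{"color": "red"}, {"color": "blue"}]
    test = [{"color": "green"}, {"color": "red"}]
    categories = learn_categories(train, ["color"])
    print(make_dummies(test, categories))
```

Fixing the categories on the training set keeps the indicator columns identical across data sets, so a model fit on the training dummies can be applied to the test dummies without column mismatches.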

Functions to build and deploy a hybrid ensemble consisting of eight different sub-ensembles: bagged logistic regressions, random forest, stochastic boosting, kernel factory, bagged neural networks, bagged support vector machines, rotation forest, and bagged k-nearest neighbors. Functions to cross-validate the hybrid ensemble and plot and summarize the results are also provided. There is also a function to assess the importance of the predictors.

Compute permutation-based performance measures and create partial dependence plots for (cross-validated) 'randomForest' and 'ada' models.

Binary classification based on an ensemble of kernel machines ("Ballings, M. and Van den Poel, D. (2013), Kernel Factory: An Ensemble of Kernel Machines. Expert Systems With Applications, 40(8), 2904-2913"). Kernel factory is an ensemble method where each base classifier (random forest) is fit on the kernel matrix of a subset of the training data.

Fit and deploy rotation forest models ("Rodriguez, J.J., Kuncheva, L.I., 2006. Rotation forest: A new classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. 28, 1619-1630 <doi:10.1109/TPAMI.2006.211>") for binary classification. Rotation forest is an ensemble method where each base classifier (tree) is fit on the principal components of the variables of random partitions of the feature set.

Convenience functions for aggregating data frames. Currently mean, sum and variance are supported. For Date variables, recency and duration are supported. There is also support for dummy variables in predictive contexts.

Compute imputation values on a training data set and use them to fill in missing values in a new data set. Currently available options are median/mode and random forest.
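The median/mode option can be illustrated with a short Python sketch. It assumes rows are dictionaries, missing values are represented by `None`, numeric columns receive the median, and all other columns receive the mode. The function names `learn_imputation_values` and `impute` are hypothetical and do not reflect the package's R interface; the random forest option is not shown.

```python
from statistics import median, mode


def learn_imputation_values(rows):
    """Compute one fill-in value per column from the observed training values."""
    columns = {key for row in rows for key in row}
    values = {}
    for col in columns:
        observed = [row[col] for row in rows if row.get(col) is not None]
        if not observed:
            continue  # nothing to learn from a fully missing column
        if all(isinstance(v, (int, float)) for v in observed):
            values[col] = median(observed)  # numeric column: median
        else:
            values[col] = mode(observed)  # other column: most frequent value
    return values


def impute(rows, values):
    """Replace None entries with the values learned on the training data."""
    return [
        {col: (values.get(col) if val is None else val) for col, val in row.items()}
        for row in rows
    ]


if __name__ == "__main__":
    train = [
        {"age": 30, "city": "Ghent"},
        {"age": None, "city": "Ghent"},
        {"age": 50, "city": "Leuven"},
        {"age": 40, "city": None},
    ]
    learned = learn_imputation_values(train)
    print(impute([{"age": None, "city": None}], learned))
```

Learning the values on the training set only means the new data set is filled in without using any of its own statistics.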

Compute the top decile lift and plot the lift curve. Cumulative lift curves are also supported.
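For reference, top decile lift is commonly defined as the proportion of positives among the 10% highest-scored cases divided by the overall proportion of positives. The Python sketch below follows that definition and also computes a cumulative lift value at every depth. The function names are hypothetical, and rounding the decile size to the nearest whole case is an assumption; the package's own handling of ties and rounding may differ.

```python
def top_decile_lift(scores, labels, fraction=0.1):
    """Positive rate in the top-scored fraction divided by the overall positive rate.

    Assumes labels are 0/1 with at least one positive.
    """
    n = len(scores)
    k = max(1, round(n * fraction))  # assumed rounding; at least one case
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / n
    return top_rate / base_rate


def cumulative_lift(scores, labels):
    """Lift of the top k cases for every depth k = 1..n."""
    n = len(scores)
    base_rate = sum(labels) / n
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    lifts = []
    positives = 0
    for k, (_, label) in enumerate(ranked, start=1):
        positives += label
        lifts.append((positives / k) / base_rate)
    return lifts


if __name__ == "__main__":
    scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print(top_decile_lift(scores, labels))
    print(cumulative_lift(scores, labels))
```

A lift of 1 means the model does no better than random selection, which is why the cumulative curve always ends at 1 when all cases are included.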