Note: There is a newer version (2.0.0) of this package.

FFTrees

The goal of FFTrees is to create and visualize fast-and-frugal decision trees (FFTs) from data with a binary outcome, following the methods described in Phillips, Neth, Woike & Gaissmaier (2017).

Installation

You can install the released version of FFTrees from CRAN with:

install.packages("FFTrees")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("ndphillips/FFTrees", build_vignettes = TRUE)

Examples

library(FFTrees)
#> 
#>    O
#>   / \
#>  F   O
#>     / \
#>    F   Trees 1.5.3
#> 
#> Nathaniel.D.Phillips.is@gmail.com
#> FFTrees.guide() opens the guide.

Let’s create a fast-and-frugal tree predicting heart disease status (“Healthy” vs. “Diseased”) from the heart.train training dataset and test it on heart.test, a separate testing dataset.

Here are the first few rows and columns of heart.train, our training dataset. The key column is diagnosis, a logical column (TRUE or FALSE) that indicates, for each patient, whether or not they have heart disease. The heart.test dataset looks similar but contains different cases (i.e., patients).

knitr::kable(heart.train[1:7, 1:10])
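|diagnosis | age| sex|cp | trestbps| chol| fbs|restecg     | thalach| exang|
|:---------|---:|---:|:--|--------:|----:|---:|:-----------|-------:|-----:|
|FALSE     |  44|   0|np |      108|  141|   0|normal      |     175|     0|
|FALSE     |  51|   0|np |      140|  308|   0|hypertrophy |     142|     0|
|FALSE     |  52|   1|np |      138|  223|   0|normal      |     169|     0|
|TRUE      |  48|   1|aa |      110|  229|   0|normal      |     168|     0|
|FALSE     |  59|   1|aa |      140|  221|   0|normal      |     164|     1|
|FALSE     |  58|   1|np |      105|  240|   0|hypertrophy |     154|     1|
|FALSE     |  41|   0|aa |      126|  306|   0|normal      |     163|     0|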

Now let’s use FFTrees() to create fast-and-frugal trees from the heart.train data and test their performance on heart.test.

# Load package
library(FFTrees)

# Create an FFTrees object from the heartdisease data
heart.fft <- FFTrees(formula = diagnosis ~., 
                     data = heart.train,
                     data.test = heart.test, 
                     decision.labels = c("Healthy", "Disease"))
#> Setting goal = 'wacc'
#> Setting goal.chase = 'waccc'
#> Setting cost.outcomes = list(hi = 0, mi = 1, fa = 1, cr = 0)
#> Growing FFTs with ifan
#> Fitting other algorithms for comparison (disable with do.comp = FALSE) ...

# See the print method, which shows aggregate statistics
heart.fft
#> FFTrees 
#> - Trees: 7 fast-and-frugal trees predicting diagnosis
#> - Outcome costs: [hi = 0, mi = 1, fa = 1, cr = 0]
#> 
#> FFT #1: Definition
#> [1] If thal = {rd,fd}, decide Disease.
#> [2] If cp != {a}, decide Healthy.
#> [3] If ca <= 0, decide Healthy, otherwise, decide Disease.
#> 
#> FFT #1: Prediction Accuracy
#> Prediction Data: N = 153, Pos (+) = 73 (48%) 
#> 
#> |         | True + | True - |
#> |---------|--------|--------|
#> |Decide + | hi 64  | fa 19  | 83
#> |Decide - | mi 9   | cr 61  | 70
#> |---------|--------|--------|
#>             73       80       N = 153
#> 
#> acc  = 81.7%  ppv  = 77.1%  npv  = 87.1%
#> bacc = 82.0%  sens = 87.7%  spec = 76.2%
#> E(cost) = 0.183
#> 
#> FFT #1: Prediction Speed and Frugality
#> mcu = 1.73, pci = 0.87

# Plot the best tree applied to the test data
plot(heart.fft,
     data = "test",
     main = "Heart Disease")

# Compare results across algorithms in test data
heart.fft$competition$test
#>   algorithm   n hi fa mi cr      sens   spec    far       ppv       npv
#> 1   fftrees 153 64 19  9 61 0.8767123 0.7625 0.2375 0.7710843 0.8714286
#> 2        lr 153 55 13 18 67 0.7534247 0.8375 0.1625 0.8088235 0.7882353
#> 3      cart 153 50 19 23 61 0.6849315 0.7625 0.2375 0.7246377 0.7261905
#> 4        rf 153 58  6 15 74 0.7945205 0.9250 0.0750 0.9062500 0.8314607
#> 5       svm 153 55  7 18 73 0.7534247 0.9125 0.0875 0.8870968 0.8021978
#>         acc      bacc      cost cost_decisions cost_cues
#> 1 0.8169935 0.8196062 0.1830065      0.1830065         0
#> 2 0.7973856 0.7954623 0.2026144      0.2026144        NA
#> 3 0.7254902 0.7237158 0.2745098      0.2745098        NA
#> 4 0.8627451 0.8597603 0.1372549      0.1372549        NA
#> 5 0.8366013 0.8329623 0.1633987      0.1633987        NA
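Every statistic in this table is derived from the four cell counts hi (hits), fa (false alarms), mi (misses), and cr (correct rejections). As a minimal base-R sketch, assuming the standard definitions of these measures and the outcome costs printed above (hi = 0, mi = 1, fa = 1, cr = 0), the helper below recomputes them. fft_stats() is a hypothetical name used only for illustration; it is not an FFTrees function.

```r
# Hypothetical helper (not part of FFTrees): recompute classification
# statistics from the four cells of a 2 x 2 confusion matrix.
fft_stats <- function(hi, fa, mi, cr,
                      cost = c(hi = 0, mi = 1, fa = 1, cr = 0)) {
  n    <- hi + fa + mi + cr
  sens <- hi / (hi + mi)   # sensitivity: hits among true positives
  spec <- cr / (cr + fa)   # specificity: correct rejections among true negatives
  c(
    sens = sens,
    spec = spec,
    far  = fa / (fa + cr),      # false alarm rate, equal to 1 - spec
    ppv  = hi / (hi + fa),      # positive predictive value
    npv  = cr / (cr + mi),      # negative predictive value
    acc  = (hi + cr) / n,       # overall accuracy
    bacc = (sens + spec) / 2,   # balanced accuracy
    # expected cost per case under the given outcome costs
    cost = (hi * cost[["hi"]] + fa * cost[["fa"]] +
            mi * cost[["mi"]] + cr * cost[["cr"]]) / n
  )
}

# Counts taken from the fftrees row of the test-data comparison
round(fft_stats(hi = 64, fa = 19, mi = 9, cr = 61), 4)
```

Rounded to four decimals, this reproduces the fftrees row (for example sens = 0.8767, spec = 0.7625, acc = 0.8170, cost = 0.1830). With equal costs for misses and false alarms, the expected cost is simply the error rate, 28 / 153.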

Because fast-and-frugal trees are so simple, you can create one ‘from words’ and apply it to data!

# Create a custom FFT 'in words' and apply it to data
my.fft <- FFTrees(formula = diagnosis ~., 
                  data = heart.train,
                  data.test = heart.test, 
                  decision.labels = c("Healthy", "Disease"),
                  my.tree = "If sex = 1, predict Disease.
                             If age < 45, predict Healthy.
                             If thal = {fd, normal}, predict Disease. 
                             Otherwise, predict Healthy")
#> Setting goal = 'wacc'
#> Setting goal.chase = 'waccc'
#> Setting cost.outcomes = list(hi = 0, mi = 1, fa = 1, cr = 0)
#> Fitting other algorithms for comparison (disable with do.comp = FALSE) ...

# Plot my custom fft and see how it did
plot(my.fft,
     data = "test",
     main = "Custom FFT")
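An FFT is an ordered list of cues: every node except the last has a single exit, and the last node exits both ways. The base-R sketch below mimics that logic for the custom tree defined in my.tree and for FFT #1 from the print output. apply_fft(), the node-list format, and the four toy patients are hypothetical illustrations, not FFTrees internals; in the package itself, trees are built by FFTrees() and applied to new data with predict().

```r
# Hypothetical sketch (not FFTrees internals): an FFT as an ordered list of
# nodes. Each node tests one cue; if the test is TRUE the case exits with that
# node's decision, otherwise it moves on to the next node. Cases that pass
# every node receive the default (the second exit of the final node).
apply_fft <- function(data, nodes, default) {
  n <- nrow(data)
  decision  <- character(n)
  cues_used <- integer(n)
  for (i in seq_len(n)) {
    row <- data[i, , drop = FALSE]
    decision[i] <- default
    for (k in seq_along(nodes)) {
      cues_used[i] <- k   # number of cues looked up so far for this case
      if (isTRUE(nodes[[k]]$test(row))) {
        decision[i] <- nodes[[k]]$exit
        break
      }
    }
  }
  data.frame(decision = decision, cues_used = cues_used,
             stringsAsFactors = FALSE)
}

# The custom tree from my.tree, written as nodes
custom_nodes <- list(
  list(test = function(r) r$sex == 1,                    exit = "Disease"),
  list(test = function(r) r$age < 45,                    exit = "Healthy"),
  list(test = function(r) r$thal %in% c("fd", "normal"), exit = "Disease")
)

# FFT #1 from the print output: the final node exits to Healthy if ca <= 0,
# otherwise to the default (Disease)
fft1_nodes <- list(
  list(test = function(r) r$thal %in% c("rd", "fd"), exit = "Disease"),
  list(test = function(r) r$cp != "a",              exit = "Healthy"),
  list(test = function(r) r$ca <= 0,                exit = "Healthy")
)

# Four made-up patients, for illustration only
toy <- data.frame(
  sex  = c(1, 0, 0, 0),
  age  = c(50, 40, 60, 60),
  cp   = c("a", "np", "a", "a"),
  ca   = c(0, 0, 1, 2),
  thal = c("normal", "normal", "fd", "rd"),
  stringsAsFactors = FALSE
)

custom_result <- apply_fft(toy, custom_nodes, default = "Healthy")
fft1_result   <- apply_fft(toy, fft1_nodes, default = "Disease")
custom_result
mean(custom_result$cues_used)  # average number of cues looked up per case
```

In this sketch, the mean of cues_used plays the role of the mcu (mean cues used) figure in the print output: on the toy data, the custom tree looks up 2.25 cues per case on average, because the first patient exits at the first node while the last two need all three.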

Citation

APA Citation

Phillips, N. D., Neth, H., Woike, J. K., & Gaissmaier, W. (2017). FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees. Judgment and Decision Making, 12(4), 344-368.

We had a lot of fun creating FFTrees and hope you like it too! We have an article introducing the FFTrees package in the journal Judgment and Decision Making titled FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees. We encourage you to read the article to learn more about the history of FFTs and how the FFTrees package creates them.

If you use FFTrees in your work, please cite us and spread the word so we can continue developing the package.

Here are some example publications that have used FFTrees:

Version

1.5.3

License

CC0

Maintainer

Nathaniel Phillips

Last Published

June 6th, 2023

Functions in FFTrees (1.5.3)

FFTrees: Creates a fast-and-frugal trees (FFTrees) object.
creditapproval: Credit approval data
FFTrees.guide: Opens the FFTrees package guide
comp.pred: Wrapper for classification algorithms
Add_Stats: Adds decision statistics to a dataframe containing hr, cr, mi and fa
classtable: Calculates several classification statistics from binary prediction and criterion (e.g., truth) vectors
contraceptive: Contraceptive use data
blood: Blood donation dataset
inwords: Display a verbal description of a tree in an FFTrees object
breastcancer: Physiological dataset for 699 patients tested for breast cancer
fftrees_cuerank: Calculates thresholds that maximize a statistic (goal) for cues
car: Car acceptability data
fftrees_define: Create definitions of FFTrees
predict.FFTrees: Predict classifications from newdata using an FFTrees object
print.FFTrees: Prints summary information from an FFTrees object
fftrees_grow_fan: Grows fast-and-frugal trees using the fan algorithm
fftrees_wordstofftrees: Converts text describing an FFT into an FFT definition
fftrees_ranktrees: Rank trees by goal
iris.v: Iris data set
fftrees_ffttowords: Describes an FFT in words
factclean: Does miscellaneous cleaning of prediction datasets
fertility: Fertility data set
fftrees_fitcomp: Fit competitive algorithms
heart.train: Heart disease training dataset
forestfires: Forest fires dataset
heartdisease: Heart disease dataset
plot.FFTrees: Plots an FFTrees object
fftrees_apply: Applies a fast-and-frugal tree to a dataset and generates several accuracy statistics
fftrees_create: Create an FFTrees object
showcues: Visualizes cue accuracies from an FFTrees object in ROC space
wine: Wine tasting dataset
mushrooms: Mushrooms dataset
heart.cost: Cue costs for the heartdisease data
heart.test: Heart disease testing dataset
summary.FFTrees: Returns summary information about an FFTrees object
fftrees_threshold_factor_grid: Performs a grid search over factor levels and returns accuracy statistics for a given factor cue
titanic: Titanic dataset
sonar: Sonar data set
updateHistory: Update the history of decisions from trees in an FFTrees object
voting: Voting data set
fftrees_threshold_numeric_grid: Performs a grid search over thresholds and returns accuracy statistics for a given numeric cue