FFTrees (version 1.4.0)

FFTrees: Creates a fast-and-frugal trees (FFTrees) object.


This is the workhorse function for the FFTrees package. It creates (one or more) fast-and-frugal decision trees trained on a training dataset and tested on an optional test dataset.


FFTrees(formula = NULL, data = NULL, data.test = NULL,
  algorithm = "ifan", max.levels = NULL, sens.w = 0.5,
  cost.outcomes = NULL, cost.cues = NULL, stopping.rule = "exemplars",
  stopping.par = 0.1, goal = NULL, goal.chase = NULL,
  goal.threshold = "bacc", numthresh.method = "o",
  decision.labels = c("False", "True"), main = NULL, train.p = 1,
  rounding = NULL, repeat.cues = TRUE, my.tree = NULL,
  tree.definitions = NULL, do.comp = TRUE, do.cart = TRUE, do.lr = TRUE,
  do.rf = TRUE, do.svm = TRUE, store.data = FALSE, object = NULL,
  rank.method = NULL, force = FALSE, verbose = NULL, comp = NULL,
  quiet = TRUE)



formula

formula. A formula specifying a logical criterion as a function of 1 or more predictors.


data

dataframe. A training dataset.


data.test

dataframe. An optional testing dataset with the same structure as data.


algorithm

character. The algorithm to create FFTs. Can be 'ifan', 'dfan', 'max', or 'zigzag'.


max.levels

integer. The maximum number of levels considered for the trees. Because all permutations of exit structures are considered, the larger max.levels is, the more trees will be created.


sens.w

numeric. A number from 0 to 1 indicating how to weight sensitivity relative to specificity. Only relevant when goal = 'wacc'.
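As a rough illustration of what this weighting means, the sketch below computes weighted accuracy from confusion-matrix counts in base R. It assumes weighted accuracy is the sens.w-weighted average of sensitivity and specificity; the helper `weighted_accuracy()` and the counts are hypothetical and not part of FFTrees.

```r
# Hypothetical helper (not part of FFTrees): weighted accuracy from
# confusion-matrix counts, assuming wacc = sens.w * sens + (1 - sens.w) * spec.
weighted_accuracy <- function(hi, fa, mi, cr, sens.w = 0.5) {
  sens <- hi / (hi + mi)   # sensitivity: hits among truly positive cases
  spec <- cr / (cr + fa)   # specificity: correct rejections among truly negative cases
  sens.w * sens + (1 - sens.w) * spec
}

# Made-up counts for illustration only
bacc <- weighted_accuracy(hi = 40, fa = 10, mi = 20, cr = 30)                 # sens.w = 0.5
wacc <- weighted_accuracy(hi = 40, fa = 10, mi = 20, cr = 30, sens.w = 0.75)  # favors sensitivity
```

With sens.w = 0.5 the result reduces to balanced accuracy; values above 0.5 reward sensitivity more than specificity.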


cost.outcomes

list. A list of length 4 with names 'hi', 'fa', 'mi', and 'cr' specifying the costs of a hit, false alarm, miss, and correct rejection, respectively. E.g., cost.outcomes = list("hi" = 0, "fa" = 10, "mi" = 20, "cr" = 0) means that a false alarm and a miss cost 10 and 20, respectively, while correct decisions have no cost.
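To make the cost structure concrete, here is a minimal base-R sketch that applies such a list to a set of outcome counts. The counts are made up, and the way the total is aggregated here (a simple sum of count times cost) is an assumption for illustration, not necessarily how FFTrees computes cost internally.

```r
# Outcome costs in the format described above
cost.outcomes <- list("hi" = 0, "fa" = 10, "mi" = 20, "cr" = 0)

# Made-up outcome counts for one hypothetical tree
counts <- c(hi = 40, fa = 10, mi = 20, cr = 30)

# Align costs to the counts by name, then aggregate
costs <- unlist(cost.outcomes)[names(counts)]
total_cost <- sum(counts * costs)        # 10 false alarms * 10 + 20 misses * 20
mean_cost  <- total_cost / sum(counts)   # average outcome cost per case
```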


cost.cues

list. A list containing costs for each cue. Each element should have a name corresponding to a column in data, and each entry should be a single (positive) number. Cues not present in cost.cues are assumed to have 0 cost.


stopping.rule

character. A string indicating the method to stop growing trees. "levels" means the tree grows until a certain level. "exemplars" means the tree grows until a certain number of unclassified exemplars remain. "statdelta" means the tree grows until the change in the criterion statistic is less than a specified level.


stopping.par

numeric. A number indicating the parameter for the stopping rule. For stopping.rule == "levels", this is the number of levels. For stopping.rule == "exemplars", this is the smallest proportion of exemplars allowed in the last level (the default, 0.1, corresponds to 10%).


goal

character. A string indicating the statistic to maximize when selecting final trees: "acc" = overall accuracy, "wacc" = weighted accuracy, "bacc" = balanced accuracy.


goal.chase

character. A string indicating the statistic to maximize when constructing trees: "acc" = overall accuracy, "wacc" = weighted accuracy, "bacc" = balanced accuracy, "cost" = cost.


goal.threshold

character. A string indicating the statistic to maximize when calculating cue thresholds: "acc" = overall accuracy, "wacc" = weighted accuracy, "bacc" = balanced accuracy.


numthresh.method

character. How should thresholds for numeric cues be determined? "o" will optimize thresholds, while "m" will always use the median.
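The difference between the two methods can be sketched in base R on a toy cue. This is a simplified illustration, not the package's actual search: it assumes only the ">" direction, tries each observed cue value as a candidate threshold, and scores candidates by balanced accuracy. The data and the helper `bacc_at()` are hypothetical.

```r
# Toy numeric cue and logical criterion (made-up data)
cue       <- c(35, 41, 47, 52, 58, 63, 66, 71)
criterion <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)

# Hypothetical helper: balanced accuracy of the rule "predict TRUE if cue > threshold"
bacc_at <- function(threshold) {
  decision <- cue > threshold
  sens <- mean(decision[criterion])     # proportion of positives flagged
  spec <- mean(!decision[!criterion])   # proportion of negatives not flagged
  (sens + spec) / 2
}

# "m"-style threshold: always the median of the cue
median_threshold <- median(cue)

# "o"-style threshold: the candidate value with the highest balanced accuracy
candidates <- sort(unique(cue))
optimized_threshold <- candidates[which.max(sapply(candidates, bacc_at))]
```

On this toy data the median rule misclassifies one negative case, while the optimized threshold separates the two groups completely, which is the kind of gain optimization can offer at the price of more computation.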


decision.labels

string. A vector of strings of length 2 indicating labels for negative and positive cases. E.g., decision.labels = c("Healthy", "Diseased").


main

string. An optional label for the dataset. Passed on to other functions like plot.FFTrees() and print.FFTrees().


train.p

numeric. What proportion of the data to use for training when data.test is not specified? For example, train.p = .5 will randomly split data into a 50% training set and a 50% test set. train.p = 1, the default, uses all data for training.
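The kind of random split described here can be sketched in base R as follows. This is only an illustration of the idea under simple assumptions (a plain random sample of rows); the split FFTrees performs internally may differ in detail. The data frame `toy` and all variable names are hypothetical.

```r
set.seed(100)  # make the random split reproducible

# Made-up dataset with one cue and a logical criterion
toy <- data.frame(x = rnorm(20), y = rep(c(TRUE, FALSE), 10))

train.p   <- 0.5
n_train   <- floor(train.p * nrow(toy))          # number of training cases
train_idx <- sample(nrow(toy), size = n_train)   # random row indices for training

toy_train <- toy[train_idx, ]    # 50% training set
toy_test  <- toy[-train_idx, ]   # remaining 50% test set
```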


rounding

integer. An integer indicating digit rounding for non-integer numeric cue thresholds. The default is NULL which means no rounding. A value of 0 rounds all possible thresholds to the nearest integer, 1 rounds to the nearest .1 (etc.).
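The effect of the digit setting can be seen with base R's round(). This sketch assumes the argument behaves like the digits argument of round(); the threshold values are made up.

```r
# Made-up candidate thresholds for a numeric cue
thresholds <- c(54.637, 248.27, 1.449)

rounded_0 <- round(thresholds, digits = 0)   # nearest integer
rounded_1 <- round(thresholds, digits = 1)   # nearest .1
```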


repeat.cues

logical. Can cues occur multiple times within a tree?


my.tree

string. A string representing an FFT in words. For example, my.tree = "If age > 20, predict TRUE. If sex = {m}, predict FALSE. Otherwise, predict TRUE".


tree.definitions

dataframe. An optional hard-coded definition of trees (see details below). If specified, no new trees are created.

do.comp, do.cart, do.lr, do.rf, do.svm

logical. Should alternative algorithms be created for comparison? cart = regular (non-frugal) trees with rpart, lr = logistic regression with glm, rf = random forests with randomForest, svm = support vector machines with e1071. Setting do.comp = FALSE sets all these arguments to FALSE.


store.data

logical. Should training / test data be stored in the object? Default is FALSE.


object

FFTrees. An optional existing FFTrees object. When specified, no new trees are fitted and the existing trees are applied to data and data.test.

rank.method, verbose, comp

Deprecated arguments.


force

logical. If TRUE, forces some parameters (like goal) to be as specified by the user even when the algorithm thinks those specifications don't make sense.


quiet

logical. Controls whether progress reports are printed. Setting quiet = FALSE prints them, which can be helpful for diagnosis when the function is running slowly.


An FFTrees object with the following elements:


The formula specified when creating the FFTs.


Descriptive statistics of the data.


Marginal accuracies of each cue given a decision threshold calculated with the specified algorithm.


Definitions of each tree created by FFTrees. Each row corresponds to one tree. Different levels within a tree are separated by semi-colons. See above for more details.


Tree definitions and classification statistics. Training and test data are stored separately.


A list of cost information for each case in each tree.


Cumulative classification statistics at each tree level. Training and test data are stored separately.


Final classification decisions. Each row is a case and each column is a tree. For example, row 1 in column 2 is the classification decision of tree number 2 for the first case. Training and test data are stored separately.


The level at which each case is classified in each tree. Rows correspond to cases and columns correspond to trees. Training and test data are stored separately.


The index of the 'final' tree specified by the algorithm. For algorithms that only return a single tree, this value is always 1.


A verbal definition of tree.max.


A list of defined control parameters (e.g., algorithm, goal).


Models and classification statistics for competing classification algorithms: regularized logistic regression, CART, and random forest.


The original training and test data (only included when store.data = TRUE).


 # Load the package (provides FFTrees() and the heart disease example data)
 library(FFTrees)

 # Create fast-and-frugal trees for heart disease
 heart.fft <- FFTrees(formula = diagnosis ~ .,
                      data = heart.train,
                      data.test = heart.test,
                      main = "Heart Disease",
                      decision.labels = c("Healthy", "Diseased"))

 # Print the result for summary info
 heart.fft

 # Plot the tree applied to training data
 plot(heart.fft, stats = FALSE)
 plot(heart.fft, data = "test")  # Now for testing data
 plot(heart.fft, data = "test", tree = 2) # Look at tree number 2

 ## Predict classes and probabilities for new data

 predict(heart.fft, newdata = heartdisease)
 predict(heart.fft, newdata = heartdisease, type = "prob")

 ### Create your own custom tree with my.tree

 custom.fft <- FFTrees(formula = diagnosis ~ .,
                       data = heartdisease,
                       my.tree = 'If chol > 300, predict True.
                                  If sex = {m}, predict False,
                                  If age > 70, predict True, otherwise predict False')

 # Plot the custom tree (it's pretty terrible)
 plot(custom.fft)


