FFTrees (version 1.4.0)

FFTrees: Creates a fast-and-frugal trees (FFTrees) object.

Description

This is the workhorse function for the FFTrees package. It creates one or more fast-and-frugal decision trees (FFTs) that are trained on a training dataset and, optionally, tested on a test dataset.

Usage

FFTrees(formula = NULL, data = NULL, data.test = NULL,
  algorithm = "ifan", max.levels = NULL, sens.w = 0.5,
  cost.outcomes = NULL, cost.cues = NULL, stopping.rule = "exemplars",
  stopping.par = 0.1, goal = NULL, goal.chase = NULL,
  goal.threshold = "bacc", numthresh.method = "o",
  decision.labels = c("False", "True"), main = NULL, train.p = 1,
  rounding = NULL, repeat.cues = TRUE, my.tree = NULL,
  tree.definitions = NULL, do.comp = TRUE, do.cart = TRUE, do.lr = TRUE,
  do.rf = TRUE, do.svm = TRUE, store.data = FALSE, object = NULL,
  rank.method = NULL, force = FALSE, verbose = NULL, comp = NULL,
  quiet = TRUE)

Arguments

formula

formula. A formula specifying a logical criterion as a function of 1 or more predictors.

data

dataframe. A training dataset.

data.test

dataframe. An optional testing dataset with the same structure as data.

algorithm

character. The algorithm to create FFTs. Can be 'ifan', 'dfan', 'max', or 'zigzag'.

max.levels

integer. The maximum number of levels considered for the trees. Because all permutations of exit structures are considered, the larger max.levels is, the more trees will be created.

sens.w

numeric. A number from 0 to 1 indicating how to weight sensitivity relative to specificity. Only relevant when goal = 'wacc'.
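
The effect of sens.w can be sketched in base R. This sketch assumes that weighted accuracy is computed as sens * sens.w + spec * (1 - sens.w). The confusion-matrix counts and the helper wacc() are hypothetical illustrations, not part of the FFTrees API.

```r
# Hypothetical confusion-matrix counts (illustrative values only)
hi <- 40  # hits
mi <- 10  # misses
fa <- 40  # false alarms
cr <- 60  # correct rejections

sens <- hi / (hi + mi)  # sensitivity
spec <- cr / (cr + fa)  # specificity

# Hypothetical helper: sens.w weights sensitivity, (1 - sens.w) weights specificity
wacc <- function(sens, spec, sens.w = 0.5) {
  sens * sens.w + spec * (1 - sens.w)
}

wacc.balanced <- wacc(sens, spec)                # sens.w = 0.5 reduces to balanced accuracy
wacc.sens     <- wacc(sens, spec, sens.w = 0.8)  # puts more weight on sensitivity
```

With sens.w = 0.5 the statistic equals balanced accuracy, so values other than 0.5 only matter when sensitivity and specificity differ.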

cost.outcomes

list. A list of length 4 with names 'hi', 'fa', 'mi', and 'cr' specifying the costs of a hit, false alarm, miss, and correct rejection, respectively. E.g., cost.outcomes = list("hi" = 0, "fa" = 10, "mi" = 20, "cr" = 0) means that a false alarm and a miss cost 10 and 20, respectively, while correct decisions have no cost.
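
As a rough illustration of how such outcome costs translate into the cost of a set of decisions, the base R sketch below multiplies hypothetical outcome counts by the costs from the example above. The counts are made up, and the way FFTrees aggregates costs internally may differ.

```r
# Outcome costs as in the example above
cost.outcomes <- list("hi" = 0, "fa" = 10, "mi" = 20, "cr" = 0)

# Hypothetical outcome counts for one tree (illustrative values only)
counts <- c(hi = 40, fa = 40, mi = 10, cr = 60)

# Align the costs with the counts by name, then aggregate
outcome.costs <- unlist(cost.outcomes)[names(counts)]
total.cost <- sum(counts * outcome.costs)  # summed cost over all cases
mean.cost  <- total.cost / sum(counts)     # average cost per case
```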

cost.cues

list. A list containing costs for each cue. Each element should have a name corresponding to a column in data, and each entry should be a single (positive) number. Cues not present in cost.cues are assumed to have 0 cost.
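
The default of 0 for unlisted cues can be pictured with the base R sketch below. The cue names and cost values are hypothetical, and the code only mimics the documented behavior rather than reproducing the package internals.

```r
# Hypothetical cue names; in practice these would be column names in data
cue.names <- c("age", "sex", "chol", "thal")

# Costs are given for only two of the cues (illustrative values)
cost.cues <- list(chol = 5, thal = 100)

# Start from 0 for every cue, then fill in the costs that were specified
full.costs <- setNames(rep(0, length(cue.names)), cue.names)
full.costs[names(cost.cues)] <- unlist(cost.cues)
full.costs
```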

stopping.rule

character. A string indicating the method to stop growing trees. "levels" means the tree grows until a certain level. "exemplars" means the tree grows until a certain number of unclassified exemplars remain. "statdelta" means the tree grows until the change in the criterion statistic is less than a specified level.

stopping.par

numeric. A number indicating the parameter for the stopping rule. For stopping.rule == "levels", this is the number of levels. For stopping.rule == "exemplars", this is the smallest percentage of exemplars allowed in the last level.

goal

character. A string indicating the statistic to maximize when selecting final trees: "acc" = overall accuracy, "wacc" = weighted accuracy, "bacc" = balanced accuracy.

goal.chase

character. A string indicating the statistic to maximize when constructing trees: "acc" = overall accuracy, "wacc" = weighted accuracy, "bacc" = balanced accuracy, "cost" = cost.

goal.threshold

character. A string indicating the statistic to maximize when calculating cue thresholds: "acc" = overall accuracy, "wacc" = weighted accuracy, "bacc" = balanced accuracy.

numthresh.method

character. How should thresholds for numeric cues be determined? "o" will optimize thresholds, while "m" will always use the median.
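
The difference between the two methods can be sketched in base R. This is only an assumption about what optimizing means here: the sketch tries every observed cue value as a threshold, scores each by balanced accuracy, and keeps the best one. The data, the fixed direction (cue > threshold), and the helper bacc() are hypothetical, and the actual search in FFTrees may differ.

```r
# Hypothetical numeric cue and logical criterion (illustrative values only)
cue       <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
criterion <- c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)

# Hypothetical helper: balanced accuracy of the rule "predict TRUE if cue > threshold"
bacc <- function(threshold) {
  decision <- cue > threshold
  sens <- mean(decision[criterion])    # proportion of positive cases detected
  spec <- mean(!decision[!criterion])  # proportion of negative cases rejected
  (sens + spec) / 2
}

# "m"-style threshold: always the median of the cue
median.threshold <- median(cue)

# "o"-style threshold: the candidate value with the highest balanced accuracy
candidates <- sort(unique(cue))
scores <- sapply(candidates, bacc)
optimized.threshold <- candidates[which.max(scores)]
```

In this toy example the optimized threshold achieves a higher balanced accuracy than the median, at the price of evaluating every candidate value.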

decision.labels

string. A vector of strings of length 2 indicating labels for negative and positive cases. E.g., decision.labels = c("Healthy", "Diseased").

main

string. An optional label for the dataset. Passed on to other functions like plot.FFTrees() and print.FFTrees().

train.p

numeric. What percentage of the data to use for training when data.test is not specified? For example, train.p = .5 will randomly split data into a 50% training set and a 50% test set. train.p = 1, the default, uses all data for training.
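
A random split of this kind can be sketched in base R as follows. The data frame is hypothetical, and the exact procedure FFTrees uses internally is not shown in this document and may differ.

```r
set.seed(100)  # make the random split reproducible

# Hypothetical dataset with 10 cases
data <- data.frame(id = 1:10, x = rnorm(10))

train.p <- 0.5

# Draw the training rows at random; the remaining rows form the test set
n.train    <- floor(train.p * nrow(data))
train.rows <- sample(nrow(data), size = n.train)

data.train <- data[train.rows, ]
data.test  <- data[-train.rows, ]
```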

rounding

integer. An integer indicating digit rounding for non-integer numeric cue thresholds. The default is NULL which means no rounding. A value of 0 rounds all possible thresholds to the nearest integer, 1 rounds to the nearest .1 (etc.).
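
Assuming that rounding corresponds to the digits argument of base R's round(), its effect on candidate thresholds looks like this. The threshold values are hypothetical.

```r
# Hypothetical non-integer candidate thresholds
thresholds <- c(54.372, 239.87, 0.046)

rounded.0 <- round(thresholds, digits = 0)  # nearest integer
rounded.1 <- round(thresholds, digits = 1)  # nearest .1
```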

repeat.cues

logical. Can cues occur multiple times within a tree?

my.tree

string. A string representing an FFT in words. For example, my.tree = "If age > 20, predict TRUE. If sex = {m}, predict FALSE. Otherwise, predict TRUE".

tree.definitions

dataframe. An optional hard-coded definition of trees (see details below). If specified, no new trees are created.

do.comp, do.cart, do.lr, do.rf, do.svm

logical. Should alternative algorithms be created for comparison? cart = regular (non-frugal) trees with rpart, lr = logistic regression with glm, rf = random forests with randomForest, svm = support vector machines with e1071. Setting do.comp = FALSE sets all these arguments to FALSE.

store.data

logical. Should training / test data be stored in the object? Default is FALSE.

object

FFTrees. An optional existing FFTrees object. When specified, no new trees are fitted and the existing trees are applied to data and data.test.

rank.method, verbose, comp

deprecated arguments.

force

logical. If TRUE, forces some parameters (like goal) to be as specified by the user even when the algorithm thinks those specifications don't make sense.

quiet

logical. Should progress reports be suppressed? Setting quiet = FALSE prints progress reports, which can be helpful for diagnosis when the function is running slowly.

Value

An FFTrees object with the following elements:

formula

The formula specified when creating the FFTs.

data.desc

Descriptive statistics of the data.

cue.accuracies

Marginal accuracies of each cue given a decision threshold calculated with the specified algorithm.

tree.definitions

Definitions of each tree created by FFTrees. Each row corresponds to one tree. Different levels within a tree are separated by semi-colons. See above for more details.

tree.stats

Tree definitions and classification statistics. Training and test data are stored separately.

cost

A list of cost information for each case in each tree.

level.stats

Cumulative classification statistics at each tree level. Training and test data are stored separately.

decision

Final classification decisions. Each row is a case and each column is a tree. For example, row 1 in column 2 is the classification decision of tree number 2 for the first case. Training and test data are stored separately.

levelout

The level at which each case is classified in each tree. Rows correspond to cases and columns correspond to trees. Training and test data are stored separately.

tree.max

The index of the 'final' tree specified by the algorithm. For algorithms that only return a single tree, this value is always 1.

inwords

A verbal definition of tree.max.

params

A list of defined control parameters (e.g., algorithm, goal).

comp

Models and classification statistics for competing classification algorithms: regularized logistic regression, CART, and random forest.

data

The original training and test data (only included when store.data = TRUE).

Examples

# NOT RUN {
 # Create fast-and-frugal trees for heart disease
 heart.fft <- FFTrees(formula = diagnosis ~.,
                      data = heart.train,
                      data.test = heart.test,
                      main = "Heart Disease",
                      decision.labels = c("Healthy", "Diseased"))

 # Print the result for summary info
 heart.fft

 # Plot the tree applied to training data
 plot(heart.fft, stats = FALSE)
 plot(heart.fft)
 plot(heart.fft, data = "test")  # Now for testing data
 plot(heart.fft, data = "test", tree = 2) # Look at tree number 2


 ## Predict classes and probabilities for new data

 predict(heart.fft, newdata = heartdisease)
 predict(heart.fft, newdata = heartdisease, type = "prob")

 ### Create your own custom tree with my.tree

 custom.fft <- FFTrees(formula = diagnosis ~ .,
                       data = heartdisease,
                       my.tree = 'If chol > 300, predict True.
                                  If sex = {m}, predict False.
                                  If age > 70, predict True, otherwise predict False'
                                  )

 # Plot the custom tree (it's pretty terrible)
 plot(custom.fft)



# }
