mobForest (version 1.3.1)

mobforest.analysis: Model-based random forest analysis

Description

Main function that takes all the necessary arguments to start model-based random forest analysis.

Usage

mobforest.analysis(formula, partition_vars, data,
  mobforest_controls = mobforest.control(),
  new_test_data = as.data.frame(matrix(0, 0, 0)), processors = 1,
  model = linearModel, family = NULL, prob_cutoff = 0.5,
  seed = sample(1:1e+07, 1))

Arguments

formula

An object of class formula specifying the model. This should be of the form y ~ x_1 + ... + x_k, where x_1, x_2, ..., x_k are predictor variables and y is the outcome variable. This model is referred to as the node model.

partition_vars

A character vector specifying the partition variables.

data

An input dataset that is used for constructing the trees in the random forest.

mobforest_controls

An object of class "mobforest.control", returned by mobforest.control(), that contains the parameters controlling the construction of the random forest.

new_test_data

A data frame representing test data for validating the random forest model. This data is not used in the tree-building process.

processors

The number of processors/cores on your computer that should be used for parallel computation.

model

A model of class "StatModel" used to fit the observations in the current node. This parameter allows fitting a linear model or a generalized linear model with formula y ~ x_1 + ... + x_k. The value linearModel fits a linear model. The value glinearModel fits a Poisson or logistic regression model, depending on the "family" parameter (explained next): if "family" is binomial(), logistic regression is performed; if "family" is poisson(), Poisson regression is performed.

family

A description of the error distribution and link function to be used in the model. This parameter must be specified when a generalized linear model is used as the node model: binomial() for logistic regression and poisson() for Poisson regression. These are the only values allowed for this parameter.

prob_cutoff

When logistic regression is the node model, the predicted probabilities for OOB cases are converted into classes (yes/no, high/low, etc., as specified) based on this probability cutoff. If logistic regression is not the node model, prob_cutoff should be NULL. The default is 0.5 when the parameter is not specified (and logistic regression is the node model).
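The cutoff rule described for prob_cutoff can be illustrated in base R. This is a minimal sketch, not the package's internal code: the probabilities, the class labels "no"/"yes", and the choice of a strict "greater than" comparison are all assumptions made for illustration.

```r
# Hypothetical predicted probabilities for five out-of-bag cases
pred_prob <- c(0.12, 0.50, 0.73, 0.49, 0.91)
prob_cutoff <- 0.5

# Assumption: probabilities strictly above the cutoff map to the second class
pred_class <- factor(ifelse(pred_prob > prob_cutoff, "yes", "no"),
                     levels = c("no", "yes"))
pred_class
```

Raising the cutoff makes the "yes" class harder to predict, which may be useful when the two kinds of misclassification are not equally costly.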

seed

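The seed used for the cluster's random number streams, set through clusterSetRNGStream(). Because this function uses parallel processes, supply a fixed seed to replicate results. By default, a random seed is drawn.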
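The role of clusterSetRNGStream() can be seen in isolation with the parallel package that ships with R. The helper draw_on_cluster() below is hypothetical and only mimics the seeding step; it is a sketch of the general mechanism, not of how mobforest.analysis sets up its cluster.

```r
library(parallel)

# Hypothetical helper: draw random numbers on a 2-worker cluster
# after seeding the workers' RNG streams
draw_on_cluster <- function(seed) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = seed)
  # parLapply assigns tasks to workers statically, so the draws are repeatable
  parLapply(cl, 1:4, function(i) runif(1))
}

run_a <- draw_on_cluster(1111)
run_b <- draw_on_cluster(1111)

# Same seed, same cluster size: the two runs should match
identical(run_a, run_b)
```

Calling set.seed() in the main session alone would not have this effect, because each worker process keeps its own random number state.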

Value

An object of class mobforest.output.

Details

mobforest.analysis is the main function that takes all the input parameters (model, partition variables, and forest control parameters) and starts the model-based random forest analysis. mobforest.analysis calls the bootstrap function, which constructs decision trees and computes out-of-bag (OOB) predictions, OOB predictive accuracy, and the perturbation in OOB predictive accuracy through permutation. bootstrap constructs trees on multiple cores/processors simultaneously through parallel computation. Afterwards, the get.mf.object function wraps the analysis output into a mobforest.output object.
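The bootstrap, out-of-bag, and permutation ideas mentioned above can be sketched in base R for a single resample. This is an illustration under stated assumptions rather than the package's bootstrap function: the toy data are made up, a plain lm() stands in for a model-based tree, and the single predictor x plays the role of the variable being permuted.

```r
set.seed(1111)

# Hypothetical toy data
n <- 100
toy <- data.frame(x = rnorm(n))
toy$y <- 2 + 3 * toy$x + rnorm(n)

# Bootstrap sample drawn with replacement; rows never drawn are out-of-bag
in_bag <- sample(n, size = n, replace = TRUE)
oob <- setdiff(seq_len(n), in_bag)

# Stand-in for a tree: fit on the in-bag rows, predict the OOB rows
fit <- lm(y ~ x, data = toy[in_bag, ])
oob_pred <- predict(fit, newdata = toy[oob, ])

# Permutation step: shuffle x among the OOB rows and predict again
permuted <- toy[oob, ]
permuted$x <- sample(permuted$x)
perm_pred <- predict(fit, newdata = permuted)

# Compare OOB prediction error before and after permutation
mse_oob <- mean((toy$y[oob] - oob_pred)^2)
mse_perm <- mean((toy$y[oob] - perm_pred)^2)
c(oob = mse_oob, permuted = mse_perm)
```

A large increase in error after permutation suggests that the shuffled variable carries predictive information; a forest repeats this over many resamples and averages the results.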

Predictive accuracy estimates are computed using a pseudo-R2 metric, defined as the proportion of total variation in the outcome variable explained by a tree model on out-of-bag cases. R2 ranges from 0 to 1. An R2 of 0 indicates the worst tree model (in terms of predicting the outcome) and an R2 of 1 indicates a perfect tree model.
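One common way to compute such a proportion-of-variation measure is 1 - SSE/SST on the held-out cases. The function pseudo_r2() below is a hypothetical sketch along those lines. Clamping negative values at 0, so that the result stays within the range stated above, is an assumption and may not match what the package does internally.

```r
# Hypothetical helper: proportion of outcome variation explained by predictions
pseudo_r2 <- function(observed, predicted) {
  sse <- sum((observed - predicted)^2)       # unexplained variation
  sst <- sum((observed - mean(observed))^2)  # total variation
  # Assumption: values below 0 are clamped to keep the result in [0, 1]
  max(0, 1 - sse / sst)
}

observed <- c(1, 2, 3, 4)
pseudo_r2(observed, observed)                   # perfect predictions
pseudo_r2(observed, rep(mean(observed), 4))     # predicting the mean only
```

Perfect predictions give the upper end of the range, while predictions no better than the mean of the outcome give the lower end.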

References

Achim Zeileis, Torsten Hothorn, and Kurt Hornik (2008). Model-Based Recursive Partitioning. Journal of Computational and Graphical Statistics, 17(2), 492-514.

Torsten Hothorn, Kurt Hornik, and Achim Zeileis (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651-674.

Carolin Strobl, James Malley, and Gerhard Tutz (2009). An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests. Psychological Methods, 14(4), 323-348.

See Also

mobforest.control(), mobforest.output-class

Examples

# NOT RUN {
library(mobForest)
library(mlbench)
set.seed(1111)

# Random forest analysis of model-based recursive partitioning.
# Load the data and keep the first 90 rows of the variables used below.
data("BostonHousing", package = "mlbench")
BostonHousing <- BostonHousing[1:90, c("rad", "tax", "crim", "medv", "lstat")]

# Recursive partitioning based on the linear regression model medv ~ lstat,
# with 3 trees and 1 core/processor.
rfout <- mobforest.analysis(as.formula(medv ~ lstat), c("rad", "tax", "crim"),
    mobforest_controls = mobforest.control(ntree = 3, mtry = 2, replace = TRUE,
        alpha = 0.05, bonferroni = TRUE, minsplit = 25), data = BostonHousing,
    processors = 1, model = linearModel, seed = 1111)

# Print the resulting mobforest.output object
rfout
# }
