bigrf (version 0.1-12)

generateSyntheticClass: Generate Synthetic Second Class for Unsupervised Learning

Description

To use Random Forests for unsupervised learning, the original training set x is treated as a single class. This function creates a synthetic second class by sampling each variable at random from its univariate distribution in the original data, so that the combined data can be passed to a two-class classifier. This is useful, for example, for clustering.
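
The idea can be illustrated with a minimal base-R sketch. This is not the bigrf implementation: the helper name make_synthetic_class is hypothetical, it handles only data.frame input, and it assumes each column is resampled with replacement, independently of the other columns.

```r
# Hypothetical helper, NOT part of bigrf: builds a two-class training set
# from a data.frame by resampling every column independently.
make_synthetic_class <- function(x) {
  stopifnot(is.data.frame(x))
  n <- nrow(x)

  # Start from a copy so column names and types are preserved, then
  # replace each column with an independent resample of its own values
  # (assumption: sampling with replacement).
  synth <- x
  synth[] <- lapply(x, function(col) col[sample.int(n, n, replace = TRUE)])

  # Stack original and synthetic rows; label them as classes 1 and 2.
  list(x = rbind(x, synth),
       y = factor(rep(c(1L, 2L), each = n)))
}

set.seed(1)
two_class <- make_synthetic_class(iris[, 1:4])
table(two_class$y)
```

Because each column is resampled on its own, every synthetic value also occurs in the corresponding original column, while the pairing of values across columns within a row is randomized.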

Usage

generateSyntheticClass(x, ...)

Arguments

x
A big.matrix, matrix or data.frame containing the predictor variables of the original training set.
...
If x is a big.matrix, these arguments will be passed on to big.matrix to control how the big.matrix for the two-class training set is created.

Value

A list containing the following components:
x
The two-class training set, comprising the original training set and the synthesized second class. It will be an object of the same type as the argument x.
y
A factor vector that labels the two classes in x.

References

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

Breiman, L. & Cutler, A. (n.d.). Random Forests. Retrieved from http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm.

Examples

# Perform unsupervised learning on the Cars93 data set.

# Load the package and the data.
library(bigrf)
data(Cars93, package="MASS")

# Create synthetic second class for unsupervised learning.
newdata <- generateSyntheticClass(Cars93)

# Select the variables (columns 4 to 22) with which to train the model.
vars <- 4:22

# Run model, grow 30 trees.
forest <- bigrfc(newdata$x, newdata$y, ntree=30L, varselect=vars,
                 cachepath=NULL)
