wsrf (version 1.3.0)

wsrf: Build a Forest of Weighted Subspace Decision Trees

Description

Build weighted subspace decision trees to construct a forest.

Usage

wsrf(formula, data, nvars, mtry, ntrees=500, weights=TRUE, 
                 parallel=TRUE, na.action=na.fail)

Arguments

formula
a formula, with a response but no interaction terms.
data
a data frame in which to interpret the variables named in the formula.
ntrees
number of trees to build on each server; the default is 500.
nvars, mtry
number of variables to choose, with Breiman's default for random forests being the integer less than or equal to $\log_2(\mathrm{ninputs}) + 1$. For compatibility with other R packages like randomForest, both nvars and mtry are supported, but only one of them should be specified.
weights
logical. TRUE for weighted subspace selection, which is the default; FALSE for random selection.
na.action
a function indicating how NA values in data are handled; the default, na.fail, stops with an error if any are present.
parallel
whether to run in parallel on multiple cores (TRUE), on multiple nodes, or sequentially (FALSE).
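
The default subspace size described under nvars, mtry can be checked with base R alone. The sketch below assumes the rule is exactly the integer part of log2(ninputs) + 1; default_nvars is a hypothetical helper written for illustration, not a function exported by wsrf.

```r
# Breiman-style default subspace size: the largest integer that is
# less than or equal to log2(ninputs) + 1.
# NOTE: default_nvars is a hypothetical helper, not part of the wsrf API.
default_nvars <- function(ninputs) {
  stopifnot(is.numeric(ninputs), length(ninputs) == 1, ninputs >= 1)
  as.integer(floor(log2(ninputs) + 1))
}

default_nvars(20)    # subspace size for 20 input variables
default_nvars(1000)  # subspace size for 1000 input variables
```

The size grows only logarithmically with the number of inputs, which is why each tree sees a small subspace even for very high-dimensional data.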

Value

  • An object of class wsrf.

concept

  • weighted subspace decision trees
  • weighted subspace random forest

Details

See Xu, Huang, Williams, Wang, and Ye (2012) for details.
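
The difference between weighted subspace selection (weights=TRUE) and random selection (weights=FALSE) can be illustrated with base R sampling. This is only a sketch of the general idea: the variable names and the scores are made up, and the actual weighting scheme used by wsrf is the one defined in the cited paper, not the one shown here.

```r
set.seed(1)

# Ten hypothetical input variables with made-up relevance scores.
# ASSUMPTION: these scores are illustrative only; wsrf derives its own weights.
vars   <- paste0("x", 1:10)
scores <- c(5, 3, 1, 1, 1, 0.5, 0.5, 0.2, 0.2, 0.1)
nvars  <- 4

# Random selection: every variable is equally likely to enter the subspace.
uniform_subspace <- sample(vars, nvars)

# Weighted selection: variables with higher scores are more likely to be drawn.
weighted_subspace <- sample(vars, nvars, prob = scores / sum(scores))

# Over many draws, count how often each variable enters a weighted subspace.
picks <- replicate(2000, sample(vars, nvars, prob = scores / sum(scores)))
freq  <- table(factor(picks, levels = vars)) / 2000
round(freq, 2)
```

Under weighted sampling the high-scoring variables appear in most subspaces, while low-scoring ones are still drawn occasionally.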

References

Xu B, Huang JZ, Williams G, Wang Q, Ye Y (2012). "Classifying very high-dimensional data with random forests built from small subspaces." International Journal of Data Warehousing and Mining (IJDWM), 8(2), 44-63.

Examples

  library(wsrf)
  library(rattle)
  library(randomForest)
  
  # prepare parameters
  ds <- get("weather")
  dim(ds)
  names(ds)
  target <- "RainTomorrow"
  id     <- c("Date", "Location")
  risk   <- "RISK_MM"
  ignore <- c(id, if (exists("risk")) risk) 
  vars   <- setdiff(names(ds), ignore)
  if (sum(is.na(ds[vars]))) ds[vars] <- na.roughfix(ds[vars])
  ds[target] <- as.factor(ds[[target]])
  (tt <- table(ds[target]))
  form <- as.formula(paste(target, "~ ."))
  set.seed(42)
  train <- sample(nrow(ds), 0.7*nrow(ds))
  test <- setdiff(seq_len(nrow(ds)), train)
  
  # build model
  model.wsrf.1 <- wsrf(form, data=ds[train, vars])
  
  # view model
  print(model.wsrf.1, tree=1)
  summary(model.wsrf.1)
  summary(model.wsrf.1, tree=c(1,500))
  
  # evaluate
  strength(model.wsrf.1)
  correlation(model.wsrf.1)
  cl <- predict(model.wsrf.1, newdata=ds[test, vars], type="response")
  actual <- ds[test, target]
  (accuracy.wsrf <- sum(cl==actual)/length(actual))
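
The accuracy computed at the end of the example is a single number; a confusion matrix shows where the errors fall. The sketch below uses small made-up vectors in place of actual and cl so that it runs without the weather data or a fitted model; only base R functions are used.

```r
# Toy stand-ins for `actual` and `cl` from the example above.
# ASSUMPTION: these values are invented for illustration, not model output.
lv     <- c("No", "Yes")
actual <- factor(c("No", "No", "Yes", "No", "Yes", "Yes", "No", "No"), levels = lv)
cl     <- factor(c("No", "Yes", "Yes", "No", "No", "Yes", "No", "No"), levels = lv)

# Rows are the true classes, columns are the predicted classes.
confusion <- table(Actual = actual, Predicted = cl)
confusion

# Overall accuracy: correct predictions sit on the diagonal.
accuracy <- sum(diag(confusion)) / sum(confusion)

# Per-class error rate: share of each true class that was misclassified.
class_error <- 1 - diag(confusion) / rowSums(confusion)
```

With the real example, the same lines apply unchanged once actual and cl hold the test labels and the predictions.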