e2tree (version 0.2.0)

createDisMatrix: Dissimilarity matrix

Description

The function createDisMatrix computes a dissimilarity matrix among the observations in data, based on a trained tree ensemble.

Usage

createDisMatrix(
  ensemble,
  data,
  label,
  parallel = list(active = FALSE, no_cores = 1),
  verbose = FALSE
)

Value

A dissimilarity matrix measuring, for each pair of observations, their discordance with respect to the given random forest model.
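To give a feel for what a pairwise dissimilarity among observations can look like, here is a minimal sketch on made-up data. It assumes a simplified, unweighted definition (one minus the share of trees in which two observations fall in the same terminal node). This is an illustrative stand-in, not the formula createDisMatrix uses, and the nodes matrix is hypothetical.

```r
# Hypothetical terminal-node IDs: rows are observations, columns are trees.
nodes <- matrix(c(1, 1, 2,
                  1, 1, 2,
                  2, 1, 3,
                  2, 3, 3),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("obs", 1:4), paste0("tree", 1:3)))

# Count, for every pair of observations, the trees in which both
# land in the same terminal node, then convert to a proportion.
co_occurrence <- matrix(0, nrow(nodes), nrow(nodes),
                        dimnames = list(rownames(nodes), rownames(nodes)))
for (b in seq_len(ncol(nodes))) {
  co_occurrence <- co_occurrence + outer(nodes[, b], nodes[, b], "==")
}
co_occurrence <- co_occurrence / ncol(nodes)

# Simplified (unweighted) dissimilarity: 0 = always together, 1 = never.
toy_D <- 1 - co_occurrence
print(round(toy_D, 2))
```

The result is symmetric with a zero diagonal, which is the shape one would expect from any dissimilarity matrix among observations.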

Arguments

ensemble

An ensemble tree object: a trained model of class randomForest or ranger (see Details).

data

A data frame containing the variables in the model. It must be the data frame used to train the ensemble.

label

A character string giving the name of the response variable.

parallel

A list with two elements: active (logical) and no_cores (integer). If active = TRUE, the function performs parallel computation using the number of cores specified in no_cores. If no_cores is NULL or equal to 0, it defaults to using all available cores minus one. If active = FALSE, the function runs on a single core. Default: list(active = FALSE, no_cores = 1).
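The rules above can be sketched as a small helper. This is only an illustration of the documented behaviour: resolve_cores is a hypothetical name, not a function exported by e2tree, and the floor of one core when detection fails or only one core is available is an assumption.

```r
# Hypothetical helper mirroring the documented rules for `parallel`.
resolve_cores <- function(opts = list(active = FALSE, no_cores = 1)) {
  # active = FALSE: always run on a single core
  if (!isTRUE(opts$active)) return(1L)

  n <- opts$no_cores
  if (is.null(n) || n == 0) {
    # NULL or 0: all available cores minus one
    detected <- parallel::detectCores()
    if (is.na(detected)) detected <- 1L  # detectCores() may return NA
    # Assumption: never go below one core
    return(max(1L, as.integer(detected) - 1L))
  }
  as.integer(n)
}

resolve_cores(list(active = FALSE, no_cores = 4))   # single core
resolve_cores(list(active = TRUE,  no_cores = 2))   # two cores
resolve_cores(list(active = TRUE,  no_cores = NULL))  # machine dependent
```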

verbose

Logical. If TRUE, the function prints progress messages and other information during execution. If FALSE (the default), messages are suppressed.

Details

The ensemble must be a trained object of one of the following classes, fitted for a classification or regression task:

  • randomForest

  • ranger
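Before calling createDisMatrix, it can be useful to confirm that an object belongs to one of the classes listed above. The check below is a minimal sketch; is_supported_ensemble is a hypothetical helper, not part of e2tree, and the mock objects only imitate the class attributes of real fits.

```r
# Hypothetical pre-flight check; not part of e2tree.
is_supported_ensemble <- function(x) {
  inherits(x, c("randomForest", "ranger"))
}

# Mock objects carrying only the class attribute of a real fit.
mock_rf     <- structure(list(), class = c("randomForest.formula", "randomForest"))
mock_ranger <- structure(list(), class = "ranger")
not_a_forest <- lm(mpg ~ wt, data = mtcars)

is_supported_ensemble(mock_rf)       # TRUE
is_supported_ensemble(mock_ranger)   # TRUE
is_supported_ensemble(not_a_forest)  # FALSE
```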

Examples

# \donttest{
## Classification
data("iris")

# Create training and validation sets (seed fixed for reproducibility):
set.seed(123)
smp_size <- floor(0.75 * nrow(iris))
train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
training <- iris[train_ind, ]
validation <- iris[-train_ind, ]
response_training <- training[,5]
response_validation <- validation[,5]

# Perform training:
## "randomForest" package
ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                       importance = TRUE, proximity = TRUE)

## "ranger" package (overwrites the randomForest fit above)
## Train on the same data frame that is later passed to createDisMatrix
ensemble <- ranger::ranger(Species ~ ., data = training,
                           num.trees = 1000, importance = "impurity")

D <- createDisMatrix(ensemble, data = training,
                     label = "Species",
                     parallel = list(active = FALSE, no_cores = 1))


## Regression
data("mtcars")

# Create training and validation sets (seed fixed for reproducibility):
set.seed(123)
smp_size <- floor(0.75 * nrow(mtcars))
train_ind <- sample(seq_len(nrow(mtcars)), size = smp_size)
training <- mtcars[train_ind, ]
validation <- mtcars[-train_ind, ]
response_training <- training[,1]
response_validation <- validation[,1]

# Perform training:
## "randomForest" package
ensemble <- randomForest::randomForest(mpg ~ ., data = training, ntree = 1000,
                                       importance = TRUE, proximity = TRUE)

## "ranger" package (overwrites the randomForest fit above)
ensemble <- ranger::ranger(formula = mpg ~ ., data = training,
                           num.trees = 1000, importance = "permutation")

D <- createDisMatrix(ensemble, data = training,
                     label = "mpg",
                     parallel = list(active = FALSE, no_cores = 1))

# }