randomVarImpsRFplot: Plot random variable importances

Description

Plot the variable importances obtained from data sets with randomly permuted class labels together with the variable importances from the original data set.

Usage

randomVarImpsRFplot(randomImportances, forest,
                    whichImp = "impsUnscaled", nvars = NULL,
                    show.var.names = FALSE, vars.highlight = NULL,
                    main = NULL, screeRandom = TRUE,
                    lwdBlack = 1.5,
                    lwdRed = 2,
                    lwdLightblue = 1,
                    cexPoint = 1,
                    overlayTrue = FALSE,
                    xlab = NULL,
                    ylab = NULL, ...)

Arguments

randomImportances

A list with the same structure as the object returned by randomVarImpsRF.

forest

A random forest fitted to the original data. This forest must have been fitted with importance = TRUE.

whichImp

The importance measure to use. Exactly one of impsUnscaled, impsScaled, impsGini, which correspond, respectively, to the (unscaled) mean decrease in accuracy, the scaled mean decrease in accuracy, and the mean decrease in the Gini index. See randomForest, importance and the references for further explanations of these measures of variable importance.
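To illustrate what "exactly one of" means in practice, the sketch below selects a single measure by name with base R's match.arg(). It assumes, without this page confirming it, that the object returned by randomVarImpsRF is a list with components named impsUnscaled, impsScaled and impsGini, each a matrix with one row per variable and one column per permuted data set. The helper pickImportance and the mock object are hypothetical, not part of varSelRF.

```r
## Hypothetical helper, not part of varSelRF: pick one importance
## measure from a list shaped (by assumption) like randomVarImpsRF output.
pickImportance <- function(randomImportances,
                           whichImp = c("impsUnscaled", "impsScaled", "impsGini")) {
  ## match.arg() accepts exactly one of the allowed names and errors otherwise
  whichImp <- match.arg(whichImp)
  randomImportances[[whichImp]]
}

## Mock object: 5 variables x 3 permuted data sets for each measure
set.seed(1)
mock <- lapply(c(impsUnscaled = 1, impsScaled = 2, impsGini = 3),
               function(k) matrix(runif(15) * k, nrow = 5,
                                  dimnames = list(paste0("V", 1:5), NULL)))
imps <- pickImportance(mock, "impsGini")
dim(imps)  # 5 rows (variables), 3 columns (permuted data sets)
```

Using match.arg() means a misspelled or abbreviated name that does not uniquely prefix-match one of the three choices stops with an error instead of silently returning NULL.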

nvars

If NULL, the plot shows the complete range of variables. If an integer, only the nvars most important variables are plotted.
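One plausible reading of nvars is sketched below: rank the variables by their importance in the original data set and keep the nvars largest. This is a base-R illustration, not the package's code; topVars and the origImp vector are hypothetical.

```r
## Hypothetical helper: names of the nvars most important variables,
## ranked by the importance from the original data set.
topVars <- function(origImp, nvars = NULL) {
  ranked <- names(sort(origImp, decreasing = TRUE))
  ## NULL keeps every variable, mirroring the nvars = NULL default
  if (is.null(nvars)) ranked else head(ranked, nvars)
}

origImp <- c(V1 = 0.9, V2 = 0.7, V3 = 0.1, V4 = 0.3)
topVars(origImp, nvars = 2)  # "V1" "V2"
topVars(origImp)             # all four, largest first
```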

show.var.names

If TRUE, show the variable names in the plot. Unless you are plotting only a few variables, the names will probably not be of much use.

vars.highlight

A vector indicating the variables to highlight in the plot with a vertical blue segment. This must be a vector of variable names, not variable positions.
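Because vars.highlight takes names rather than positions, a plotting routine has to look those names up in the order in which the variables are drawn along the x-axis. A minimal sketch with base R's match(), assuming the x-axis follows a known ordering of variable names; highlightPositions is a hypothetical helper.

```r
## Hypothetical helper: x positions of highlighted variables, given the
## variable names in the order they are drawn along the x-axis.
highlightPositions <- function(plotOrder, vars.highlight) {
  pos <- match(vars.highlight, plotOrder)
  ## names absent from the plot (e.g. outside the top nvars) give NA; drop them
  pos[!is.na(pos)]
}

plotOrder <- c("V1", "V2", "V4", "V3")
highlightPositions(plotOrder, c("V4", "V9"))  # 3
```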

main

The title for the plot.

screeRandom

If TRUE, order all the variable importances (i.e., those from both the original data set and the data sets with permuted class labels) from largest to smallest before plotting. The plot will thus resemble a usual "scree plot".
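The scree-style ordering can be pictured as sorting each set of importances on its own, so that every line decreases from left to right. The sketch below is one interpretation of screeRandom = TRUE under that assumption, using a hypothetical orig vector and rand matrix (one column per permuted data set); it is not the package's implementation.

```r
set.seed(2)
## Hypothetical inputs: importances of 6 variables from the original data,
## and from 4 data sets with permuted class labels (one column each)
orig <- setNames(c(0.8, 0.05, 0.6, 0.02, 0.3, 0.01), paste0("V", 1:6))
rand <- matrix(runif(6 * 4, 0, 0.1), nrow = 6)

## Sort each set of importances from largest to smallest, independently
origSorted <- sort(orig, decreasing = TRUE)
randSorted <- apply(rand, 2, sort, decreasing = TRUE)

## Every column now decreases down the rows, as in a scree plot
all(apply(randSorted, 2, function(col) all(diff(col) <= 0)))  # TRUE
```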

lwdBlack

The width of the line to use for the importances from the original data set.

lwdRed

The width of the line to use for the average of the importances for the permuted data sets.

lwdLightblue

The width of the lines for the importances from the individual permuted data sets.

cexPoint

The cex value (symbol size) for the points showing the importances from the original data set.

overlayTrue

If TRUE, the variable importances from the original data set are plotted last, so you can see them even when they are buried among many green lines; this helps when the black line is otherwise hidden.

xlab

The title for the x-axis (see xlab).

ylab

The title for the y-axis (see ylab).

...

Additional arguments to plot.

Value

Only used for its side effect of producing a plot. In particular, you will see lines of three colors:

black

Connects the variable importances from the original data set.

green

Connects the variable importances from the data sets with permuted class labels; there will be as many lines as the value of numrandom used when randomVarImpsRF was called to obtain the random importances.

red

Connects the average of the importances from the permuted data sets.

Additionally, if you used a valid set of values for vars.highlight, these will be shown with a vertical blue segment.
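To make the color scheme concrete, here is a minimal base-R sketch that draws a plot of the same kind from simulated importances. It is not the package's implementation: the simulated orig and rand objects, the drawing order, and drawing the blue segment from the bottom of the plot up to the black point are assumptions made for illustration. The line widths and point size mirror the defaults shown in Usage.

```r
set.seed(3)
nv <- 10
## Simulated importances: original data (decreasing) and 5 permuted data sets
orig <- setNames(sort(rexp(nv, 5), decreasing = TRUE), paste0("V", 1:nv))
rand <- matrix(rexp(nv * 5, 40), nrow = nv)
randMean <- rowMeans(rand)  # average over the permuted data sets

## green: one line per permuted data set (lwdLightblue = 1)
matplot(seq_len(nv), rand, type = "l", lty = 1, col = "green", lwd = 1,
        ylim = range(orig, rand), xlab = "Variable", ylab = "Importance")
## red: average of the permuted importances (lwdRed = 2)
lines(seq_len(nv), randMean, col = "red", lwd = 2)
## black: original data, drawn last so it sits on top (lwdBlack = 1.5, cexPoint = 1)
lines(seq_len(nv), orig, type = "b", col = "black", lwd = 1.5, cex = 1)
## blue: vertical segment at a highlighted variable, looked up by name
h <- match("V3", names(orig))
segments(h, par("usr")[3], h, orig[h], col = "blue")
```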

References

Breiman, L. (2001) Random forests. Machine Learning, 45, 5--32.

Diaz-Uriarte, R., Alvarez de Andres, S. (2005) Variable selection from random forests: application to gene expression data. Tech. report. http://ligarto.org/rdiaz/Papers/rfVS/randomForestVarSel.html

Friedman, J., Meulman, J. (2005) Clustering objects on subsets of attributes (with discussion). J. Royal Statistical Society, Series B, 66, 815--850.

Examples


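# NOT RUN {
library(randomForest)
library(varSelRF)

## simulated data: 45 samples, 30 variables; the first two variables
## are shifted by +2 in the first 20 samples (class "A")
x <- matrix(rnorm(45 * 30), ncol = 30)
x[1:20, 1:2] <- x[1:20, 1:2] + 2
colnames(x) <- paste0("V", seq.int(ncol(x)))
cl <- factor(c(rep("A", 20), rep("B", 25)))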

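## forest on the original data; importance = TRUE is required
rf <- randomForest(x, cl, ntree = 200, importance = TRUE)
## importances from 20 data sets with permuted class labels
rf.rvi <- randomVarImpsRF(x, cl,
                          rf,
                          numrandom = 20,
                          usingCluster = FALSE)

randomVarImpsRFplot(rf.rvi, rf)
## las = 2 draws axis labels perpendicular to the axis
op <- par(las = 2)
randomVarImpsRFplot(rf.rvi, rf, show.var.names = TRUE)
par(op)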


# }
# NOT RUN {
## identical, but using a cluster
## make a small cluster, for the sake of illustration
library(parallel)
psockCL <- makeCluster(2, "PSOCK")
clusterSetRNGStream(psockCL, iseed = 789)
## each worker needs varSelRF loaded
clusterEvalQ(psockCL, library(varSelRF))

rf.rvi <- randomVarImpsRF(x, cl,
                          rf,
                          numrandom = 20,
                          usingCluster = TRUE,
                          TheCluster = psockCL)

randomVarImpsRFplot(rf.rvi, rf)
stopCluster(psockCL)
# }
