fuzzyforest (version 1.0.0)

wff: Fits WGCNA based fuzzy forest algorithm.

Description

Fits the fuzzy forest algorithm using WGCNA and returns a fuzzy_forest object.

Usage

wff(X, y, Z = NULL, WGCNA_params = WGCNA_control(power = 6),
  screen_params = screen_control(min_ntree = 5000),
  select_params = select_control(min_ntree = 5000), final_ntree = 500,
  num_processors = 1, nodesize, test_features = NULL, test_y = NULL)

Arguments

X
A data.frame. Each column corresponds to a feature vector. WGCNA will be used to cluster the features in X, so all features in X should be numeric. Non-numeric features may be supplied via Z.
y
Response vector. For classification, y should be a factor. For regression, y should be numeric.
Z
Additional features that are not to be screened out at the screening step. WGCNA is not carried out on features in Z.
WGCNA_params
Parameters for WGCNA. See blockwiseModules and WGCNA_control for details. WGCNA_params is an object of type
screen_params
Parameters for screening step of fuzzy forests. See screen_control for details. screen_params is an object of type screen_control.
select_params
Parameters for selection step of fuzzy forests. See select_control for details. select_params is an object of type select_control.
final_ntree
Number of trees grown in the final random forest. This random forest contains all selected features.
num_processors
Number of processors used to fit random forests.
nodesize
Minimum terminal node size: 1 for classification, 5 for regression. If the sample size is very large, the trees will be grown extremely deep. This may cause memory issues and may significantly increase the time it takes the algorithm to run.
test_features
A data.frame containing features from a test set. The data.frame should contain the features in both X and Z.
test_y
The responses for the test set.
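
The split between X, Z, and the test-set arguments can be prepared with base R before calling wff. The following is a minimal sketch rather than the package's own workflow: the data frame `dat`, its column names, and the 16/4 train/test split are all hypothetical and only illustrate the argument descriptions above.

```r
set.seed(1)

# Hypothetical data: two numeric features, one categorical feature, a class label
dat <- data.frame(
  gene_a = rnorm(20),
  gene_b = rnorm(20),
  site   = factor(rep(c("north", "south"), 10)),
  label  = rep(c("case", "control"), each = 10)
)

# For classification, y should be a factor
y_all <- factor(dat$label)
features <- dat[, setdiff(names(dat), "label")]

# Hold out 4 of the 20 rows as a test set
test_idx <- sample(nrow(features), size = 4)
train <- features[-test_idx, , drop = FALSE]

# Numeric columns go to X (clustered by WGCNA); the rest go to Z
is_num <- vapply(train, is.numeric, logical(1))
X <- train[, is_num, drop = FALSE]
Z <- train[, !is_num, drop = FALSE]
y <- y_all[-test_idx]

# Test features must contain the columns of both X and Z
test_features <- features[test_idx, c(names(X), names(Z)), drop = FALSE]
test_y <- y_all[test_idx]

# These objects could then be passed on, for example:
# wff(X, y, Z = Z, test_features = test_features, test_y = test_y)
```

Selecting columns with `drop = FALSE` keeps X and Z as data.frames even when only one column qualifies.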

Value

  • An object of type fuzzy_forest. This object is a list containing the main output of the fuzzy forest fit. In particular, it contains a data.frame listing the selected features, as well as the random forest fit using those selected features.

References

Leo Breiman (2001). Random Forests. Machine Learning, 45(1), 5-32.

Daniel Conn, Tuck Ngun, Christina M. Ramirez (2015). Fuzzy Forests: a New WGCNA Based Random Forest Algorithm for Correlated, High-Dimensional Data, Journal of Statistical Software, Manuscript in progress.

Bin Zhang and Steve Horvath (2005) "A General Framework for Weighted Gene Co-Expression Network Analysis", Statistical Applications in Genetics and Molecular Biology: Vol. 4: No. 1, Article 17

Examples

library(WGCNA)
library(randomForest)
library(fuzzyforest)
data(ctg)
y <- ctg$NSP
X <- ctg[, 2:22]
WGCNA_params <- WGCNA_control(power = 6, minModuleSize = 1, nThreads = 1)
mtry_factor <- 1
min_ntree <- 500
drop_fraction <- .5
ntree_factor <- 1
screen_params <- screen_control(drop_fraction = drop_fraction,
                                keep_fraction = .25, min_ntree = min_ntree,
                                ntree_factor = ntree_factor,
                                mtry_factor = mtry_factor)
select_params <- select_control(drop_fraction = drop_fraction,
                                number_selected = 5,
                                min_ntree = min_ntree,
                                ntree_factor = ntree_factor,
                                mtry_factor = mtry_factor)
wff_fit <- wff(X, y, WGCNA_params = WGCNA_params,
                screen_params = screen_params,
                select_params = select_params,
                final_ntree = 500)

# extract variable importance rankings
vims <- wff_fit$feature_list

# plot results
modplot(wff_fit)
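
Once a fit is available, the table of selected features can be inspected like any other data.frame. The sketch below is illustrative only: `top_features` is a hypothetical helper, and `demo_list` is a stand-in table whose column names are assumptions, used so that the code runs without fitting a model.

```r
# Hypothetical helper: return the first n rows of a feature table
top_features <- function(feature_list, n = 3) {
  head(feature_list, n)
}

# Stand-in table; the column names here are assumptions for illustration
demo_list <- data.frame(
  feature_name = c("f1", "f2", "f3", "f4"),
  variable_importance = c(0.4, 0.3, 0.2, 0.1),
  stringsAsFactors = FALSE
)
top_features(demo_list, 2)

# With a real fit from the example above:
# top_features(wff_fit$feature_list)
# names(wff_fit)  # lists the components stored in the fitted object
```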
