Description

DynForest is an R package for predicting an outcome from multivariate longitudinal predictors. The method follows the random forest principle, with the longitudinal predictors modeled within the forest itself. DynForest currently supports continuous, categorical and survival outcomes. The methodology is fully described for a survival outcome in the paper:

Devaux A., Helmer C., Genuer R. and Proust-Lima C. (2023). Random survival forests with multivariate longitudinal endogenous covariates. Statistical Methods in Medical Research. <doi:10.1177/09622802231206477>

A user guide for DynForest is also available in the paper:

Devaux A., Proust-Lima C. and Genuer R. (2023). Random Forests for time-fixed and time-dependent predictors: The DynForest R package. arXiv. <doi:10.48550/arXiv.2302.02670>

Installation

The released version of the DynForest package can be installed from CRAN with:

install.packages("DynForest")

The development version of DynForest can be installed from GitHub with:

# install.packages("devtools")
devtools::install_github("anthonydevaux/DynForest")

Quick example with survival outcome

Manage data

library(DynForest)
#> Registered S3 method overwritten by 'cmprsk':
#>   method      from
#>   plot.cuminc lcmm
data(pbc2)

# Log-transform the longitudinal predictors so they are closer to Gaussian
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)
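
A log transformation is only defined for strictly positive values, so it can be worth guarding the step above. The following is a minimal sketch on toy data; `log_transform()` is a hypothetical helper written for this illustration and is not part of DynForest.

```r
# Hypothetical helper (not part of DynForest): log-transform the given columns,
# stopping if any observed value is zero or negative. Missing values are kept.
log_transform <- function(data, vars) {
  for (v in vars) {
    x <- data[[v]]
    if (any(x <= 0, na.rm = TRUE)) {
      stop("Column '", v, "' contains non-positive values; log() is undefined.")
    }
    data[[v]] <- log(x)
  }
  data
}

# Toy long-format data standing in for pbc2 (values are made up)
toy <- data.frame(id       = c(1, 1, 2),
                  serBilir = c(1.2, 3.4, NA),
                  SGOT     = c(80, 120, 95))

toy_log <- log_transform(toy, c("serBilir", "SGOT"))
print(toy_log)
```

With the real data, the same call would take `pbc2` and the four marker names used above.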

Build DynForest objects

# Build longitudinal data
timeData <- pbc2[,c("id","time",
                    "serBilir","SGOT",
                    "albumin","alkaline")]

# Specify the mixed model (fixed and random effects) for each longitudinal predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))
# Build fixed data
fixedData <- unique(pbc2[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2[,c("id","years","event")]))
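
Building `fixedData` and the outcome table with `unique()` yields one row per subject only if the selected columns are constant within each subject. A minimal sketch of that sanity check, using made-up toy data rather than `pbc2`:

```r
# Toy long-format data: subject 2 has an inconsistent `age` on its last visit.
long <- data.frame(
  id   = c(1, 1, 2, 2, 2),
  time = c(0, 1, 0, 1, 2),
  age  = c(50, 50, 61, 61, 62),
  sex  = c("f", "f", "m", "m", "m")
)

fixed <- unique(long[, c("id", "age", "sex")])

# Any id that still appears more than once has a covariate that varies over time.
dup_ids <- unique(fixed$id[duplicated(fixed$id)])
print(dup_ids)
```

An empty `dup_ids` means the table has exactly one row per subject; otherwise the listed subjects need to be inspected before the table is passed on.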

Run the DynForest() function

# Grow a forest of 50 trees; cause = 2 is the event of interest among the competing causes
res_dyn <- DynForest(timeData = timeData, fixedData = fixedData,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 15, seed = 1234)
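
The call above uses `ncores = 15`, which presumably reflects the authors' machine. As a hedged sketch, the number of cores could instead be derived from the current machine with the base `parallel` package; leaving one core free is a common convention, not a DynForest requirement.

```r
# detectCores() can return NA on some platforms, so fall back to a single core.
available <- parallel::detectCores()
ncores <- if (is.na(available)) 1L else max(1L, available - 1L)
print(ncores)
```

The resulting value can then be passed as `ncores = ncores` in the call above.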

Get summary

summary(res_dyn)
#> DynForest executed for survival (competing risk) outcome 
#>  Splitting rule: Fine & Gray statistic test 
#>  Out-of-bag error type: Integrated Brier Score 
#>  Leaf statistic: Cumulative incidence function 
#> ---------------- 
#> Input 
#>  Number of subjects: 312 
#>  Longitudinal: 4 predictor(s) 
#>  Numeric: 1 predictor(s) 
#>  Factor: 2 predictor(s) 
#> ---------------- 
#> Tuning parameters 
#>  mtry: 3 
#>  nodesize: 5 
#>  minsplit: 5 
#>  ntree: 50 
#> ---------------- 
#> ---------------- 
#> DynForest summary 
#>  Average depth per tree: 5.94 
#>  Average number of leaves per tree: 20.44 
#>  Average number of subjects per leaf: 9.67 
#>  Average number of events of interest per leaf: 4.34 
#> ---------------- 
#> Computation time 
#>  Number of cores used: 15 
#>  Time difference of 1.04619 mins
#> ----------------
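
The averages reported in the summary can be cross-checked with a little arithmetic. The sketch below assumes that each tree is grown on a bootstrap sample drawn with replacement and that the per-leaf average counts each in-bag subject once; both are assumptions made for this illustration, not statements from the package documentation.

```r
n_subjects        <- 312    # "Number of subjects"
leaves_per_tree   <- 20.44  # "Average number of leaves per tree"
subjects_per_leaf <- 9.67   # "Average number of subjects per leaf"

# Average number of distinct subjects reaching the leaves of one tree
subjects_per_tree <- leaves_per_tree * subjects_per_leaf
in_bag_fraction   <- subjects_per_tree / n_subjects

# Expected share of distinct subjects in a bootstrap sample of size n
expected_fraction <- 1 - (1 - 1 / n_subjects)^n_subjects

print(round(c(subjects_per_tree = subjects_per_tree,
              in_bag_fraction   = in_bag_fraction,
              expected_fraction = expected_fraction), 3))
```

Under these assumptions, each tree sees roughly 198 subjects, about 63% of the cohort, which is in line with the usual bootstrap in-bag share.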

Package information

Version: 1.1.3
License: LGPL (>= 3)
Maintainer: Anthony Devaux
Last published: March 22nd, 2024
Monthly downloads: 533

Functions in DynForest (1.1.3)

getTree: Extract information about the splits of a user-specified tree
getTreeNodes: Extract node identifiers for a given tree
data_simu1: data_simu1 dataset
print.DynForest: Print function
plot.DynForest: Plot function in DynForest
predict.DynForest: Prediction using dynamic random forests
var_depth: Extract characteristics from the tree-building process
var_split_factor: Split function to build the two daughter nodes from factor predictors
pred.MMT: Predict the leaf by dropping down the subject in the tree
impurity_split: Impurity split
impurity: Compute the impurity of a given vector
predRE: Function to compute individual random effects using hlme output parameters
getParamMM: Function to update the list of parameters for each marker using those estimated from the previous node
summary.DynForest: Display the summary of DynForest
var_split_num: Split function to build the two daughter nodes from numeric predictors
rf_shape_para: Parallelized random survival forest using multivariate longitudinal endogenous covariates
var_split_long: Split function to build the two daughter nodes from longitudinal predictors
checking: Internal checking function
OOB.rfshape: Compute the Out-Of-Bag error on the random survival forest
data_simu2: data_simu2 dataset
compute_OOBerror: Compute the Out-Of-Bag error (OOB error)
OOB.tree: Compute Out-Of-Bag error on the tree
combine_times: Extend predictions for new times
DynForest: Random forest with multivariate longitudinal endogenous covariates
DynTree_surv: Grow random survival tree using multivariate longitudinal endogenous covariates
Fact.partitions: Factor partitions finder
DynTree: Grow random survival tree using multivariate longitudinal endogenous covariates
compute_VIMP: Compute the importance of variables (VIMP) statistic
pbc2: pbc2 dataset
compute_gVIMP: Compute the grouped importance of variables (gVIMP) statistic