DynForest (version 1.1.3)

DynForest: Random forest with multivariate longitudinal endogenous covariates

Description

Build a random forest using multivariate longitudinal endogenous covariates

Usage

DynForest(
  timeData = NULL,
  fixedData = NULL,
  idVar = NULL,
  timeVar = NULL,
  timeVarModel = NULL,
  Y = NULL,
  ntree = 200,
  mtry = NULL,
  nodesize = 1,
  minsplit = 2,
  cause = 1,
  nsplit_option = "quantile",
  ncores = NULL,
  seed = 1234,
  verbose = TRUE
)

Value

The DynForest function returns a list with the following elements:

data

A list containing the data used to grow the trees

rf

A table with one tree per column, providing several characteristics of the tree building

type

Outcome type

times

A numeric vector containing the time-to-event for all subjects

cause

Indicating the cause of interest

causes

A numeric vector containing the causes indicator

Inputs

A list of 3 elements: Longitudinal, Numeric and Factor. Each element contains the names of the predictors

Longitudinal.model

A list of longitudinal markers containing the formula used for modeling in the random forest

param

A list containing the hyperparameters

comput.time

Computation time

Arguments

timeData

A data.frame containing the id variable, the measurement-time variable and the time-dependent predictors.

fixedData

A data.frame containing the id variable and the time-fixed predictors. Categorical variables should be coded as factors.
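As a minimal sketch of the factor requirement, the snippet below converts every character column of a time-fixed data frame to a factor using base R only. The toy data frame and its values are made up for illustration; only the column names mirror the example on this page.

```r
# Toy time-fixed data (hypothetical values); one row per subject
fixedData <- data.frame(
  id   = 1:4,
  age  = c(58.2, 61.7, 45.0, 70.3),
  drug = c("placebo", "D-penicil", "placebo", "D-penicil"),
  sex  = c("female", "female", "male", "female"),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor; numeric columns are untouched
is_chr <- vapply(fixedData, is.character, logical(1))
fixedData[is_chr] <- lapply(fixedData[is_chr], factor)

str(fixedData)
```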

idVar

A character string giving the name of the variable that identifies the subjects

timeVar

A character string giving the name of the time variable

timeVarModel

A list with one element per time-dependent predictor, each being a list of formulas for the fixed and random parts of the mixed model
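When several markers share the same trajectory shape, the list can be built programmatically. The sketch below assumes a linear trajectory (fixed part `marker ~ time`, random part `~ time`) for each marker, following the structure used in the example on this page; the choice of markers is illustrative.

```r
# Marker names as they appear in timeData (taken from the example on this page)
markers <- c("serBilir", "albumin")

# One entry per marker: fixed part 'marker ~ time', random part '~ time'
timeVarModel <- lapply(markers, function(m) {
  list(fixed  = reformulate("time", response = m),
       random = ~ time)
})
names(timeVarModel) <- markers

timeVarModel$serBilir$fixed
```

Markers that need a different shape (for instance a quadratic term, as for SGOT in the example) can still be written by hand and added to the list afterwards.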

Y

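A list describing the outcome, which should contain: type, which defines the nature of the outcome and can be "surv", "numeric" or "factor"; and Y, the outcome data (in the example below, a data.frame with the subject id, the time-to-event and the event indicator).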
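The following sketch builds such a list for a survival outcome from toy long-format data, mirroring the structure used in the example on this page. The data values are hypothetical, and the sanity checks are a suggestion of this sketch, not something the package is known to require.

```r
# Toy long-format data (hypothetical): outcome columns repeat on each visit
long <- data.frame(
  id    = c(1, 1, 2, 2, 2, 3),
  time  = c(0, 1.1, 0, 0.9, 2.0, 0),
  years = c(4.2, 4.2, 7.5, 7.5, 7.5, 1.3),
  event = c(1, 1, 0, 0, 0, 2)
)

# Keep one outcome row per subject, as in the example on this page
Y <- list(type = "surv",
          Y = unique(long[, c("id", "years", "event")]))

# Sanity checks before calling DynForest()
stopifnot(Y$type %in% c("surv", "numeric", "factor"),
          !anyDuplicated(Y$Y$id))
```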

ntree

Number of trees to grow. Default value set to 200.

mtry

Number of candidate variables randomly drawn at each node of the trees. This parameter should be tuned by minimizing the OOB error. Default is defined as the square root of the number of predictors.
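A rough sketch of the square-root rule and of a grid search over mtry follows. The rounding direction is an assumption, and `tune_mtry`, `oob_fun` and `toy_oob` are hypothetical names introduced here; in practice `oob_fun` would fit a forest with DynForest() for the given mtry and return its OOB error (see compute_OOBerror), whereas the stand-in below only makes the sketch runnable.

```r
# Count predictors as in the example on this page:
# 4 longitudinal markers + 3 time-fixed predictors (age, drug, sex)
n_pred <- 4 + 3

# Square-root rule; the rounding direction is an assumption of this sketch
mtry_default <- floor(sqrt(n_pred))

# Generic grid search: 'oob_fun' is a user-supplied function that fits a
# forest for a given mtry and returns its OOB error as a single number
tune_mtry <- function(candidates, oob_fun) {
  errs <- vapply(candidates, oob_fun, numeric(1))
  list(best = candidates[which.min(errs)], errors = errs)
}

# Stand-in error function so the sketch runs without fitting any forest
toy_oob <- function(m) (m - 3)^2 / 10 + 0.2
tuned <- tune_mtry(seq_len(n_pred), toy_oob)
tuned$best
```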

nodesize

Minimal number of subjects required in both child nodes to split. Cannot be smaller than 1.

minsplit

(Only with survival outcome) Minimal number of events required to split the node. Cannot be smaller than 2.

cause

(Only with competing events) Number indicating the event of interest. Default is 1.

nsplit_option

A character string indicating how the candidate values are chosen to build the two groups for the splitting rule (only for continuous predictors). Values are chosen using deciles (nsplit_option="quantile") or randomly (nsplit_option="sample"). Default value is "quantile".
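To illustrate the difference between the two options, the sketch below computes candidate thresholds for a toy continuous predictor in base R. It only illustrates the idea and is not the package's internal code; in particular, drawing nine random values for the "sample" case is an assumption.

```r
set.seed(1234)
x <- rnorm(200)  # toy continuous predictor observed at a node

# "quantile": candidate thresholds at the nine deciles
cand_quantile <- quantile(x, probs = seq(0.1, 0.9, by = 0.1), names = FALSE)

# "sample": candidate thresholds drawn at random from the observed values
# (drawing the same number of values, 9, is an assumption of this sketch)
cand_sample <- sample(unique(x), size = 9)

# Each threshold defines the two groups compared by the splitting rule;
# here the fifth decile (the median) is used as the threshold
left <- x <= cand_quantile[5]
table(left)
```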

ncores

Number of cores used to grow trees in parallel. Default value is the number of cores of the computer minus 1.
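The default can be reproduced by hand, for example to cap the value on a shared machine. This sketch uses parallel::detectCores(); how the package itself detects the number of cores is an assumption here.

```r
# All cores but one; detectCores() may return NA on some platforms
n_avail <- parallel::detectCores()
ncores  <- if (is.na(n_avail)) 1L else max(1L, n_avail - 1L)
ncores
```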

seed

Seed used to replicate results. Default is 1234.

verbose

A logical controlling whether the function progress is displayed. Default is TRUE.

Author

Anthony Devaux (anthony.devauxbarault@gmail.com)

Details

The function currently supports survival (competing or single event), continuous or categorical outcomes.

FUTURE IMPLEMENTATIONS:

  • Continuous longitudinal outcome

  • Functional data analysis

References

  • Devaux A., Helmer C., Genuer R., Proust-Lima C. (2023). Random survival forests with multivariate longitudinal endogenous covariates. Statistical Methods in Medical Research. doi:10.1177/09622802231206477

  • Devaux A., Proust-Lima C., Genuer R. (2023). Random Forests for time-fixed and time-dependent predictors: The DynForest R package. arXiv doi:10.48550/arXiv.2302.02670

See Also

summary.DynForest, compute_OOBerror, compute_VIMP, compute_gVIMP, predict.DynForest, plot.DynForest

Examples

# \donttest{
# Load the package, which provides the pbc2 dataset
library(DynForest)
data(pbc2)

# Log-transform the longitudinal predictors to bring them closer to a Gaussian distribution
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)

# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)

pbc2_train <- pbc2[id_row,]

timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]

# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))

# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))

# Run DynForest function
res_dyn <- DynForest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)
# }
