Trains a model on the training datasets and predicts the risk score for each
training and validation dataset independently. The function also predicts the
risk score for the combined training cohort and the combined validation cohort.
Risk scores are estimated by multivariate models fit by fit.survivalmodel.
The function also predicts risk scores for each of the top.n.features
independently.
create.classifier.multivariate(
data.directory = ".",
output.directory = ".",
feature.selection.datasets = NULL,
feature.selection.p.threshold = 0.05,
training.datasets = NULL,
validation.datasets = NULL,
top.n.features = 25,
models = c("1", "2", "3"),
learning.algorithms = c("backward", "forward"),
alpha.glm = c(1),
k.fold.glm = 10,
seed.value = 51214,
cores.glm = 1,
rf.ntree = 1000,
rf.mtry = NULL,
rf.nodesize = 15,
rf.samptype = "swor",
rf.sampsize = function(x) { x * 0.66 },
...
)
data.directory: Path to the directory containing the datasets specified by feature.selection.datasets, training.datasets and validation.datasets
output.directory: Path to the output folder where intermediate and results files will be saved
feature.selection.datasets: A vector containing names of datasets used for feature selection in function derive.network.features()
feature.selection.p.threshold: One of the P values that were used for feature selection in function derive.network.features(). For performance reasons, this function does not support a vector of P values as used in derive.network.features()
training.datasets: A vector containing names of training datasets
validation.datasets: A vector containing names of validation datasets
top.n.features: A numeric value specifying how many top ranked features will be used for univariate survival modelling
models: A character vector specifying which of the models ('1' = N+E, '2' = N, '3' = E) to run
learning.algorithms: A character vector specifying which learning algorithms to use for model fitting and feature selection. Defaults to c('backward', 'forward'). Available options are: c('backward', 'forward', 'glm', 'randomforest')
alpha.glm: A numeric vector specifying the elastic-net mixing parameter alpha, with alpha in the range [0,1]. 1 for LASSO (default) and 0 for ridge. For multiple values of alpha, the optimal value is selected through cross validation on the training set
k.fold.glm: A numeric value specifying the number of folds for k-fold cross validation if glm was chosen in learning.algorithms
seed.value: A numeric value specifying the seed for glm k-fold cross validation (if glm was chosen in learning.algorithms) or for random forest
cores.glm: An integer value specifying the number of cores to be used for glm if it was chosen in learning.algorithms
rf.ntree: An integer value specifying the number of trees in the random forest. Defaults to 1000. This should be tuned by starting with a large forest such as 1000 in the initial run, assessing the results in output/OOB_error__TRAINING_* to see where the OOB error rate stabilises, and then rerunning with the stabilised rf.ntree value
rf.mtry: An integer value specifying the number of variables randomly selected for splitting a node. Defaults to sqrt(features), which is the same as in the underlying random survival forest function randomForestSRC::rfsrc
rf.nodesize: An integer value specifying the number of unique cases in a terminal node. Defaults to 15, which is the same as in the underlying random survival forest function randomForestSRC::rfsrc
rf.samptype: A character string specifying the sampling type. Defaults to sampling without replacement 'swor'. Available options are: c('swor', 'swr')
rf.sampsize: A function specifying the sampling size when rf.samptype is set to sampling without replacement ('swor'). Defaults to 66%: function(x){x * .66}
...: Other parameters to be passed on to the random forest call to the underlying random survival forest function randomForestSRC::rfsrc
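Two of the numeric conventions described above can be illustrated in base R. The sketch below is not taken from the package source. It assumes the glmnet-style definition of the elastic-net penalty for alpha.glm (the helper `elastic.net.penalty` and the coefficient vector are hypothetical) and applies the documented default rf.sampsize function to a hypothetical training cohort size.

```r
# Elastic-net penalty, assuming the glmnet-style parameterisation:
#   lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
# alpha = 1 keeps only the L1 (LASSO) term; alpha = 0 keeps only the L2 (ridge) term.
elastic.net.penalty <- function(beta, alpha, lambda = 1) {
    stopifnot(alpha >= 0, alpha <= 1)
    lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

# Hypothetical coefficient vector, for illustration only
beta <- c(0.5, -1.0, 0.0, 2.0)

lasso.penalty <- elastic.net.penalty(beta, alpha = 1)
ridge.penalty <- elastic.net.penalty(beta, alpha = 0)
mixed.penalty <- elastic.net.penalty(beta, alpha = 0.5)

# Documented default for rf.sampsize: 66% of the cases, without replacement
rf.sampsize <- function(x) { x * 0.66 }

# Hypothetical number of samples in a training dataset
n.training <- 200
samples.per.tree <- rf.sampsize(n.training)
```

Passing several values, for example alpha.glm = c(0, 0.5, 1), asks the function to choose among these penalties by cross validation on the training set, as described for alpha.glm.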
The output files are stored under output.directory/output/
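A minimal sketch of locating the results after a run, using only base R. The value of `output.directory` here is a placeholder (a temporary directory); the OOB_error__TRAINING_ prefix is the one mentioned for the random forest results, and the rest of each file name is left unspecified.

```r
# Placeholder location; replace with the value passed as output.directory
output.directory <- tempdir()
results.directory <- file.path(output.directory, "output")

# All result files (an empty character vector if nothing has been written yet)
result.files <- list.files(results.directory, full.names = TRUE)

# Out-of-bag error files from random forest runs, matched by the documented prefix
oob.files <- list.files(
    results.directory,
    pattern = "^OOB_error__TRAINING_",
    full.names = TRUE
)
```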
# NOT RUN {
# see package's main documentation
# }
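As a rough illustration of the signature above, the following sketch assembles the arguments and only makes the call when explicitly enabled. It assumes the function is provided by the SIMMS package, that feature selection with derive.network.features() has already been run on the same data, and that the dataset names "CohortA" and "CohortB" are hypothetical placeholders for datasets present in data.directory. See the package's main documentation for a complete worked example.

```r
# Arguments follow the documented signature; dataset names are hypothetical.
classifier.args <- list(
    data.directory = ".",
    output.directory = ".",
    feature.selection.datasets = c("CohortA"),
    feature.selection.p.threshold = 0.05,
    training.datasets = c("CohortA"),
    validation.datasets = c("CohortB"),
    top.n.features = 25,
    models = c("1", "2", "3"),
    learning.algorithms = c("backward", "forward")
)

# Disabled by default: set to TRUE only once the datasets are in place.
run.example <- FALSE

if (run.example && requireNamespace("SIMMS", quietly = TRUE)) {
    do.call(SIMMS::create.classifier.multivariate, classifier.args)
}
```

Arguments left out of the list (alpha.glm, k.fold.glm, the rf.* settings and so on) fall back to the defaults shown in the usage.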