The CCI.pretuner function performs a grid search over the hyperparameters of the machine learning models supported by CCI.test, for use in a conditional independence test. The tuner uses the caret package for tuning.
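A grid search evaluates parameter combinations built from per-parameter value vectors. The minimal base-R sketch below contrasts a full grid search with a random one. It assumes the grid is the Cartesian product of the default xgboost value vectors of CCI.pretuner. The helper build_tuning_grid() is hypothetical and not part of the CCI package. It only illustrates what random_grid and samples control, not how CCI.pretuner builds its grid internally.

```r
# Hypothetical helper (not part of CCI): builds a full or random tuning grid.
build_tuning_grid <- function(param_values, random_grid = TRUE, samples = 35) {
  # Full grid: every combination of the supplied parameter values
  full_grid <- expand.grid(param_values, KEEP.OUT.ATTRS = FALSE)
  if (!random_grid || samples >= nrow(full_grid)) {
    return(full_grid)
  }
  # Random grid: keep a random subset of `samples` distinct rows
  full_grid[sample(nrow(full_grid), samples), , drop = FALSE]
}

# Default xgboost value vectors from the CCI.pretuner signature
xgb_values <- list(
  nrounds = seq(100, 1000, by = 100),
  eta = seq(0.01, 0.3, by = 0.05),
  max_depth = 2:6,
  gamma = c(0, 1, 2, 3),
  colsample_bytree = c(0.8, 0.9, 1),
  min_child_weight = c(1, 3),
  subsample = 1
)

set.seed(1)
full <- build_tuning_grid(xgb_values, random_grid = FALSE)
random <- build_tuning_grid(xgb_values, random_grid = TRUE, samples = 35)
nrow(full)    # 7200 combinations (10 * 6 * 5 * 4 * 3 * 2 * 1)
nrow(random)  # 35 combinations
```

The full grid grows multiplicatively with every added value, which is why sampling a fixed number of rows keeps the tuning cost predictable.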
CCI.pretuner(
  formula,
  data,
  method = "rf",
  metric = "RMSE",
  validation_method = "cv",
  folds = 4,
  training_share = 0.7,
  tune_length = 4,
  random_grid = TRUE,
  samples = 35,
  poly = TRUE,
  degree = 3,
  interaction = TRUE,
  verboseIter = FALSE,
  include_explanatory = FALSE,
  verbose = FALSE,
  parallel = FALSE,
  mtry = 1:10,
  nrounds = c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000),
  eta = seq(0.01, 0.3, by = 0.05),
  max_depth = 2:6,
  gamma = c(0, 1, 2, 3),
  colsample_bytree = c(0.8, 0.9, 1),
  min_child_weight = c(1, 3),
  subsample = 1,
  sigma = seq(0.1, 2, by = 0.3),
  C = seq(0.1, 2, by = 0.5),
  ...
)
A list containing:
best_param: A data frame with the best parameters.
tuning_result: A data frame with all tested parameter combinations and their performance metrics.
warnings: A character vector of warnings issued during tuning.
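The best parameters are the row of the tuning results that scores best on the chosen metric. RMSE is an error measure, so lower is better. For caret metrics such as Rsquared or Accuracy, the maximum would be taken instead. The base-R sketch below assumes tuning_result has one column per tuned parameter plus a column named after the metric. That layout, the helper select_best() and the toy scores are all hypothetical and for illustration only.

```r
# Hypothetical helper (not part of CCI): returns the parameter columns of the
# best-scoring row. Assumes one column per parameter plus a metric column.
select_best <- function(tuning_result, metric = "RMSE", maximize = FALSE) {
  scores <- tuning_result[[metric]]
  best_row <- if (maximize) which.max(scores) else which.min(scores)
  tuning_result[best_row, setdiff(names(tuning_result), metric), drop = FALSE]
}

# Toy tuning table with made-up scores, for illustration only
toy_result <- data.frame(
  mtry = c(1, 2, 3),
  RMSE = c(1.10, 0.95, 1.02)
)
best <- select_best(toy_result, metric = "RMSE")
best$mtry  # 2, the row with the lowest RMSE
```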
formula: Model formula specifying the relationship between dependent and independent variables.
data: A data frame containing the variables specified in the formula.
method: Character. The machine learning method to use. Supported methods are random forest ("rf"), extreme gradient boosting ("xgboost") and support vector machine ("svm").
metric: Character. The performance metric to optimize during tuning. Default is "RMSE".
validation_method: Character. The resampling method. Default is "cv" (k-fold cross-validation).
folds: Integer. The number of folds for cross-validation during tuning. Default is 4.
training_share: Numeric. For leave-group-out cross-validation, the proportion of the data used for training. Default is 0.7.
tune_length: Integer. The number of parameter combinations to try during tuning. Default is 4.
random_grid: Logical. If TRUE, a random grid search is performed; if FALSE, a full grid search is performed. Default is TRUE.
samples: Integer. The number of random samples to draw from the grid. Default is 35.
poly: Logical. If TRUE, polynomial terms of the conditioning variables are included in the model. Default is TRUE.
degree: Integer. The degree of the polynomial terms to include if poly is TRUE. Default is 3.
interaction: Logical. If TRUE, interaction terms of the conditioning variables are included in the model. Default is TRUE.
verboseIter: Logical. If TRUE, a training log is printed for each resampling iteration. Default is FALSE.
include_explanatory: Logical. If TRUE, for the condition Y ⊥ X | Z (Y independent of X given Z), the explanatory variable X is included in the model for Y. Default is FALSE.
verbose: Logical. If TRUE, the function prints information about the tuning process. Default is FALSE.
parallel: Logical. If TRUE, the function uses parallel processing. Default is FALSE.
mtry: Integer. The number of variables randomly sampled as candidates at each split for random forest. Default is 1:10.
nrounds: Integer. The number of rounds (trees) for methods such as xgboost and random forest. Default is seq(100, 1000, by = 100).
eta: Numeric. The learning rate for xgboost. Default is seq(0.01, 0.3, by = 0.05).
max_depth: Integer. The maximum tree depth for xgboost. Default is 2:6.
gamma: Numeric. The minimum loss reduction required to make a further partition on a leaf node for xgboost. Default is c(0, 1, 2, 3).
colsample_bytree: Numeric. The subsample ratio of columns when constructing each tree for xgboost. Default is c(0.8, 0.9, 1).
min_child_weight: Integer. The minimum sum of instance weight (hessian) needed in a child for xgboost. Default is c(1, 3).
subsample: Numeric. The subsample ratio of the training instances. Default is 1.
sigma: Numeric. The standard deviation of the Gaussian kernel for Gaussian Process Regression. Default is seq(0.1, 2, by = 0.3).
C: Numeric. The regularization (cost) parameter for support vector machine. Default is seq(0.1, 2, by = 0.5).
...: Additional arguments to pass to the CCI.tuner function.
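The poly, degree and interaction arguments expand the conditioning variables with extra features. The base-R sketch below shows one plausible form of that expansion: powers 2 through degree of each conditioning variable, plus pairwise products. The helper expand_conditioning() and its column-naming scheme are hypothetical. CCI.pretuner may construct these terms differently.

```r
# Hypothetical helper (not the CCI implementation): adds polynomial and
# pairwise interaction terms of the conditioning variables z_vars.
expand_conditioning <- function(data, z_vars, poly = TRUE, degree = 3,
                                interaction = TRUE) {
  out <- data
  if (poly && degree >= 2) {
    for (v in z_vars) {
      for (d in 2:degree) {
        # Power term, e.g. x2_d3 = x2^3
        out[[paste0(v, "_d", d)]] <- data[[v]]^d
      }
    }
  }
  if (interaction && length(z_vars) >= 2) {
    pairs <- combn(z_vars, 2)  # each column is one pair of variable names
    for (j in seq_len(ncol(pairs))) {
      a <- pairs[1, j]
      b <- pairs[2, j]
      # Interaction term, e.g. x2_x_x3 = x2 * x3
      out[[paste0(a, "_x_", b)]] <- data[[a]] * data[[b]]
    }
  }
  out
}

df <- data.frame(x1 = c(1, 2), x2 = c(2, 3), x3 = c(-1, 4), y = c(0, 1))
expanded <- expand_conditioning(df, z_vars = c("x2", "x3"))
names(expanded)
# x1 x2 x3 y x2_d2 x2_d3 x3_d2 x3_d3 x2_x_x3
```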
See also: CCI.test, perm.test, print.summary.CCI, plot.CCI, QQplot
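set.seed(123)
# Simulated data: y is drawn independently of x1, x2 and x3
data <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100), y = rnorm(100))
# Tune random forest parameters for the test of y independent of x1 given x2 and x3
result <- CCI.pretuner(formula = y ~ x1 | x2 + x3,
                       data = data,
                       samples = 5,
                       folds = 3,
                       method = "rf")
# Best parameter combination found
result$best_param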
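In the example formula y ~ x1 | x2 + x3, the variables to the right of | appear to form the conditioning set. This matches the Y ⊥ X | Z reading in the include_explanatory description. Because | binds more loosely than + in R, the right-hand side parses as x1 | (x2 + x3). The base-R sketch below splits such a formula into its three parts. The helper parse_ci_formula() is hypothetical and not part of the CCI package.

```r
# Hypothetical helper (not part of CCI): splits y ~ x | z1 + z2 into the
# dependent, explanatory and conditioning variable names.
parse_ci_formula <- function(formula) {
  rhs <- formula[[3]]
  # `|` binds more loosely than `+`, so rhs is the call `|`(x, z1 + z2)
  if (!is.call(rhs) || !identical(rhs[[1]], as.name("|"))) {
    stop("expected a formula of the form y ~ x | z1 + z2")
  }
  list(
    y = all.vars(formula[[2]]),  # dependent variable
    x = all.vars(rhs[[2]]),      # explanatory variable being tested
    z = all.vars(rhs[[3]])       # conditioning set
  )
}

parts <- parse_ci_formula(y ~ x1 | x2 + x3)
str(parts)
```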