The CCI.test function performs a conditional independence test using a specified machine learning model or a custom model provided by the user. It calculates the test statistic, generates a null distribution via permutations, computes p-values, and optionally generates a plot of the null distribution with the observed test statistic.
The CCI.test function serves as a wrapper around the perm.test function.
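The general idea described above (fit a model, compute a test statistic, build a null distribution by permutation, derive a p-value) can be illustrated with a small base R sketch. This is not the package's implementation: it assumes `lm()` as a stand-in for the machine learning model, RMSE as the metric, and a plain shuffle of `x` as the permutation scheme. The helper `rmse_once` and all variable names are hypothetical.

```r
# Simplified permutation-based check of "y independent of x given z" (illustration only).
# Assumptions: lm() replaces the ML model, RMSE is the metric, x is shuffled freely.
set.seed(1)
n <- 200
z <- rnorm(n)
x <- z + rnorm(n)   # x depends on z
y <- z + rnorm(n)   # y depends on z only
dat <- data.frame(y = y, x = x, z = z)

# Fit on a random training split of proportion p, return RMSE on the held-out rows
rmse_once <- function(d, p = 0.5) {
  train_idx <- sample(nrow(d), size = floor(p * nrow(d)))
  fit <- lm(y ~ x + z, data = d[train_idx, ])
  pred <- predict(fit, newdata = d[-train_idx, ])
  sqrt(mean((d$y[-train_idx] - pred)^2))
}

# Observed test statistic on the original data
observed <- rmse_once(dat)

# Null distribution: repeat the procedure with x shuffled
nperm <- 100
null_dist <- replicate(nperm, {
  d_perm <- dat
  d_perm$x <- sample(d_perm$x)
  rmse_once(d_perm)
})

# Left-tailed empirical p-value (lower RMSE than the null would indicate dependence)
p_value <- (sum(null_dist <= observed) + 1) / (nperm + 1)
p_value
```

Because `y` does not depend on `x` once `z` is known in this simulated data, the observed RMSE should typically fall well inside the null distribution.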
CCI.test(
formula = NULL,
data,
p = 0.5,
nperm = 160,
nrounds = 600,
mtry = NULL,
metric = "Auto",
method = "rf",
choose_direction = FALSE,
parametric = FALSE,
poly = TRUE,
degree = 3,
robust = TRUE,
subsample = "Auto",
subsample_set,
min_child_weight = 1,
colsample_bytree = 1,
eta = 0.3,
gamma = 0,
max_depth = 6,
interaction = TRUE,
mode = "numeric_only",
metricfunc = NULL,
mlfunc = NULL,
tail = NA,
tune = FALSE,
samples = 35,
folds = 5,
tune_length = 10,
k = 15,
center = TRUE,
scale = TRUE,
eps = 1e-15,
positive = NULL,
kernel = "optimal",
distance = 2,
seed = NA,
random_grid = TRUE,
nthread = 2,
verbose = FALSE,
progress = TRUE,
...
)
Invisibly returns the result of perm.test, which is an object of class 'CCI' containing the null distribution, observed test statistic, p-values, the machine learning model used, and the data.
formula: Model formula specifying the relationship between dependent and independent variables. (Example: Y ~ X | Z1 + Z2 for Y || X | Z1, Z2)
data: A data frame containing the variables specified in the formula.
p: Numeric. Proportion of data used for training the model. Default is 0.5.
nperm: Integer. The number of permutations to perform. Default is 160.
nrounds: Integer. The number of rounds (trees) for methods "xgboost" and "rf". Default is 600.
mtry: Number of variables to consider for splitting at each node for method "rf". Default is NULL (square root of the number of variables).
metric: Character. Specifies the performance metric: "Auto", "RMSE" or "Kappa". Default is "Auto".
method: Character. Specifies the machine learning method to use. Supported methods are random forest "rf", extreme gradient boosting "xgboost", support vector machine "svm" and K-nearest neighbour "KNN". Default is "rf".
choose_direction: Logical. If TRUE, the function will choose the best direction for testing. Default is FALSE.
parametric: Logical. Indicates whether to compute a parametric p-value instead of the empirical p-value. A parametric p-value assumes that the null distribution is Gaussian. Default is FALSE.
poly: Logical. If TRUE, polynomial terms of the conditional variables are included in the model. Default is TRUE.
degree: Integer. The degree of polynomial terms to include if poly is TRUE. Default is 3.
robust: Logical. If TRUE, uses a robust method for permutation. Default is TRUE.
subsample: Character. Specifies whether to use automatic subsampling based on sample size ("Auto"), user-defined subsampling ("Yes"), or no subsampling ("No"). Default is "Auto".
subsample_set: Numeric. If subsample is set to "Yes", this parameter defines the proportion of data to use for subsampling. No default value is set in the function signature.
min_child_weight: Numeric. The minimum sum of instance weight (hessian) needed in a child for methods like xgboost. Default is 1.
colsample_bytree: Numeric. The subsample ratio of columns when constructing each tree for methods like xgboost. Default is 1.
eta: Numeric. The learning rate for methods like xgboost. Default is 0.3.
gamma: Numeric. The minimum loss reduction required to make a further partition on a leaf node of the tree for methods like xgboost. Default is 0.
max_depth: Integer. The maximum depth of the trees for methods like xgboost. Default is 6.
interaction: Logical. If TRUE, interaction terms of the conditional variables are included in the model. Default is TRUE.
mode: Character. Specifies the mode of operation: "numeric_only" or "mixed". Default is "numeric_only".
metricfunc: Optional. A custom function for calculating a performance metric based on the model's predictions. Default is NULL.
mlfunc: Optional. A custom machine learning wrapper function to use instead of the predefined methods. Default is NULL.
tail: Character. Specifies whether to calculate left-tailed or right-tailed p-values, depending on the performance metric used. Only applicable if using metricfunc or mlfunc. Default is NA.
tune: Logical. If TRUE, the function will perform hyperparameter tuning for the specified machine learning method. Default is FALSE.
samples: Integer. Number of hyperparameter combinations used in tuning. Default is 35.
folds: Integer. The number of folds for cross-validation during the tuning process. Default is 5.
tune_length: Integer. The number of parameter combinations to try during the tuning process. Default is 10.
k: Integer. The number of nearest neighbors to use for the "KNN" method. Default is 15.
center: Logical. If TRUE, the data will be centered before fitting the model. Default is TRUE.
scale: Logical. If TRUE, the data will be scaled before fitting the model. Default is TRUE.
eps: Numeric. A small value to avoid division by zero in some calculations. Default is 1e-15.
positive: Character. The name of the positive class in the data, used for classification tasks with the "KNN" method. Default is NULL.
kernel: Character. The kernel type to use for the "KNN" method. Default is "optimal".
distance: Numeric. Parameter of the Minkowski distance for the "KNN" method. Default is 2.
seed: Integer. Set the seed for reproducing results. Default is NA.
random_grid: Logical. If TRUE, a random grid search is performed. If FALSE, a full grid search is performed. Default is TRUE.
nthread: Integer. The number of threads to use for parallel processing. Default is 2.
verbose: Logical. If TRUE, additional information is printed during the execution of the function. Default is FALSE.
progress: Logical. If TRUE, a progress bar is displayed during the permutation process. Default is TRUE.
...: Additional arguments to pass to the perm.test function.
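To make the difference between an empirical and a parametric p-value, and between left-tailed and right-tailed tests, more concrete, here is a minimal base R sketch. The helper `p_from_null` is hypothetical and not part of the package. The `+1` correction in the empirical p-value and the tail labels "left" and "right" are assumptions chosen for illustration, so the package's exact formulas may differ.

```r
# Hypothetical helper: p-value from a null distribution and an observed statistic.
p_from_null <- function(null_dist, observed,
                        tail = c("left", "right"), parametric = FALSE) {
  tail <- match.arg(tail)
  if (parametric) {
    # Parametric: approximate the null distribution by a Gaussian
    p_left <- stats::pnorm(observed,
                           mean = mean(null_dist),
                           sd = stats::sd(null_dist))
    return(if (tail == "left") p_left else 1 - p_left)
  }
  # Empirical: share of null values at least as extreme as the observed one,
  # with the common +1 correction (an assumption, see lead-in)
  n_extreme <- if (tail == "left") {
    sum(null_dist <= observed)
  } else {
    sum(null_dist >= observed)
  }
  (n_extreme + 1) / (length(null_dist) + 1)
}

null_dist <- c(1, 2, 3, 4, 5)
p_left  <- p_from_null(null_dist, 1.5, tail = "left")
p_right <- p_from_null(null_dist, 1.5, tail = "right")
p_param <- p_from_null(null_dist, 3, tail = "left", parametric = TRUE)
```

A left tail suits metrics where lower is better, such as RMSE, and a right tail suits metrics where higher is better, such as Kappa.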
perm.test, print.summary.CCI, plot.CCI, CCI.pretuner, QQplot
set.seed(123)
data <- data.frame(x1 = stats::rnorm(100), x2 = stats::rnorm(100), y = stats::rnorm(100))
result <- CCI.test(y ~ x1 | x2, data = data, nperm = 25, interaction = FALSE)
summary(result)
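A few further calls, using only arguments documented above, might look like the following sketch. It assumes the function is exported by a package named CCI and that the xgboost package is installed for the second call. The calls are guarded so nothing runs if either package is missing, and the output is not shown because it was not produced here.

```r
set.seed(123)
dat <- data.frame(x1 = stats::rnorm(100),
                  x2 = stats::rnorm(100),
                  y  = stats::rnorm(100))

# Assumption: the package providing CCI.test is called "CCI"
if (requireNamespace("CCI", quietly = TRUE)) {
  # Random forest (default method) with a parametric p-value and a fixed seed
  res_rf <- CCI::CCI.test(y ~ x1 | x2, data = dat, nperm = 25,
                          parametric = TRUE, seed = 1, progress = FALSE)
  summary(res_rf)
  plot(res_rf)  # expected to dispatch to plot.CCI for objects of class 'CCI'

  # Gradient boosting with a smaller learning rate and shallower trees
  if (requireNamespace("xgboost", quietly = TRUE)) {
    res_xgb <- CCI::CCI.test(y ~ x1 | x2, data = dat, nperm = 25,
                             method = "xgboost", eta = 0.1, max_depth = 3,
                             nrounds = 100, progress = FALSE)
    summary(res_xgb)
  }
}
```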