CCI.test: Computational test for conditional independence based on ML and Monte Carlo Cross Validation

Description

The CCI.test function performs a conditional independence test using a specified machine learning model or a custom model supplied by the user. It calculates the test statistic, generates a null distribution via permutations, computes p-values, and optionally plots the null distribution together with the observed test statistic. CCI.test is a wrapper around the perm.test function.
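The general idea of a permutation-based test scored by Monte Carlo cross-validation can be sketched in base R. This is only an illustration under simplifying assumptions, not the package's implementation: it uses lm as the learner, a plain shuffle of x to build the null distribution, RMSE as the metric, and a "+1" corrected empirical p-value. The helper name mccv_rmse and the simulated data are hypothetical.

```r
# Illustrative sketch only: NOT the CCI package's implementation.
# Compares out-of-sample RMSE on the original data with the RMSE
# obtained after shuffling x (assumed null-generating scheme).
set.seed(1)
n <- 300
z <- rnorm(n)
x <- z + rnorm(n)
y <- z + 1.5 * x + rnorm(n)   # y depends on x even after conditioning on z
dat <- data.frame(y = y, x = x, z = z)

# One Monte Carlo cross-validation round (hypothetical helper):
# random train/test split with training proportion p, fit, score on holdout.
mccv_rmse <- function(d, p = 0.5) {
  train_idx <- sample(nrow(d), size = floor(p * nrow(d)))
  fit <- lm(y ~ x + z, data = d[train_idx, ])
  pred <- predict(fit, newdata = d[-train_idx, ])
  sqrt(mean((d$y[-train_idx] - pred)^2))
}

nperm <- 100
observed <- mccv_rmse(dat)

# Null distribution: repeat the same scoring with x shuffled.
null_dist <- replicate(nperm, {
  d_perm <- dat
  d_perm$x <- sample(d_perm$x)
  mccv_rmse(d_perm)
})

# Left-tailed empirical p-value: a lower RMSE means better prediction.
p_value <- (sum(null_dist <= observed) + 1) / (nperm + 1)
p_value
```

A small p-value here indicates that x improves prediction of y beyond what z provides, which is evidence against conditional independence.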

Usage

CCI.test(
  formula = NULL,
  data,
  p = 0.5,
  nperm = 160,
  nrounds = 600,
  mtry = NULL,
  metric = "Auto",
  method = "rf",
  choose_direction = FALSE,
  parametric = FALSE,
  poly = TRUE,
  degree = 3,
  robust = TRUE,
  subsample = "Auto",
  subsample_set,
  min_child_weight = 1,
  colsample_bytree = 1,
  eta = 0.3,
  gamma = 0,
  max_depth = 6,
  interaction = TRUE,
  mode = "numeric_only",
  metricfunc = NULL,
  mlfunc = NULL,
  tail = NA,
  tune = FALSE,
  samples = 35,
  folds = 5,
  tune_length = 10,
  k = 15,
  center = TRUE,
  scale = TRUE,
  eps = 1e-15,
  positive = NULL,
  kernel = "optimal",
  distance = 2,
  seed = NA,
  random_grid = TRUE,
  nthread = 2,
  verbose = FALSE,
  progress = TRUE,
  ...
)

Value

Invisibly returns the result of perm.test, which is an object of class 'CCI' containing the null distribution, observed test statistic, p-values, the machine learning model used, and the data.

Arguments

formula: Model formula specifying the relationship between dependent and independent variables. Example: Y ~ X | Z1 + Z2 tests whether Y is independent of X given Z1 and Z2.
data: A data frame containing the variables specified in the formula.
p: Numeric. Proportion of data used for training the model. Default is 0.5.
nperm: Integer. The number of permutations to perform. Default is 160.
nrounds: Integer. The number of rounds (trees) for methods "xgboost" and "rf". Default is 600.
mtry: Number of variables to possibly split at in each node for method 'rf'. Default is NULL (sqrt of number of variables).
metric: Character. Specifies the performance metric: "Auto", "RMSE" or "Kappa". Default is "Auto".
method: Character. Specifies the machine learning method to use. Supported methods are random forest "rf", extreme gradient boosting "xgboost", support vector machine "svm" and K-nearest neighbour "KNN". Default is "rf".
choose_direction: Logical. If TRUE, the function will choose the best direction for testing. Default is FALSE.
parametric: Logical. If TRUE, a parametric p-value is computed instead of the empirical p-value. A parametric p-value assumes that the null distribution is Gaussian. Default is FALSE.
poly: Logical. If TRUE, polynomial terms of the conditional variables are included in the model. Default is TRUE.
degree: Integer. The degree of polynomial terms to include if poly is TRUE. Default is 3.
robust: Logical. If TRUE, uses a robust method for permutation. Default is TRUE.
subsample: Character. Specifies whether to use automatic subsampling based on sample size ("Auto"), user-defined subsampling ("Yes"), or no subsampling ("No"). Default is "Auto".
subsample_set: Numeric. If subsample is set to "Yes", this parameter defines the proportion of data to use for subsampling. Default is NA.
min_child_weight: Numeric. The minimum sum of instance weight (hessian) needed in a child for methods like xgboost. Default is 1.
colsample_bytree: Numeric. The subsample ratio of columns when constructing each tree for methods like xgboost. Default is 1.
eta: Numeric. The learning rate for methods like xgboost. Default is 0.3.
gamma: Numeric. The minimum loss reduction required to make a further partition on a leaf node of the tree for methods like xgboost. Default is 0.
max_depth: Integer. The maximum depth of the trees for methods like xgboost. Default is 6.
interaction: Logical. If TRUE, interaction terms of the conditional variables are included in the model. Default is TRUE.
mode: Character. Specifies the mode of operation: "numeric_only" or "mixed". Default is "numeric_only".
metricfunc: Optional. A custom function for calculating a performance metric based on the model's predictions. Default is NULL.
mlfunc: Optional. A custom machine learning wrapper function to use instead of the predefined methods. Default is NULL.
tail: Character. Specifies whether to calculate left-tailed or right-tailed p-values, depending on the performance metric used. Only applicable if using metricfunc or mlfunc. Default is NA.
tune: Logical. If TRUE, the function will perform hyperparameter tuning for the specified machine learning method. Default is FALSE.
samples: Integer. Number of hyperparameter combinations used in tuning. Default is 35.
folds: Integer. The number of folds for cross-validation during the tuning process. Default is 5.
tune_length: Integer. The number of parameter combinations to try during the tuning process. Default is 10.
k: Integer. The number of nearest neighbors to use for KNN method. Default is 15.
center: Logical. If TRUE, the data will be centered before fitting the model. Default is TRUE.
scale: Logical. If TRUE, the data will be scaled before fitting the model. Default is TRUE.
eps: Numeric. A small value to avoid division by zero in some calculations. Default is 1e-15.
positive: Character. The name of the positive class in the data, used for classification tasks with the "KNN" method. Default is NULL.
kernel: Character. The kernel type to use for KNN method. Default is "optimal".
distance: Numeric. Parameter of Minkowski distance for the "KNN" method. Default is 2.
seed: Integer. Set the seed for reproducing results. Default is NA.
random_grid: Logical. If TRUE, a random grid search is performed. If FALSE, a full grid search is performed. Default is TRUE.
nthread: Integer. The number of threads to use for parallel processing. Default is 2.
verbose: Logical. If TRUE, additional information is printed during the execution of the function. Default is FALSE.
progress: Logical. If TRUE, a progress bar is displayed during the permutation process. Default is TRUE.
...: Additional arguments to pass to the perm.test function.
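The difference between the empirical p-value and the parametric (Gaussian) p-value described under parametric, and the role of the tail direction, can be illustrated with a small base R helper. This is a sketch under assumptions, not code from the package: the function p_values, its "left"/"right" labels, the "+1" correction, and the simulated null distribution are all hypothetical.

```r
# Hypothetical helper, not part of the CCI package.
# Computes an empirical and a Gaussian-approximation p-value from a
# null distribution and an observed test statistic.
p_values <- function(null_dist, observed, tail = c("left", "right")) {
  tail <- match.arg(tail)
  n <- length(null_dist)
  mu <- mean(null_dist)
  sigma <- sd(null_dist)
  if (tail == "left") {
    # Left tail: small values are extreme (e.g. an error metric such as RMSE).
    empirical <- (sum(null_dist <= observed) + 1) / (n + 1)
    parametric <- pnorm(observed, mean = mu, sd = sigma)
  } else {
    # Right tail: large values are extreme (e.g. an agreement metric such as Kappa).
    empirical <- (sum(null_dist >= observed) + 1) / (n + 1)
    parametric <- pnorm(observed, mean = mu, sd = sigma, lower.tail = FALSE)
  }
  c(empirical = empirical, parametric = parametric)
}

set.seed(42)
null_rmse <- rnorm(160, mean = 1.5, sd = 0.1)  # simulated null distribution
res <- p_values(null_rmse, observed = 1.2, tail = "left")
res
```

The empirical p-value can never fall below 1 / (number of permutations + 1), whereas the Gaussian approximation can, which is one reason to consider the parametric option when few permutations are affordable and the null distribution looks roughly normal.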

Examples

set.seed(123)
data <- data.frame(x1 = stats::rnorm(100), x2 = stats::rnorm(100), y = stats::rnorm(100))
result <- CCI.test(y ~ x1 | x2, data = data, nperm = 25, interaction = FALSE)
summary(result)
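
As a contrast to the independent data above, the following sketch simulates a case where y does depend on x1 given x2. It assumes the function is exported from a package named CCI and uses only arguments listed under Usage; the call is skipped if that package is not installed. The data-generating model and the object names dep_data and dep_result are made up for illustration.

```r
# Assumes the CCI package is installed; the test call is skipped otherwise.
set.seed(456)
n <- 200
x2 <- stats::rnorm(n)
x1 <- x2 + stats::rnorm(n)
y <- sin(x1) + x2 + stats::rnorm(n, sd = 0.5)  # y depends on x1 given x2
dep_data <- data.frame(x1 = x1, x2 = x2, y = y)

if (requireNamespace("CCI", quietly = TRUE)) {
  # Same call pattern as the example above, with a fixed seed.
  dep_result <- CCI::CCI.test(y ~ x1 | x2, data = dep_data, nperm = 25, seed = 456)
  summary(dep_result)
}
```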
