CCI (version 0.3.6)

CCI.test: Computational test for conditional independence based on ML and Monte Carlo Cross Validation

Description

The CCI.test function performs a conditional independence test using a specified machine learning model or a custom model provided by the user. It calculates the test statistic, generates a null distribution via permutations, computes p-values, and optionally plots the null distribution together with the observed test statistic. CCI.test is a wrapper around the perm.test function.
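
The procedure described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: it assumes a linear model (lm) in place of the machine learning model, RMSE as the performance metric, a plain shuffle of X as the permutation scheme, and a hypothetical helper named mc_rmse. The idea is that if shuffling X does not worsen out-of-sample prediction of Y given Z, there is no evidence against conditional independence.

```r
set.seed(1)

# Simulated data: y depends on z only, so y is independent of x given z.
n <- 200
z <- rnorm(n)
x <- z + rnorm(n)
y <- z + rnorm(n)
dat <- data.frame(y = y, x = x, z = z)

# Hypothetical helper: one Monte Carlo cross-validation round.
# Fits on a random fraction p of the rows and returns the test-set RMSE.
mc_rmse <- function(d, p = 0.5) {
  train_idx <- sample(nrow(d), size = floor(p * nrow(d)))
  fit <- lm(y ~ x + z, data = d[train_idx, ])
  pred <- predict(fit, newdata = d[-train_idx, ])
  sqrt(mean((d$y[-train_idx] - pred)^2))
}

# Observed test statistic on the original data.
observed <- mc_rmse(dat)

# Null distribution: shuffle x in each round to break its link with y.
nperm <- 100
null_dist <- replicate(nperm, {
  d_perm <- dat
  d_perm$x <- sample(d_perm$x)
  mc_rmse(d_perm)
})

# Left-tailed empirical p-value (lower RMSE is better).
# The +1 correction is an assumption of this sketch.
p_value <- (sum(null_dist <= observed) + 1) / (nperm + 1)
p_value
```

Here the proportion p plays the same role as the p argument of CCI.test, and nperm controls the size of the null distribution.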

Usage

CCI.test(
  formula = NULL,
  data,
  p = 0.5,
  nperm = 160,
  nrounds = 600,
  mtry = NULL,
  metric = "Auto",
  method = "rf",
  choose_direction = FALSE,
  parametric = FALSE,
  poly = TRUE,
  degree = 3,
  robust = TRUE,
  subsample = "Auto",
  subsample_set,
  min_child_weight = 1,
  colsample_bytree = 1,
  eta = 0.3,
  gamma = 0,
  max_depth = 6,
  interaction = TRUE,
  mode = "numeric_only",
  metricfunc = NULL,
  mlfunc = NULL,
  tail = NA,
  tune = FALSE,
  samples = 35,
  folds = 5,
  tune_length = 10,
  k = 15,
  center = TRUE,
  scale = TRUE,
  eps = 1e-15,
  positive = NULL,
  kernel = "optimal",
  distance = 2,
  seed = NA,
  random_grid = TRUE,
  nthread = 2,
  verbose = FALSE,
  progress = TRUE,
  ...
)

Value

Invisibly returns the result of perm.test, which is an object of class 'CCI' containing the null distribution, observed test statistic, p-values, the machine learning model used, and the data.

Arguments

formula

Model formula specifying the relationship between dependent and independent variables. The conditioning variables follow the | symbol; for example, Y ~ X | Z1 + Z2 tests whether Y is independent of X given Z1 and Z2.

data

A data frame containing the variables specified in the formula.

p

Numeric. Proportion of data used for training the model. Default is 0.5.

nperm

Integer. The number of permutations to perform. Default is 160.

nrounds

Integer. The number of rounds (trees) for methods "xgboost" and "rf". Default is 600.

mtry

Number of variables to possibly split at in each node for method 'rf'. Default is NULL (sqrt of number of variables).

metric

Character. Specifies the performance metric: "Auto", "RMSE" or "Kappa". Default is "Auto".

method

Character. Specifies the machine learning method to use. Supported methods are random forest ("rf"), extreme gradient boosting ("xgboost"), support vector machine ("svm") and K-nearest neighbour ("KNN"). Default is "rf".

choose_direction

Logical. If TRUE, the function will choose the best direction for testing. Default is FALSE.

parametric

Logical. Indicates whether to compute a parametric p-value instead of the empirical p-value. A parametric p-value assumes that the null distribution is Gaussian. Default is FALSE.

poly

Logical. If TRUE, polynomial terms of the conditional variables are included in the model. Default is TRUE.

degree

Integer. The degree of polynomial terms to include if poly is TRUE. Default is 3.

robust

Logical. If TRUE, uses a robust method for permutation. Default is TRUE.

subsample

Character. Specifies whether to use automatic subsampling based on sample size ("Auto"), user-defined subsampling ("Yes"), or no subsampling ("No"). Default is "Auto".

subsample_set

Numeric. If subsample is set to "Yes", this parameter defines the proportion of data to use for subsampling. Default is NA.

min_child_weight

Numeric. The minimum sum of instance weight (hessian) needed in a child for methods like xgboost. Default is 1.

colsample_bytree

Numeric. The subsample ratio of columns when constructing each tree for methods like xgboost. Default is 1.

eta

Numeric. The learning rate for methods like xgboost. Default is 0.3.

gamma

Numeric. The minimum loss reduction required to make a further partition on a leaf node of the tree for methods like xgboost. Default is 0.

max_depth

Integer. The maximum depth of the trees for methods like xgboost. Default is 6.

interaction

Logical. If TRUE, interaction terms of the conditional variables are included in the model. Default is TRUE.

mode

Character. Specifies the mode of operation: "numeric_only" or "mixed". Default is "numeric_only".

metricfunc

Optional. A custom function for calculating a performance metric based on the model's predictions. Default is NULL.

mlfunc

Optional. A custom machine learning wrapper function to use instead of the predefined methods. Default is NULL.

tail

Character. Specifies whether to calculate left-tailed or right-tailed p-values, depending on the performance metric used. Only applicable if using metricfunc or mlfunc. Default is NA.

tune

Logical. If TRUE, the function will perform hyperparameter tuning for the specified machine learning method. Default is FALSE.

samples

Integer. Number of hyperparameter combinations used in tuning. Default is 35.

folds

Integer. The number of folds for cross-validation during the tuning process. Default is 5.

tune_length

Integer. The number of parameter combinations to try during the tuning process. Default is 10.

k

Integer. The number of nearest neighbors to use for KNN method. Default is 15.

center

Logical. If TRUE, the data will be centered before fitting the model. Default is TRUE.

scale

Logical. If TRUE, the data will be scaled before fitting the model. Default is TRUE.

eps

Numeric. A small value to avoid division by zero in some calculations. Default is 1e-15.

positive

Character. The name of the positive class in the data, used for classification tasks with the "KNN" method. Default is NULL.

kernel

Character. The kernel type to use for KNN method. Default is "optimal".

distance

Numeric. Parameter of Minkowski distance for the "KNN" method. Default is 2.

seed

Integer. Set the seed for reproducing results. Default is NA.

random_grid

Logical. If TRUE, a random grid search is performed. If FALSE, a full grid search is performed. Default is TRUE.

nthread

Integer. The number of threads to use for parallel processing. Default is 2.

verbose

Logical. If TRUE, additional information is printed during the execution of the function. Default is FALSE.

progress

Logical. If TRUE, a progress bar is displayed during the permutation process. Default is TRUE.

...

Additional arguments to pass to the perm.test function.
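
The difference between the empirical and parametric p-values (the parametric argument) and between left-tailed and right-tailed p-values (the tail argument) can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the null distribution and observed statistic below are simulated stand-ins, and the +1 correction in the empirical p-value is a choice made for this sketch.

```r
set.seed(42)

# Stand-in values (simulated, not produced by CCI.test).
null_dist <- rnorm(160, mean = 1.0, sd = 0.05)  # e.g. RMSE values under the null
observed  <- 0.9                                # e.g. RMSE on the original data

n_null <- length(null_dist)

# Empirical p-values: share of null statistics at least as extreme as observed.
# Left-tailed suits metrics where lower is better (such as RMSE);
# right-tailed suits metrics where higher is better (such as Kappa).
p_emp_left  <- (sum(null_dist <= observed) + 1) / (n_null + 1)
p_emp_right <- (sum(null_dist >= observed) + 1) / (n_null + 1)

# Parametric p-values: fit a Gaussian to the null distribution
# and read the tail probability from it.
mu    <- mean(null_dist)
sigma <- sd(null_dist)
p_par_left  <- pnorm(observed, mean = mu, sd = sigma)
p_par_right <- pnorm(observed, mean = mu, sd = sigma, lower.tail = FALSE)

c(empirical_left = p_emp_left, parametric_left = p_par_left)
```

The empirical p-value cannot be smaller than 1 / (n_null + 1), so its resolution is limited by nperm; the parametric version can be smaller but relies on the Gaussian assumption.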

See Also

perm.test, print.summary.CCI, plot.CCI, CCI.pretuner, QQplot

Examples

set.seed(123)
data <- data.frame(x1 = stats::rnorm(100), x2 = stats::rnorm(100), y = stats::rnorm(100))
result <- CCI.test(y ~ x1 | x2, data = data, nperm = 25, interaction = FALSE)
summary(result)
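
A complementary example, sketched here with simulated data in which y does depend on x1 after conditioning on x2, uses only arguments listed in the Usage section (nperm, parametric, seed). The variable names and data-generating process are assumptions made for illustration, and the call is guarded so that the block runs only when the CCI package is installed.

```r
set.seed(123)

# Simulated data: y depends on x1 even after conditioning on x2.
n  <- 200
x2 <- rnorm(n)
x1 <- x2 + rnorm(n)
y  <- x1 + x2 + rnorm(n)
dat <- data.frame(x1 = x1, x2 = x2, y = y)

# Run the test only if the CCI package is available.
if (requireNamespace("CCI", quietly = TRUE)) {
  result <- CCI::CCI.test(y ~ x1 | x2, data = dat,
                          nperm = 25, parametric = TRUE, seed = 1)
  summary(result)
}
```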
