ccdf_testing: Main function to perform complex hypothesis testing using (un)conditional independence test

Description

Main function to perform complex hypothesis testing using (un)conditional independence test

Usage

ccdf_testing(
  exprmat = NULL,
  variable2test = NULL,
  covariate = NULL,
  distance = c("L2", "L1", "L_sup"),
  test = c("asymptotic", "permutations", "dist_permutations"),
  method = c("linear regression", "logistic regression", "RF"),
  fast = TRUE,
  n_perm = 100,
  n_perm_adaptive = c(100, 150, 250, 500),
  thresholds = c(0.1, 0.05, 0.01),
  parallel = TRUE,
  n_cpus = NULL,
  adaptive = FALSE,
  space_y = FALSE,
  number_y = ncol(exprmat)
)

Arguments

exprmat

a data frame of size G x n containing the preprocessed expressions from n samples (or cells) for G genes. Default is NULL.

variable2test

a data frame of numeric or factor vector(s) of size n containing the variable(s) to be tested (the condition(s))

covariate

a data frame of numeric or factor vector(s) of size n containing the covariate(s)

distance

a character string indicating which distance to use to compute the test, either 'L2', 'L1' or 'L_sup'. Only used when test is 'dist_permutations'. Default is 'L2'.

test

a character string indicating which method to use to compute the test, either 'asymptotic', 'permutations' or 'dist_permutations'. 'dist_permutations' computes the distance between the CDF and the CCDF, or between two CCDFs. Default is 'asymptotic'.

method

a character string indicating which method to use to compute the CCDF, either 'linear regression', 'logistic regression' or 'RF' for Random Forests. Default is 'linear regression' since it is the method used in the test.

fast

a logical flag indicating whether the fast implementation of logistic regression should be used. Only used when test is 'dist_permutations'. Default is TRUE.

n_perm

the number of permutations. Default is 100.

n_perm_adaptive

a vector of the increasing numbers of adaptive permutations when adaptive is TRUE. length(n_perm_adaptive) should be equal to length(thresholds)+1. Default is c(100,150,250,500).

thresholds

a vector of the decreasing thresholds to compute adaptive permutations when adaptive is TRUE. length(thresholds) should be equal to length(n_perm_adaptive)-1. Default is c(0.1,0.05,0.01).
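
The length constraint linking n_perm_adaptive and thresholds can be verified before calling the function. The following is a minimal sketch using the default values shown in Usage; the checks are illustrative and are not claimed to be part of ccdf_testing itself.

```r
# Default values taken from the Usage section
n_perm_adaptive <- c(100, 150, 250, 500)
thresholds <- c(0.1, 0.05, 0.01)

# Illustrative sanity checks (an assumption, not the package's own validation)
stopifnot(
  length(n_perm_adaptive) == length(thresholds) + 1,
  !is.unsorted(n_perm_adaptive),   # numbers of permutations are increasing
  !is.unsorted(rev(thresholds))    # thresholds are decreasing
)
```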

parallel

a logical flag indicating whether parallel computation should be enabled. Default is TRUE.

n_cpus

an integer indicating the number of cores to be used when parallel is TRUE. Default is NULL, in which case parallel::detectCores() - 1 is used.

adaptive

a logical flag indicating whether adaptive permutations should be performed. Default is FALSE.

space_y

a logical flag indicating whether the y thresholds are spaced. When space_y is TRUE, a regular sequence between the minimum and the maximum of the observations is used. Default is FALSE.

number_y

an integer value indicating the number of y thresholds (and therefore the number of regressions) to perform the test. Default is ncol(exprmat).
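
To illustrate what a "regular sequence between the minimum and the maximum of the observations" looks like, here is a minimal sketch in base R. The vector y and the value of number_y are hypothetical, and how ccdf_testing builds its thresholds internally is not shown here.

```r
set.seed(1)
y <- rnorm(100)    # hypothetical expression values for one gene
number_y <- 10     # hypothetical number of thresholds

# Regularly spaced thresholds spanning the range of the observations
y_thresholds <- seq(min(y), max(y), length.out = number_y)
```

Each threshold corresponds to one regression, so a smaller number_y reduces computation at the cost of a coarser grid.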

Value

A list with the following elements:

which_test: a character string carrying forward the value of the 'test' argument indicating which test was performed (either 'asymptotic','permutations','dist_permutations').
n_perm: an integer carrying forward the value of the 'n_perm' argument or 'n_perm_adaptive' indicating the number of permutations performed (NA if asymptotic test was performed).
pval: computed p-values. A data frame with one row for each gene, and with 2 columns: the first one 'raw_pval' contains the raw p-values, the second one 'adj_pval' contains the FDR adjusted p-values using Benjamini-Hochberg correction.
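
The Benjamini-Hochberg adjustment described for 'adj_pval' can be reproduced with base R's p.adjust. This is a minimal sketch on hypothetical raw p-values; it only mirrors the two-column layout described above and is not the package's internal code.

```r
raw_pval <- c(0.001, 0.02, 0.03, 0.5)   # hypothetical raw p-values for 4 genes

# Same two-column layout as described for the returned data frame
pval <- data.frame(
  raw_pval = raw_pval,
  adj_pval = p.adjust(raw_pval, method = "BH")  # Benjamini-Hochberg FDR adjustment
)
```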

References

Gauthier M, Agniel D, Thiébaut R & Hejblum BP (2019). Distribution-free complex hypothesis testing for single-cell RNA-seq differential expression analysis, *bioRxiv* 445165. [DOI: 10.1101/2021.05.21.445165](https://doi.org/10.1101/2021.05.21.445165).

Examples

# NOT RUN {
X <- as.factor(rbinom(n=100, size = 1, prob = 0.5))
# 10 genes x 100 samples; draw one value per sample (n = 100) instead of relying on recycling
Y <- t(replicate(10, ((X==1)*rnorm(n = 100,0,1)) + ((X==0)*rnorm(n = 100,0.5,1))))
res_asymp <- ccdf_testing(exprmat=data.frame(Y=Y), 
variable2test=data.frame(X=X), test="asymptotic",
n_cpus=1)$pvals # asymptotic test
# }
