Main function to perform complex hypothesis testing using (un)conditional independence tests
ccdf_testing(
exprmat = NULL,
variable2test = NULL,
covariate = NULL,
distance = c("L2", "L1", "L_sup"),
test = c("asymptotic", "permutations", "dist_permutations"),
method = c("linear regression", "logistic regression", "RF"),
fast = TRUE,
n_perm = 100,
n_perm_adaptive = c(100, 150, 250, 500),
thresholds = c(0.1, 0.05, 0.01),
parallel = TRUE,
n_cpus = NULL,
adaptive = FALSE,
space_y = FALSE,
number_y = ncol(exprmat)
)
exprmat: a data frame of size G x n containing the
preprocessed expressions from n samples (or cells) for G
genes. Default is NULL.
variable2test: a data frame of numeric or factor vector(s)
of size n containing the variable(s) to be tested (the condition(s)).
covariate: a data frame of numeric or factor vector(s)
of size n containing the covariate(s).
distance: a character string indicating which distance to use to
compute the test, either 'L2', 'L1' or
'L_sup', when test is 'dist_permutations'.
Default is 'L2'.
test: a character string indicating which method to use to
compute the test, either 'asymptotic', 'permutations' or
'dist_permutations'. 'dist_permutations' computes
the distance between the CDF and the CCDF, or between two CCDFs.
Default is 'asymptotic'.
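To make the three distance options concrete, here is a minimal base-R sketch that compares two empirical CDFs on a common grid. It is an illustration only, not the package's internal code: the helper name `cdf_distance`, the grid size, and the Riemann-sum approximation of the integrals are all assumptions.

```r
# Hypothetical helper (not part of the package): distance between two empirical CDFs
cdf_distance <- function(y1, y2, distance = c("L2", "L1", "L_sup"), n_grid = 200) {
  distance <- match.arg(distance)
  # Common evaluation grid spanning both samples
  grid <- seq(min(y1, y2), max(y1, y2), length.out = n_grid)
  step <- grid[2] - grid[1]
  # Pointwise absolute gap between the two empirical CDFs
  d <- abs(ecdf(y1)(grid) - ecdf(y2)(grid))
  switch(distance,
         L2 = sqrt(sum(d^2) * step),  # Riemann approximation of the L2 norm
         L1 = sum(d) * step,          # Riemann approximation of the L1 norm
         L_sup = max(d))              # largest gap observed on the grid
}

set.seed(1)
y_a <- rnorm(50, mean = 0)
y_b <- rnorm(50, mean = 1)
d_same <- cdf_distance(y_a, y_a, "L_sup")  # identical samples
d_diff <- cdf_distance(y_a, y_b, "L_sup")  # shifted samples
```

In a permutation scheme, a statistic of this kind would be recomputed after shuffling the tested variable, and the observed value compared with the permuted ones.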
method: a character string indicating which method to use to
compute the CCDF, either 'linear regression', 'logistic regression'
or 'RF' for Random Forests.
Default is 'linear regression' since it is the method used in the test.
fast: a logical flag indicating whether the fast implementation of
logistic regression should be used. Only used if 'dist_permutations' is specified.
Default is TRUE.
n_perm: the number of permutations. Default is 100.
n_perm_adaptive: a vector of the increasing numbers of
adaptive permutations when adaptive is TRUE.
length(n_perm_adaptive) should be equal to length(thresholds)+1.
Default is c(100,150,250,500).
thresholds: a vector of the decreasing thresholds to compute
adaptive permutations when adaptive is TRUE.
length(thresholds) should be equal to length(n_perm_adaptive)-1.
Default is c(0.1,0.05,0.01).
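The length and ordering constraints on these two vectors can be checked before calling the function. The sketch below also shows one plausible reading of the adaptive scheme, which is an assumption here rather than something this page spells out: every gene receives the first batch of permutations, and each further batch is spent only on genes whose p-value is still below the matching threshold. The helper `n_batches` is hypothetical.

```r
n_perm_adaptive <- c(100, 150, 250, 500)
thresholds <- c(0.1, 0.05, 0.01)

# Documented constraint: one more batch size than there are thresholds
stopifnot(length(n_perm_adaptive) == length(thresholds) + 1)
# Documented ordering: batch sizes increase, thresholds decrease
stopifnot(!is.unsorted(n_perm_adaptive), !is.unsorted(rev(thresholds)))

# Hypothetical helper (assumption): number of batches a gene would go through
# if its p-value stayed at `p` after every batch
n_batches <- function(p, thresholds) 1 + sum(cumprod(p < thresholds))

# Batch sizes that would be run for three example p-values
batches_high <- n_perm_adaptive[seq_len(n_batches(0.5, thresholds))]
batches_mid  <- n_perm_adaptive[seq_len(n_batches(0.07, thresholds))]
batches_low  <- n_perm_adaptive[seq_len(n_batches(0.001, thresholds))]
```

Under this reading, clearly non-significant genes stop after the first batch, so most of the permutation budget goes to genes near or below the smallest threshold.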
parallel: a logical flag indicating whether parallel computation
should be enabled. Default is TRUE.
n_cpus: an integer indicating the number of cores to be used when
parallel is TRUE.
Default is parallel::detectCores() - 1.
adaptive: a logical flag indicating whether adaptive permutations
should be performed. Default is FALSE.
space_y: a logical flag indicating whether the y thresholds are spaced.
When space_y is TRUE, a regular sequence between the minimum and
the maximum of the observations is used. Default is FALSE.
number_y: an integer value indicating the number of y thresholds (and therefore
the number of regressions) used to perform the test. Default is ncol(exprmat).
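As a rough illustration of what space_y and number_y control, the sketch below builds the y thresholds for one gene in base R. The spaced case follows the description above (a regular sequence between the minimum and the maximum). What happens when space_y is FALSE is not stated on this page, so the sketch assumes the observed values themselves serve as thresholds. `make_y_thresholds` is a hypothetical name, not a package function.

```r
# Hypothetical helper (not part of the package): thresholds for one gene's expression y
make_y_thresholds <- function(y, space_y = FALSE, number_y = length(y)) {
  if (space_y) {
    # Regular grid from the smallest to the largest observation
    seq(min(y), max(y), length.out = number_y)
  } else {
    # Assumption: fall back to the distinct observed values
    sort(unique(y))
  }
}

set.seed(42)
y <- rexp(20)  # skewed toy expression values
thr_spaced <- make_y_thresholds(y, space_y = TRUE, number_y = 5)
thr_observed <- make_y_thresholds(y, space_y = FALSE)
```

A smaller number_y means fewer regressions per gene, which trades resolution of the estimated CCDF for speed.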
A list with the following elements:
which_test: a character string carrying forward the value of
the 'test' argument indicating which test was performed (either
'asymptotic', 'permutations' or 'dist_permutations').
n_perm: an integer carrying forward the value of the
'n_perm' argument or 'n_perm_adaptive' indicating the number of permutations performed
(NA if asymptotic test was performed).
pval: computed p-values. A data frame with one row for
each gene, and with 2 columns: the first one 'raw_pval' contains
the raw p-values, the second one 'adj_pval' contains the FDR adjusted p-values
using Benjamini-Hochberg correction.
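The relationship between the two columns can be illustrated with base R's p.adjust, which implements the Benjamini-Hochberg correction. The raw p-values below are made up for illustration and are not package output. Whether the function calls p.adjust internally is an assumption.

```r
# Made-up raw p-values for five genes (illustrative only)
raw_pval <- c(0.001, 0.02, 0.03, 0.2, 0.8)

# Benjamini-Hochberg adjustment, laid out like the documented 'pval' data frame
pval <- data.frame(raw_pval = raw_pval,
                   adj_pval = p.adjust(raw_pval, method = "BH"))

# Genes that would be called at a 5% FDR
significant <- which(pval$adj_pval <= 0.05)
```

Adjusted values are never smaller than the raw ones, so filtering on adj_pval is the more conservative choice when many genes are tested.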
Gauthier M, Agniel D, Thiébaut R & Hejblum BP (2019). Distribution-free complex hypothesis testing for single-cell RNA-seq differential expression analysis, *bioRxiv* 445165. [DOI: 10.1101/2021.05.21.445165](https://doi.org/10.1101/2021.05.21.445165).
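# Simulate a binary condition for 100 samples
X <- as.factor(rbinom(n = 100, size = 1, prob = 0.5))
# Simulate 10 genes whose mean expression differs by 0.5 between the two conditions
# (draw 100 values per gene so that none are recycled across samples)
Y <- t(replicate(10, ((X == 1) * rnorm(n = 100, 0, 1)) + ((X == 0) * rnorm(n = 100, 0.5, 1))))
res_asymp <- ccdf_testing(exprmat = data.frame(Y = Y),
                          variable2test = data.frame(X = X), test = "asymptotic",
                          n_cpus = 1)$pvals # asymptotic test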