Usage
doppelgangR(esets, separator = ":", corFinder.args = list(separator = separator,
use.ComBat = TRUE, method = "pearson"), phenoFinder.args = list(separator = separator,
vectorDistFun = vectorWeightedDist), outlierFinder.expr.args = list(bonf.prob = 0.5,
transFun = atanh, tail = "upper"), outlierFinder.pheno.args = list(normal.upper.thresh = 0.99,
bonf.prob = NULL, tail = "upper"), smokingGunFinder.args = list(transFun = I),
impute.knn.args = list(k = 10, rowmax = 0.5, colmax = 0.8,
maxp = 1500, rng.seed = 362436069), manual.smokingguns = NULL,
automatic.smokingguns = FALSE, within.datasets.only = FALSE,
intermediate.pruning = FALSE, cache.dir = "cache", BPPARAM = bpparam(),
verbose = TRUE)
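As a rough sketch of how the argument lists in the signature above can be overridden, the call below runs an expression-only search on two small simulated datasets. The matrices, the dataset names m and n, and the planted near-duplicate sample are made up for illustration. phenoFinder.args = NULL skips the phenotype search because these ExpressionSets carry no phenotype data, cache.dir = NULL turns off caching, and SerialParam() is an assumption used here only to keep the run on a single core.

```r
library(Biobase)
library(BiocParallel)
library(doppelgangR)

set.seed(1)
genes <- paste0("gene", 1:100)

# Dataset "m": 11 simulated samples (hypothetical data).
m1 <- matrix(rnorm(100 * 11), ncol = 11,
             dimnames = list(genes, paste0("m", 1:11)))

# Dataset "n": 10 independent samples plus a noisy copy of sample m11,
# which serves as a planted doppelganger.
n1 <- cbind(matrix(rnorm(100 * 10), ncol = 10),
            m1[, 11] + rnorm(100, sd = 0.1))
dimnames(n1) <- list(genes, paste0("n", 1:11))

esets <- list(m = ExpressionSet(assayData = m1),
              n = ExpressionSet(assayData = n1))

# Expression-only search: no phenotype comparison, no cache, single core.
res <- doppelgangR(esets,
                   phenoFinder.args = NULL,
                   cache.dir = NULL,
                   BPPARAM = SerialParam())
summary(res)
```

Arguments that are not named in the call keep the defaults shown in the signature.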
Arguments
esets
a list of ExpressionSets, containing the numeric and phenotypic data to be analyzed.
separator
a delimiter to use between dataset names and sample names.
corFinder.args
a list of arguments to be passed to the corFinder function.
phenoFinder.args
a list of arguments to be passed to the phenoFinder function. If
NULL, samples with similar phenotypes will not be searched for.
outlierFinder.expr.args
a list of arguments to be passed to outlierFinder when called for expression data
outlierFinder.pheno.args
a list of arguments to be passed to outlierFinder when called for phenotype data
smokingGunFinder.args
a list of arguments to be passed to smokingGunFinder
impute.knn.args
a list of arguments to be passed to impute::impute.knn. Set to
NULL to skip k-nearest-neighbor (knn) imputation.
manual.smokingguns
a character vector of phenoData columns that, if identical, will
be considered evidence of duplication
automatic.smokingguns
If TRUE, automatically look for "smoking guns": phenotype
variables whose values are unique to each patient in dataset 1
and unique to each patient in dataset 2, but which contain exact
matches between datasets 1 and 2.
within.datasets.only
If TRUE, search for doppelgangers only within each dataset.
intermediate.pruning
The default, FALSE, produces output with no missing values but
uses extra memory, because all results from the expression,
phenotype, and smoking gun doppelganger searches must be kept
until the end. Setting this to TRUE saves memory for very large
searches, but a similarity measure is then reported for a sample
pair only if that measure identified the pair as a doppelganger
(for example, phenotype doppelgangers will have missing values
for the expression and smoking gun similarity).
cache.dir
The name of a directory in which to cache or look up results, to
avoid recalculating correlations. Set to NULL for no caching.
BPPARAM
Argument passed to BiocParallel::bplapply(). By default, all
cores of a multi-core machine are used.
verbose
If TRUE, print progress information.
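
The criterion described for automatic.smokingguns (values unique within each dataset, with exact matches between datasets) can be illustrated in base R. This is only a sketch of the rule as stated in the argument description, not the package's own implementation; the data frames, column names, and the helper is_smoking_gun are hypothetical.

```r
# Hypothetical phenotype tables for two datasets (illustration only).
pheno1 <- data.frame(patient_id = c("P01", "P02", "P03"),
                     grade = c("high", "high", "low"),
                     stringsAsFactors = FALSE)
pheno2 <- data.frame(patient_id = c("P03", "P04", "P05"),
                     grade = c("low", "high", "high"),
                     stringsAsFactors = FALSE)

# A column qualifies when its non-missing values are unique within each
# dataset and at least one value occurs in both datasets.
is_smoking_gun <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  anyDuplicated(x) == 0 && anyDuplicated(y) == 0 &&
    length(intersect(x, y)) > 0
}

# Test every column the two tables have in common.
shared_cols <- intersect(colnames(pheno1), colnames(pheno2))
qualifies <- vapply(shared_cols,
                    function(col) is_smoking_gun(pheno1[[col]], pheno2[[col]]),
                    logical(1))
candidates <- shared_cols[qualifies]
candidates
```

Here patient_id qualifies because each ID appears once per dataset and "P03" appears in both, while grade is rejected because its values repeat within a dataset. A column selected this way could also be named explicitly through manual.smokingguns.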