Usage
CNVtest.binary(signal, batch, sample = NULL, disease.status = NULL, ncomp,
               n.H0 = 5, n.H1 = 0, output = 'compact',
               model.mean = "~ strata(batch, cn)",
               model.var = "~ strata(batch, cn)", model.disease = "~ cn",
               association.test.strata = NULL,
               beta.estimated = NULL,
               start.mean = NULL, start.var = NULL,
               control = list(tol = 1e-05, max.iter = 3000, min.freq = 4))
Arguments
signal
The vector of intensity values, meant to be a proxy for the
number of copies.
batch
Factor describing how the data points should be
separated into batches, corresponding for example to different
technologies used to measure the number of DNA copies, or to
different cohorts in a case-control framework.
sample
Optional (but recommended). A character vector
containing a name for each data point, typically the name of the
individuals.
disease.status
In the case-control setting, a vector of 0s and 1s
indicating which individuals are controls and which are cases, respectively.
ncomp
Number of components one wants to fit to the data.
n.H0
Number of times the EM should be used to maximize the
likelihood under the null hypothesis of no association, each time
with a different random starting point. The run that maximizes the
likelihood is stored.
n.H1
Number of times the EM should be used to maximize the
likelihood under the alternative hypothesis that an association is present,
each time with a different random starting point. The run that maximizes the
likelihood is stored.
output
The default value, "compact", returns a data frame with one line per sample.
Any other setting will return a much bigger data frame with one line per individual and copy number.
This long format is the one used by the underlying fitting algorithm and is only useful when
using CNVtools in a non-standard manner.
model.mean
Formula that describes the linear model for the
location of the mean signal intensity. The default is "~ strata(batch,
cn)", which means that the mean intensity can take any value
for any combination of the variables "cn" (for copy number) and
"batch". More traditional model descriptions, such as
"~ as.factor(cn)", are also possible, but are likely to be
slower to fit and less numerically stable than the "strata"
notation, which should be preferred.
model.var
A formula as above, but to model the
variances. Whenever possible, and to maximise speed and stability, the
model should be specified using the strata command, for example
"~ strata(batch, cn)" (the default), meaning that variances are free
to take any value for each combination of the variables "batch"
and "cn" (copy number).
Alternatives such as "~ cn", i.e. variance proportional to the
number of copies, are allowed but are slower to fit and less stable
numerically.
model.disease
A formula that links the number of copies with
the case/control status. The default is a logit linear trend model,
"~ cn". Note that this formula only matters under the
alternative hypothesis and has no effect under the null. Model
descriptions using the "strata" command are not allowed for this
model.
association.test.strata
Optional factor providing the strata when
using a stratified test of association (typically, but not always,
these are geographic regions of origins of the samples).
beta.estimated
Optional. Used to fit the
model for a particular value of the log odds parameter beta
(essentially when one is interested in the profile likelihood).
In this case the disease model should be set to "~ 1" and the model
to 'H1'. The fit will then provide the best model assuming the
value of beta (the log odds ratio parameter) provided by the user.
start.mean
Optional. A set of starting values for the
means. Must be numeric and the size must match ncomp. This argument
can also be a matrix if one wants to specify multiple starting points. When
passing a matrix as argument, the number of columns should equal the
number of components, and the number of rows must be greater than
max(n.H0, n.H1). When some values in a row are missing, CNVtools will
pick the starting points for that run randomly (the default behavior).
start.var
Optional. A set of starting values for the
variances. Must be numeric and the size must match ncomp. Can also
be a matrix (see start.mean for details).
control
A list of parameters that control the behavior of the fitting.
min.freq is the minimum number of data points in a copy number class below which
the algorithm sets the frequency of this class to zero. In the presence of a
very rare genotype group it might be useful to lower this threshold.
Note, however, that estimating the variance may not be possible when a class
contains very few individuals, so constraining the model, for example with constant
variances (i.e. model.var = "~ 1"), might be sensible.
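
As an illustration of how the arguments above fit together, here is a minimal sketch that builds inputs of the expected shapes and a start.mean matrix following the row and column rules described under start.mean. The data are simulated and purely hypothetical, as are the object names (cn.true, sample.names, ctrl.rare). The call itself only uses argument names from the Usage section and is skipped when CNVtools is not installed. Treat it as a sketch under these assumptions rather than a recommended analysis.

```r
# Simulated inputs (hypothetical data, for illustration only)
set.seed(1)
n <- 200
batch <- factor(rep(c("A", "B"), each = n / 2))       # two batches
cn.true <- sample(0:2, n, replace = TRUE, prob = c(0.1, 0.4, 0.5))
signal <- rnorm(n, mean = cn.true, sd = 0.2)          # intensity as a proxy for copy number
sample.names <- paste0("ind", seq_len(n))             # one name per data point
disease.status <- rbinom(n, 1, 0.5)                   # vector of 0s and 1s

ncomp <- 3
n.H0 <- 5
n.H1 <- 0

# start.mean as a matrix: one column per component and more rows than
# max(n.H0, n.H1). Rows left missing are meant to fall back on random starts.
start.mean <- matrix(NA_real_, nrow = max(n.H0, n.H1) + 1, ncol = ncomp)
start.mean[1, ] <- c(0, 1, 2)                         # user-chosen start for the first run

# A control list with a lowered min.freq, as one might try for a rare class
ctrl.rare <- list(tol = 1e-05, max.iter = 3000, min.freq = 2)

if (requireNamespace("CNVtools", quietly = TRUE)) {
  # Constant variances (model.var = "~ 1") paired with the lowered min.freq
  fit <- CNVtools::CNVtest.binary(signal = signal, batch = batch,
                                  sample = sample.names,
                                  disease.status = disease.status,
                                  ncomp = ncomp, n.H0 = n.H0, n.H1 = n.H1,
                                  model.var = "~ 1",
                                  start.mean = start.mean,
                                  control = ctrl.rare)
}
```

The first row of start.mean fixes the starting means for one run, while the remaining rows are left missing so that the other runs start from random points.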