crlmmCopynumber(object, MIN.SAMPLES = 10, SNRMin = 5, MIN.OBS = 1,
    DF.PRIOR = 50, bias.adj = FALSE, prior.prob = rep(1/4, 4), seed = 1,
    verbose = TRUE, GT.CONF.THR = 0.80, MIN.NU = 2^3, MIN.PHI = 2^3,
    THR.NU.PHI = TRUE, type = c("SNP", "NP", "X.SNP", "X.NP"),
    fit.linearModel = TRUE)

object is an object of class CNSet.

For a SNP with fewer than MIN.OBS observations of a genotype in a given
batch, the within-genotype median is imputed. The imputation is based
on a regression using SNPs for which all three biallelic genotypes are
observed. For example, assume that at a given SNP genotypes AA and AB
were observed and BB is an unobserved genotype. For SNPs in which all
3 genotypes were observed, we fit the model E(mean_BB) = beta0 +
beta1*mean_AA + beta2*mean_AB, obtaining estimates of beta0, beta1,
and beta2. The imputed mean at the SNP with unobserved BB is then
beta0hat + beta1hat * mean_AA + beta2hat * mean_AB.
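The regression imputation described above can be sketched in base R. This is a minimal illustration on simulated data, not the code used inside crlmm: the data frame `complete`, the simulated coefficients, and the single SNP `snp` are all assumptions made for the example.

```r
# Simulated within-genotype means for 200 SNPs at which all three
# genotypes were observed (illustrative values only).
set.seed(1)
n <- 200
mean_AA <- rnorm(n, mean = 8, sd = 0.5)
mean_AB <- rnorm(n, mean = 9, sd = 0.5)
mean_BB <- 1 + 0.3 * mean_AA + 0.6 * mean_AB + rnorm(n, sd = 0.1)
complete <- data.frame(mean_AA, mean_AB, mean_BB)

# Fit E(mean_BB) = beta0 + beta1 * mean_AA + beta2 * mean_AB.
fit <- lm(mean_BB ~ mean_AA + mean_AB, data = complete)
beta_hat <- coef(fit)

# A SNP at which BB was not observed: impute its BB mean from AA and AB.
snp <- data.frame(mean_AA = 8.2, mean_AB = 9.1)
imputed_BB <- unname(beta_hat[1] +
                     beta_hat[2] * snp$mean_AA +
                     beta_hat[3] * snp$mean_AB)
```

The same value is obtained from `predict(fit, newdata = snp)`; the explicit sum is written out only to mirror the formula in the text.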
The 2 x 2 covariance matrix of the background and signal variances is estimated from the data at each locus. This matrix is then smoothed towards a common matrix estimated from all of the loci. DF.PRIOR controls the amount of smoothing towards the common matrix, with higher values corresponding to greater smoothing. Currently, DF.PRIOR is not estimated from the data. Future versions may estimate DF.PRIOR empirically.
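To make the role of DF.PRIOR concrete, the sketch below shrinks a locus-level 2 x 2 covariance matrix towards a common matrix using a degrees-of-freedom weighted average. The weighting formula and the helper name `shrink_cov` are assumptions chosen for illustration; the text above does not state the exact estimator used by crlmm.

```r
# Hypothetical helper: weighted average of the locus-level and common
# covariance matrices. The weights (n - 1 and DF.PRIOR) are an assumed
# form of the smoothing, not necessarily the crlmm implementation.
shrink_cov <- function(S_locus, S_common, n, DF.PRIOR = 50) {
  df <- n - 1
  (df * S_locus + DF.PRIOR * S_common) / (df + DF.PRIOR)
}

S_locus  <- matrix(c(4, 1, 1, 2), nrow = 2)      # estimated at one locus
S_common <- matrix(c(1, 0.2, 0.2, 1), nrow = 2)  # estimated from all loci

light <- shrink_cov(S_locus, S_common, n = 51, DF.PRIOR = 5)
heavy <- shrink_cov(S_locus, S_common, n = 51, DF.PRIOR = 500)
```

Under this assumed form, `heavy` lies closer to `S_common` than `light` does, which matches the statement that higher values of DF.PRIOR correspond to greater smoothing.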
bias.adj is currently ignored, as is the prior.prob
argument. We plan to add this feature back to the crlmm package in
the near future. When TRUE, this feature updates the initial
estimates from the linear model after excluding samples with a low
posterior probability of normal copy number. Excluding samples that
have a low posterior probability can be helpful at loci in which a
substantial fraction of the samples have a copy number alteration.
For additional information, see Scharpf et al., 2010.
prior.prob is currently ignored. It is a numerical vector providing prior probabilities for the copy number states homozygous deletion, hemizygous deletion, normal copy number, and amplification, respectively.
GT.CONF.THR is the confidence threshold for genotype calls, a value in (0, 1). Calls with confidence scores below this threshold are not used to estimate the within-genotype medians. See Carvalho et al., 2007 for information regarding confidence scores of biallelic genotypes.
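The effect of the confidence threshold on the within-genotype medians can be illustrated with a small base R example. The vectors `calls`, `conf`, and `intensity` are made-up values for one SNP in one batch, and the filtering shown is a sketch of the idea rather than the crlmm internals.

```r
# Made-up genotype calls, confidence scores, and intensities for nine samples.
calls     <- c("AA", "AA", "AA", "AB", "AB", "AB", "BB", "BB", "BB")
conf      <- c(0.95, 0.90, 0.40, 0.99, 0.85, 0.30, 0.92, 0.88, 0.10)
intensity <- c(8.0, 8.2, 12.0, 9.0, 9.4, 5.0, 10.0, 10.2, 3.0)

GT.CONF.THR <- 0.80

# Drop calls with confidence below the threshold, then take the
# median intensity within each genotype.
keep <- conf >= GT.CONF.THR
medians_filtered   <- tapply(intensity[keep], calls[keep], median)
medians_unfiltered <- tapply(intensity, calls, median)
```

In this toy data the low-confidence samples carry outlying intensities, so the filtered and unfiltered medians differ for every genotype.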
If THR.NU.PHI is FALSE, MIN.NU and MIN.PHI are ignored. When TRUE, background
(nu) and slope (phi) coefficients below MIN.NU and MIN.PHI are set
to MIN.NU and MIN.PHI, respectively.

The value returned by the crlmmCopynumber function
depends on whether the data is stored in RAM or on disk using the
R package ff for reading / writing. If uncertain, the first line
printed by the show method defined for CNSet objects indicates
whether the assayData elements are derived from the ff package.
Specifically,
- if the elements of the batchStatistics slot in the
CNSet object have the class "ff_matrix" or "ffdf", then
the crlmmCopynumber function updates the data stored on
disk and returns the value TRUE.
- if the elements of the batchStatistics slot in the
CNSet object have the class "matrix", then the
crlmmCopynumber function returns an object of class
CNSet with the elements of batchStatistics
updated.

We suggest a minimum of 10 samples per batch for using
crlmmCopynumber. 50 or more samples per batch is preferred
and will improve the estimates.
The functions crlmmCopynumberLD and
crlmmCopynumber2 have been deprecated.
The argument type can be used to specify a subset of
markers for which the copy number estimation algorithm is run.
One or more of the following possible entries are valid: 'SNP',
'NP', 'X.SNP', and 'X.NP'.
'SNP' refers to autosomal SNPs.
'NP' refers to autosomal nonpolymorphic markers.
'X.SNP' refers to SNPs on chromosome X.
'X.NP' refers to nonpolymorphic markers on chromosome X.
However, users must run 'SNP' prior to running 'NP' and 'X.NP',
or specify type = c('SNP', 'X.NP').
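One way to respect this ordering when running marker types separately is sketched below. The wrapper `run_by_type` and its `fit` argument are hypothetical and not part of crlmm; the sketch assumes that successive calls accumulate their updates in the CNSet, and it handles both documented return values (TRUE for ff-backed data, an updated CNSet otherwise).

```r
# Hypothetical wrapper, not part of crlmm. The default for `fit` is only
# evaluated if no replacement is supplied, so crlmm is needed only then.
run_by_type <- function(cnSet, types = c("SNP", "NP", "X.SNP", "X.NP"),
                        fit = crlmm::crlmmCopynumber, ...) {
  valid <- c("SNP", "NP", "X.SNP", "X.NP")
  types <- match.arg(types, valid, several.ok = TRUE)
  # Reorder so that 'SNP', when requested, runs before 'NP' and 'X.NP'.
  types <- intersect(valid, types)
  for (tp in types) {
    res <- fit(cnSet, type = tp, ...)
    # ff-backed data: updated on disk and TRUE is returned, so keep cnSet.
    # In-memory data: an updated CNSet is returned, so carry it forward.
    if (!isTRUE(res)) cnSet <- res
  }
  cnSet
}
```

With crlmm installed and an existing CNSet named `cnSet` (not constructed here), `run_by_type(cnSet, types = c("X.NP", "SNP"))` would fit 'SNP' first and 'X.NP' second.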
Carvalho B, Bengtsson H, Speed TP, Irizarry RA. Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics. 2007 Apr;8(2):485-99. Epub 2006 Dec 22. PMID: 17189563.
Carvalho BS, Louis TA, Irizarry RA. Quantifying uncertainty in genotype calls. Bioinformatics. 2010 Jan 15;26(2):242-9.
Scharpf RB, Ruczinski I, Carvalho B, Doan B, Chakravarti A, Irizarry RA. Biostatistics. Epub July 2010.