Usage
runAbsoluteCN(gatk.normal.file = NULL, gatk.tumor.file, log.ratio = NULL,
    seg.file = NULL, seg.file.sdev = 0.4, vcf.file = NULL, genome = "hg19",
    sex = c("?", "F", "M"), fun.filterVcf = filterVcfMuTect,
    args.filterVcf = list(), fun.setPriorVcf = setPriorVcf,
    args.setPriorVcf = list(), fun.segmentation = segmentationCBS,
    args.segmentation = list(), fun.focal = findFocal, args.focal = list(),
    sampleid = NULL, min.ploidy = 1, max.ploidy = 6, test.num.copy = 0:7,
    test.purity = seq(0.05, 0.95, by = 0.01),
    prior.purity = rep(1, length(test.purity))/length(test.purity),
    max.candidate.solutions = 15, candidates = NULL, coverage.cutoff = 15,
    max.non.clonal = 0.2, max.homozygous.loss = 0.1, iterations = 30,
    log.ratio.calibration = 0.25, gc.gene.file = NULL,
    filter.lowhigh.gc.exons = 0.001, filter.targeted.base = 4,
    max.logr.sdev = 0.75, max.segments = 200, plot.cnv = TRUE,
    verbose = TRUE, post.optimize = FALSE, ...)
Arguments
gatk.normal.file
GATK coverage file of normal control. Optional if log.ratio is
provided; in that case it is used only to filter low-coverage exons.
Should already be GC-normalized. Needs to be either a file name
or data read with the readCoverageGatk function.
gatk.tumor.file
GATK coverage file of tumor. Should be already
GC-normalized. Needs to be either a file name or data read with the
readCoverageGatk function.
log.ratio
Copy number log-ratios for all exons in the coverage files.
If NULL, calculated based on coverage files.
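To make the relationship between the coverage files and log.ratio concrete, here is a minimal sketch of a per-exon log-ratio calculation. The helper calcLogRatioSketch and the toy coverage values are hypothetical, and the normalization by mean coverage is a common convention assumed here, not necessarily the exact formula the function uses internally.

```r
# Toy per-exon average coverages (hypothetical values).
normal.coverage <- c(100, 120, 80, 95, 110)
tumor.coverage  <- c(150, 118, 40, 190, 105)

# Hypothetical helper, not part of the package API.
calcLogRatioSketch <- function(normal, tumor) {
    # The second term rescales by the ratio of mean coverages so that
    # differences in overall sequencing depth cancel out.
    log2(tumor / normal) + log2(mean(normal) / mean(tumor))
}

log.ratio <- calcLogRatioSketch(normal.coverage, tumor.coverage)
round(log.ratio, 2)
```

A vector computed this way has one value per exon, in the same order as the coverage files, which is the shape the log.ratio argument expects.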
seg.file
Segmented data. Optional, to support matched SNP6 data.
If NULL, use coverage files or log.ratio to segment the data.
seg.file.sdev
If seg.file provided, the log-ratio standard deviation,
used to model likelihood of sub-clonal copy number events.
vcf.file
VCF file, tested with MuTect output files. Optional, but
typically needed to select between local optima of similar likelihood. Can
also be a CollapsedVCF, read with the readVcf function. Requires a DB info
flag for dbSNP membership. The default fun.setPriorVcf function will also
look for a Cosmic.CNT slot, containing the hits in the COSMIC database.
Again, do not expect very useful results without a VCF file.
genome
Genome version, required for the readVcf function.
sex
Sex of sample. If "?", the sex is detected automatically.
fun.filterVcf
Function for filtering variants. Expected output is a
list with elements vcf (CollapsedVCF), flag (TRUE/FALSE) and flag_comment
(string). The flags will be added to the output data and can be used to
warn users, for example when samples look too noisy. Default filter will
remove variants flagged by MuTect, but will keep germline variants. If
run in matched normal mode, it will by default use the somatic status of
variants and filter non-somatic calls with an allelic fraction
significantly different from 0.5 in the normal.
args.filterVcf
Arguments for variant filtering function. Arguments
vcf, tumor.id.in.vcf, coverage.cutoff and verbose are required in the
filter function and are automatically set (do NOT set them here again).
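A custom filter only has to honor the contract described above: accept the four automatically set arguments and return a list with vcf, flag and flag_comment. The following is a minimal sketch under that assumption. The function name myFilterVcf, the extra argument min.variants, the comment text, and the use of NA for an empty comment are all hypothetical choices for illustration.

```r
# Hypothetical custom filter; `min.variants` is an illustrative extra argument
# that would be supplied through args.filterVcf.
myFilterVcf <- function(vcf, tumor.id.in.vcf, coverage.cutoff, verbose,
                        min.variants = 10, ...) {
    # A real filter would subset `vcf` here, e.g. by read depth or filter flags.
    n <- NROW(vcf)
    flag <- n < min.variants
    flag_comment <- if (flag) "FEWER THAN min.variants VARIANTS" else NA_character_
    if (verbose) message("Keeping ", n, " variants.")
    list(vcf = vcf, flag = flag, flag_comment = flag_comment)
}

# Stand-in object for a quick check; in practice `vcf` would be a CollapsedVCF.
toy.vcf <- data.frame(pos = 1:3)
res <- myFilterVcf(toy.vcf, tumor.id.in.vcf = "TUMOR",
                   coverage.cutoff = 15, verbose = FALSE)
res$flag
```

Such a function would be passed as fun.filterVcf = myFilterVcf, with only the extra arguments in args.filterVcf, for example list(min.variants = 20).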
fun.setPriorVcf
Function to set prior for somatic status for each
variant in the VCF.
args.setPriorVcf
Arguments for somatic prior function.
fun.segmentation
Function for segmenting the copy number log-ratios.
Expected return value is a list with elements seg (the segmentation) and
size (the size in bp for all segments).
args.segmentation
Arguments for segmentation function. Arguments
normal, tumor, log.ratio, plot.cnv, coverage.cutoff, sampleid, vcf,
tumor.id.in.vcf, verbose are required in the segmentation function and
automatically set (do NOT set them here again).
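The segmentation contract can be illustrated with a deliberately naive function that returns one segment per chromosome. This is only a sketch: mySegmentation is a hypothetical name, the column layout of seg (ID, chrom, loc.start, loc.end, num.mark, seg.mean) is assumed to follow the DNAcopy convention, and the coverage columns chr, probe_start and probe_end are assumptions about the coverage data.

```r
# Hypothetical segmentation function: one segment per chromosome.
# Assumption: `tumor` is a data.frame with columns chr, probe_start, probe_end.
mySegmentation <- function(normal, tumor, log.ratio, plot.cnv, coverage.cutoff,
                           sampleid, vcf = NULL, tumor.id.in.vcf = NULL,
                           verbose = TRUE, ...) {
    chroms <- unique(tumor$chr)
    seg <- do.call(rbind, lapply(chroms, function(chr) {
        idx <- tumor$chr == chr
        data.frame(
            ID = sampleid,
            chrom = chr,
            loc.start = min(tumor$probe_start[idx]),
            loc.end = max(tumor$probe_end[idx]),
            num.mark = sum(idx),
            seg.mean = mean(log.ratio[idx]),
            stringsAsFactors = FALSE
        )
    }))
    # Segment sizes in bp, one entry per row of `seg`.
    size <- seg$loc.end - seg$loc.start + 1
    list(seg = seg, size = size)
}

toy.tumor <- data.frame(
    chr = c("chr1", "chr1", "chr2"),
    probe_start = c(100, 500, 200),
    probe_end = c(200, 600, 300),
    stringsAsFactors = FALSE
)
res <- mySegmentation(normal = NULL, tumor = toy.tumor,
                      log.ratio = c(0.1, 0.3, -0.5), plot.cnv = FALSE,
                      coverage.cutoff = 15, sampleid = "Sample1")
res$seg
```

The point of the sketch is the return value: a list with elements seg and size, which is what the calling function expects regardless of the segmentation algorithm used.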
fun.focal
Function for identifying focal amplifications.
args.focal
Arguments for focal amplification function.
sampleid
Sample id, provided in output files etc.
min.ploidy
Minimum ploidy to be considered.
max.ploidy
Maximum ploidy to be considered.
test.num.copy
Copy numbers tested in the grid search. Note that focal
amplifications can have much higher copy numbers, but they will be labeled
as subclonal (because they do not fit the integer copy numbers).
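The reason high-level amplifications fall outside the tested states can be seen from the usual purity/ploidy model of the expected log-ratio. The sketch below assumes that standard model with a diploid normal component; expectedLogRatio is a hypothetical helper and may differ in detail from the likelihood model used internally.

```r
# Hypothetical helper: expected log-ratio of a segment with integer copy
# number C, given tumor purity and tumor ploidy (normal cells assumed diploid).
expectedLogRatio <- function(C, purity, ploidy) {
    log2((purity * C + (1 - purity) * 2) /
         (purity * ploidy + (1 - purity) * 2))
}

test.num.copy <- 0:7
grid <- expectedLogRatio(test.num.copy, purity = 0.6, ploidy = 2)
names(grid) <- test.num.copy
round(grid, 2)

# A focal amplification with 20 copies lies beyond the largest tested state,
# so it cannot be matched to any integer copy number in test.num.copy.
expectedLogRatio(20, purity = 0.6, ploidy = 2) > max(grid)
```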
test.purity
Considered tumor purity values.
prior.purity
Priors for purity if they are available. Only change
when you know what you are doing.
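For orientation, the default prior is flat over the purity grid. The sketch below reproduces that default and contrasts it with an informative prior. The Gaussian shape centered at 0.4 is purely a hypothetical example, for instance from a pathology estimate, and is not a recommendation.

```r
test.purity <- seq(0.05, 0.95, by = 0.01)

# Default: every purity value is equally likely a priori.
prior.flat <- rep(1, length(test.purity)) / length(test.purity)

# Hypothetical informative prior centered at a purity of 0.4.
prior.informed <- dnorm(test.purity, mean = 0.4, sd = 0.15)
prior.informed <- prior.informed / sum(prior.informed)

# Both vectors must have one entry per tested purity value.
length(prior.flat) == length(test.purity)
```

Whatever shape is chosen, prior.purity needs the same length as test.purity, which is why both are usually changed together.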
max.candidate.solutions
Number of local optima considered in the optimization
and variant fitting steps. If there are too many local optima, only the
specified number of top candidate solutions is used, plus all
optima close to diploid, because silent genomes often have many local
optima.
candidates
Candidates to optimize from a previous run
(return.object$candidates).
If NULL, do a 2D grid search and find local optima.
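The idea of a 2D grid search followed by local optima detection can be sketched on a toy score surface. Everything here is an illustration under stated assumptions: the score function is made up, the purity grid is coarser than the default to keep the example small, and the 8-neighbor definition of a local optimum is an assumption rather than the documented method.

```r
test.purity <- seq(0.05, 0.95, by = 0.05)   # coarser than the default grid
test.ploidy <- seq(1, 6, by = 0.5)

# Toy score surface with two peaks, at (0.6, 2) and (0.3, 4).
score <- outer(test.purity, test.ploidy, function(p, pl) {
    exp(-((p - 0.6)^2 / 0.02 + (pl - 2)^2 / 0.5)) +
        0.8 * exp(-((p - 0.3)^2 / 0.02 + (pl - 4)^2 / 0.5))
})

# Mark grid points that are at least as high as all of their neighbors.
is.max <- matrix(FALSE, nrow(score), ncol(score))
for (i in seq_len(nrow(score))) {
    for (j in seq_len(ncol(score))) {
        rows <- max(1, i - 1):min(nrow(score), i + 1)
        cols <- max(1, j - 1):min(ncol(score), j + 1)
        is.max[i, j] <- score[i, j] == max(score[rows, cols])
    }
}

idx <- which(is.max, arr.ind = TRUE)
candidates <- data.frame(
    purity = test.purity[idx[, "row"]],
    ploidy = test.ploidy[idx[, "col"]],
    score = score[idx]
)
# Best candidates first; keep at most max.candidate.solutions of them.
max.candidate.solutions <- 15
candidates <- head(candidates[order(-candidates$score), ],
                   max.candidate.solutions)
candidates
```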
coverage.cutoff
Minimum exon coverage in both normal and tumor. Exons
with lower coverage are ignored. The cutoff choice depends on the expected
purity and overall coverage. High purity samples might need a lower cutoff
to call homozygous deletions. If an exon.weigh.file (below) is NOT
specified, it is recommended to set a higher cutoff (e.g. 20) to remove
noise from unreliable exon measurements.
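The effect of the cutoff can be sketched as a simple filter that requires sufficient coverage in both samples. The coverage values are toy data, and treating the cutoff as inclusive is an assumption of this sketch.

```r
coverage.cutoff <- 15

# Toy per-exon average coverages (hypothetical values).
normal.coverage <- c(40, 12, 60, 30, 5)
tumor.coverage  <- c(35, 50, 10, 45, 3)

# An exon is kept only if it passes the cutoff in normal AND tumor.
keep <- normal.coverage >= coverage.cutoff & tumor.coverage >= coverage.cutoff
keep
```

The third exon shows the trade-off mentioned above: a low tumor coverage next to a well-covered normal may be a true homozygous deletion, and a high cutoff removes it from the analysis.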
max.non.clonal
Maximum genomic fraction assigned to a subclonal copy
number state.
max.homozygous.loss
Maximum genomic fraction assigned to homozygous loss.
This is set to a fairly high default value to not exclude correct
solutions, especially in noisy segmentations.
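As a worked example of a genomic fraction, the sketch below computes the share of the segmented genome assigned to copy number 0 and compares it against the threshold. Segment sizes and copy numbers are toy values, and how a solution exceeding the threshold is handled internally is not shown here.

```r
max.homozygous.loss <- 0.1

# Toy segmentation: segment sizes in bp and fitted integer copy numbers.
seg.size <- c(5e7, 2e7, 1e7, 8e7)
seg.copy <- c(2, 0, 3, 1)

# Fraction of the segmented genome in homozygous loss.
frac.loss <- sum(seg.size[seg.copy == 0]) / sum(seg.size)
frac.loss

# This toy solution would exceed the default threshold.
frac.loss <= max.homozygous.loss
```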
iterations
Maximum number of iterations in the Simulated Annealing copy
number fit optimization.
log.ratio.calibration
Re-calibrate log-ratios in the window
sd(log.ratio)*log.ratio.calibration.
gc.gene.file
A mapping file that assigns GC content and gene symbols
to each exon in the coverage files. Used for generating gene level calls.
First column in format CHR:START-END. Second column GC content (0 to 1).
Third column gene symbol.
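A small example of the three-column layout may help when building such a file. The rows, gene symbols and column names below are hypothetical, as are the tab separator and the presence of a header line. Only the column order and the CHR:START-END format come from the description above.

```r
# Hypothetical rows; only the column order follows the documented format.
gc.gene <- data.frame(
    Target = c("chr1:1000-1200", "chr1:5000-5150", "chr2:300-480"),
    gc_bias = c(0.43, 0.61, 0.38),
    Gene = c("GENE_A", "GENE_A", "GENE_B"),
    stringsAsFactors = FALSE
)

# Write the mapping to a temporary tab-separated file.
gc.gene.file <- tempfile(fileext = ".txt")
write.table(gc.gene, gc.gene.file, sep = "\t", quote = FALSE,
            row.names = FALSE)

# Split CHR:START-END into its components to check the interval format.
parts <- do.call(rbind, strsplit(gc.gene$Target, "[:-]"))
intervals <- data.frame(
    chr = parts[, 1],
    start = as.integer(parts[, 2]),
    end = as.integer(parts[, 3]),
    stringsAsFactors = FALSE
)
intervals
```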
filter.lowhigh.gc.exons
Quantile q (defines lower q and upper 1-q)
for removing exons with outlier GC profile. This assumes that GC correction
might not have worked on those exons. Requires gc.gene.file.
filter.targeted.base
Exclude exons with targeted base (size) smaller
than this cutoff. This is useful when the same interval file was used to
calculate GC content. For such small exons, the GC content is likely
very different from the true GC content of the probes.
max.logr.sdev
Flag noisy samples with segment log-ratio standard deviation
larger than this. Assay specific and needs to be calibrated.
max.segments
Flag noisy samples with a large number of segments. Assay
specific and needs to be calibrated.
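The two noise thresholds can be read as simple checks on a finished segmentation. In the sketch below, flagNoisySketch and the comment strings are hypothetical. The thresholds are the documented defaults, which are assay specific and should be calibrated.

```r
# Hypothetical helper combining both noise checks into one flag.
flagNoisySketch <- function(log.ratio.sdev, num.segments,
                            max.logr.sdev = 0.75, max.segments = 200) {
    comments <- character(0)
    if (log.ratio.sdev > max.logr.sdev) {
        comments <- c(comments, "NOISY LOG-RATIO")
    }
    if (num.segments > max.segments) {
        comments <- c(comments, "NOISY SEGMENTATION")
    }
    list(flag = length(comments) > 0,
         flag_comment = paste(comments, collapse = ";"))
}

flagNoisySketch(log.ratio.sdev = 0.9, num.segments = 150)
flagNoisySketch(log.ratio.sdev = 0.3, num.segments = 150)
```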
plot.cnv
Generate segmentation plots.
post.optimize
Optimize purity using final SCNA-fit and SNVs. This
might take a long time when lots of SNVs need to be fitted, but will
typically result in a slightly more accurate purity, especially for rather
silent genomes or very low purities. Otherwise, it will just use the
purity determined via the SCNA-fit.
...
Additional parameters passed to the segmentation function.
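Putting the arguments together, a typical call might look like the sketch below. The file names and sample id are hypothetical placeholders, and the call is guarded so that it only runs when the PureCN package (assumed here to be the package exporting runAbsoluteCN) is installed and the input files exist.

```r
# Hypothetical file names; replace with real, GC-normalized GATK coverage
# files and a matching VCF.
args <- list(
    gatk.normal.file = "normal_coverage.txt",
    gatk.tumor.file = "tumor_coverage.txt",
    vcf.file = "tumor_normal.vcf",
    genome = "hg19",
    sampleid = "Sample1",
    coverage.cutoff = 20,
    post.optimize = FALSE
)

input.files <- unlist(args[c("gatk.normal.file", "gatk.tumor.file",
                             "vcf.file")])

# Only run when the package and all input files are available.
if (requireNamespace("PureCN", quietly = TRUE) &&
    all(file.exists(input.files))) {
    ret <- do.call(PureCN::runAbsoluteCN, args)
    # ret$candidates can seed a later run through the candidates argument.
}
```

Collecting the arguments in a list makes it easy to re-run the same sample with a single change, for example post.optimize = TRUE.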