anomSegmentBAF, for each sample and chromosome, breaks the chromosome up into
segments marked by change points of a metric based on B Allele Frequency (BAF) values.
anomFilterBAF selects segments which are likely to be anomalous.
anomDetectBAF is a wrapper to run anomSegmentBAF and
anomFilterBAF in one step.
anomSegmentBAF(intenData, genoData, scan.ids, chrom.ids, snp.ids, smooth = 50, min.width = 5, nperm = 10000, alpha = 0.001, verbose = TRUE)
anomFilterBAF(intenData, genoData, segments, snp.ids, centromere, low.qual.ids = NULL, num.mark.thresh = 15, long.num.mark.thresh = 200, sd.reg = 2, sd.long = 1, low.frac.used = 0.1, run.size = 10, inter.size = 2, low.frac.used.num.mark = 30, very.low.frac.used = 0.01, low.qual.frac.num.mark = 150, lrr.cut = -2, ct.thresh = 10, frac.thresh = 0.1, verbose=TRUE, small.thresh=2.5, dev.sim.thresh=0.1, centSpan.fac=1.25, centSpan.nmark=50)
anomDetectBAF(intenData, genoData, scan.ids, chrom.ids, snp.ids, centromere, low.qual.ids = NULL, ...)
IntensityData object containing the B Allele
Frequency. The order of the rows of intenData and the snp annotation
are expected to be by chromosome and then by position within chromosome.
The scan annotation should contain sex, coded as "M" for male and
"F" for female.
GenotypeData object. The order of the rows of genoData
and the snp annotation are expected to be by chromosome and then
by position within chromosome.
intenData. Recommended to include
all autosomes, and optionally X (males will be ignored) and the
pseudoautosomal (XY) region.
HLA and pseudoautosomal.
If there are SNPs annotated in the centromere gap, exclude these as
well (see centromeres).
smooth.CNA
in the DNAcopy package.
anomSegmentBAF. Names must
include "scanID", "chromosome", "num.mark", "left.index", "right.index", "seg.mean".
Here "left.index" and "right.index" are row indices of intenData. Left and right
refer to start and end of anomaly, respectively, in position order.
centromeres.
sdByScanChromWindow and medianSdOverAutosomes.
sd.reg but applied to "long" segments
low.frac.used segments (which are not
declared homozygous deletions
low.qual.ids)
for segments that are also below low.frac.used threshold
lrr.cut to adjust homozygous deletion endpoints
lrr.cut needed in order to adjust
lrr.cut and ct.thresh
thresholds met and (# LRR values below lrr.cut)/(# eligible SNPs in segment) > frac.thresh
anomFilterBAF
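The homozygous deletion check described by lrr.cut, ct.thresh, and frac.thresh can be sketched in base R as follows. This is only an illustration of the stated condition, not the package's internal code: check_homodel is a hypothetical helper, "eligible SNPs" is assumed to mean SNPs with a non-missing LRR value, and ct.thresh is assumed to be an inclusive minimum count.

```r
# Hypothetical helper (not part of GWASTools): decide whether a segment
# would be investigated for a homozygous deletion.
# Defaults mirror the anomFilterBAF usage line.
check_homodel <- function(lrr, lrr.cut = -2, ct.thresh = 10, frac.thresh = 0.1) {
  n.low  <- sum(lrr < lrr.cut, na.rm = TRUE)  # number of LRR values below lrr.cut
  n.elig <- sum(!is.na(lrr))                  # assumption: eligible = non-missing LRR
  # assumption: the count threshold is inclusive; the fraction test is strict (>)
  n.low >= ct.thresh && (n.low / n.elig) > frac.thresh
}

# Toy LRR vectors (made-up values)
lrr.a <- c(rep(-3, 12), rep(0, 50))   # 12 low values out of 62
lrr.b <- c(rep(-3, 5),  rep(0, 50))   # too few low values
lrr.c <- c(rep(-3, 12), rep(0, 200))  # enough low values, but fraction too small

res <- c(a = check_homodel(lrr.a),
         b = check_homodel(lrr.b),
         c = check_homodel(lrr.c))
```

Both conditions have to hold, so a long segment with only a handful of very low LRR values is not investigated.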
anomSegmentBAF returns a data.frame with the following elements: Left and right
refer to start and end of anomaly, respectively, in position order.
anomFilterBAF and anomDetectBAF return a list with the
following elements:
anomSegmentBAF as well as:
left.base: base position of left endpoint of segment
right.base: base position of right endpoint of segment
sex: sex of scan.id coded as "M" or "F"
sd.fac: measure of deviation from baseline equal to
abs(mean of segment - baseline mean)/(baseline standard deviation);
used in determining anomalous segments
raw as well as:
merge: TRUE if segment was a result of merging. Consecutive segments
from output of anomSegmentBAF that meet certain criteria are merged.
homodel.adjust: TRUE if original segment was adjusted to
narrow in on a homozygous deletion
frac.used: fraction of (eligible) heterozygous or missing SNP markers compared with total number of
eligible SNP markers in segment
scanID: integer id of scan
base.mean: mean of non-anomalous baseline. This is the mean of the
BAF metric for heterozygous and missing SNPs over all unsegmented autosomes
that were considered.
base.sd: standard deviation of non-anomalous baseline
chr.ct: number of unsegmented chromosomes used in determining
the non-anomalous baseline
scanID: integer id of scan
chromosome: chromosome as integer
num.segs: number of segments produced by anomSegmentBAF
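As a small worked example of two of the returned quantities, sd.fac and frac.used can be computed by hand from their definitions above. All numbers here are made up for illustration, and the variable names n.het.or.missing and n.eligible are hypothetical.

```r
# Illustrative numbers only; none of these come from real data.
seg.mean  <- 0.32   # mean of the BAF metric within the segment
base.mean <- 0.20   # mean of the non-anomalous baseline
base.sd   <- 0.04   # standard deviation of the non-anomalous baseline

# sd.fac: abs(mean of segment - baseline mean) / (baseline standard deviation)
sd.fac <- abs(seg.mean - base.mean) / base.sd

# frac.used: eligible heterozygous or missing SNPs over all eligible SNPs
n.het.or.missing <- 45    # hypothetical count
n.eligible       <- 300   # hypothetical count
frac.used <- n.het.or.missing / n.eligible
```

With these values the segment lies three baseline standard deviations from the baseline mean, and 15% of its eligible SNPs contribute to the metric.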
anomSegmentBAF uses the function segment from
the DNAcopy package to perform circular binary segmentation
on a metric based on BAF values. The metric for a given sample/chromosome
is sqrt(min(BAF, 1-BAF, abs(BAF-median(BAF)))) where the median is
across BAF values on the chromosome. Only BAF values for heterozygous or
missing SNPs are used.
anomFilterBAF determines anomalous segments based on a combination
of thresholds for number of SNP markers in the segment and on deviation from
a "normal" baseline. (See num.mark.thresh, long.num.mark.thresh,
sd.reg, and sd.long.) The "normal" baseline metric mean and standard deviation
are found across all autosomes not segmented by anomSegmentBAF. This is why
it is recommended to include all autosomes for the argument chrom.ids to
ensure a more accurate baseline.
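A rough base R sketch of the baseline idea follows. It rests on several assumptions that the text above does not spell out: an "unsegmented" chromosome is taken to be one with num.segs equal to 1, and the way the marker-count and deviation thresholds are combined is one plausible reading, not the documented rule. The data are simulated.

```r
set.seed(1)

# Toy per-SNP metric values for three autosomes (simulated)
metric <- data.frame(
  chromosome = rep(1:3, each = 100),
  value = c(rnorm(100, 0.20, 0.03), rnorm(100, 0.20, 0.03), rnorm(100, 0.35, 0.03))
)
# Number of segments found per chromosome (made up)
seg.info <- data.frame(chromosome = 1:3, num.segs = c(1, 1, 3))

# Assumption: chromosomes left as a single segment form the baseline
unseg     <- seg.info$chromosome[seg.info$num.segs == 1]
base.vals <- metric$value[metric$chromosome %in% unseg]
base.mean <- mean(base.vals)
base.sd   <- sd(base.vals)
chr.ct    <- length(unseg)

# Candidate segments (made up) and their deviation from the baseline
segs <- data.frame(num.mark = c(8, 40, 250), seg.mean = c(0.40, 0.35, 0.25))
segs$sd.fac <- abs(segs$seg.mean - base.mean) / base.sd

# ASSUMED combination of thresholds (defaults from the usage line):
# regular segments need sd.reg, "long" segments only need sd.long
num.mark.thresh <- 15; long.num.mark.thresh <- 200
sd.reg <- 2; sd.long <- 1
segs$flag <- (segs$num.mark >= num.mark.thresh & segs$sd.fac >= sd.reg) |
  (segs$num.mark >= long.num.mark.thresh & segs$sd.fac >= sd.long)
```

In this toy setup the short segment is never flagged regardless of its deviation, while the long segment is flagged under the looser sd.long cutoff. Leaving autosomes out of chrom.ids would shrink base.vals and make base.mean and base.sd less stable.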
Some initial filtering is done,
including possible merging of consecutive segments meeting sd.reg
threshold along with other criteria (such as not spanning the centromere)
and adjustment for accurate
break points for possible homozygous deletions (see lrr.cut,
ct.thresh, frac.thresh, run.size, and inter.size).
Male samples for X chromosome are not processed.
More stringent criteria are applied to some segments
(see low.frac.used, low.frac.used.num.mark,
very.low.frac.used, low.qual.ids, and
low.qual.frac.num.mark).
anomDetectBAF runs anomSegmentBAF with default values and
then runs anomFilterBAF. Additional parameters for
anomFilterBAF may be passed as arguments.
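The BAF metric described at the start of this section can be written in a few lines of base R. This is a sketch of the formula only, assuming the minimum is taken element-wise per SNP (hence pmin rather than min) and that the input has already been restricted to heterozygous or missing SNPs on one chromosome. baf_metric is a hypothetical helper, not a GWASTools function.

```r
# Hypothetical helper: BAF metric for one sample/chromosome.
# `baf` holds BAF values for heterozygous or missing SNPs only.
baf_metric <- function(baf) {
  med <- median(baf, na.rm = TRUE)          # median across the chromosome
  sqrt(pmin(baf, 1 - baf, abs(baf - med)))  # element-wise minimum, then square root
}

# Toy BAF values (made up)
baf <- c(0.50, 0.48, 0.52, 0.33, 0.67, 0.02)
metric <- baf_metric(baf)
```

Values close to the chromosome median give a metric near zero, while a split in the heterozygous band (for example BAF near 0.33 and 0.67) pushes the metric up, which is what the segmentation step can pick up as a change point.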
See references in segment in the package DNAcopy.
The BAF metric used is modified from Itsara, A., et al. (2009) Population
Analysis of Large Copy Number Variants and Hotspots of Human Genetic Disease.
American Journal of Human Genetics, 84, 148--161.
segment and smooth.CNA in the package DNAcopy,
also findBAFvariance, anomDetectLOH
library(GWASdata)
data(illuminaScanADF, illuminaSnpADF)
blfile <- system.file("extdata", "illumina_bl.gds", package="GWASdata")
bl <- GdsIntensityReader(blfile)
blData <- IntensityData(bl, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)
genofile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
geno <- GdsGenotypeReader(genofile)
genoData <- GenotypeData(geno, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)
# segment BAF
scan.ids <- illuminaScanADF$scanID[1:2]
chrom.ids <- unique(illuminaSnpADF$chromosome)
snp.ids <- illuminaSnpADF$snpID[illuminaSnpADF$missing.n1 < 1]
seg <- anomSegmentBAF(blData, genoData, scan.ids=scan.ids,
                      chrom.ids=chrom.ids, snp.ids=snp.ids)
# filter segments to detect anomalies
data(centromeres.hg18)
filt <- anomFilterBAF(blData, genoData, segments=seg, snp.ids=snp.ids,
                      centromere=centromeres.hg18)
# alternatively, run both steps at once
anom <- anomDetectBAF(blData, genoData, scan.ids=scan.ids, chrom.ids=chrom.ids,
                      snp.ids=snp.ids, centromere=centromeres.hg18)
close(blData)
close(genoData)