MultiSourceCopyNumberNormalization: The MultiSourceCopyNumberNormalization class

Description

Package: aroma.cn
Class MultiSourceCopyNumberNormalization

Object
  |
  +--ParametersInterface
       |
       +--MultiSourceCopyNumberNormalization

Directly known subclasses:

public static class MultiSourceCopyNumberNormalization
extends ParametersInterface

The multi-source copy-number normalization (MSCN) method [1] normalizes copy-number estimates measured by multiple sites and/or platforms for common samples. It brings the estimates to a common scale such that, for any copy-number level, the mean level of the normalized data is the same across sources.

Usage

MultiSourceCopyNumberNormalization(dsList=NULL, fitUgp=NULL, subsetToFit=NULL,
  targetDimension=1, align=c("byChromosome", "none"), tags="*", ...)

Arguments

dsList: A list of K AromaUnitTotalCnBinarySet:s.
fitUgp: An AromaUgpFile that specifies the common set of loci at which the data sets are normalized.
subsetToFit: The subset of loci (as mapped by the fitUgp object) to be used to fit the normalization functions. If NULL, loci on chromosomes 1-22 are used, but not on ChrX and ChrY.
targetDimension: A numeric index specifying the data set in dsList toward which each platform is standardized. If NULL, the arbitrary scale along the fitted principal curve is used. That scale always starts at zero and increases.
align: A character string specifying the type of alignment applied, if any. If "none", no alignment is done. If "byChromosome", the signals are shifted chromosome by chromosome such that the corresponding smoothed signals have the same median signal across sources. For more details, see below.
tags: (Optional) Sets the tags for the output data sets.
...: Not used.

Fields and Methods

Methods:

	`getAllNames`	-
	`getAsteriskTags`	-
	`getInputDataSets`	-
	`getOutputDataSets`	-
	`getTags`	-
	`nbrOfDataSets`	-
	`process`	-

Methods inherited from ParametersInterface:
getParameterSets, getParameters, getParametersAsString

Methods inherited from Object:
$, $<-, [[, [[<-, as.character, attach, attachLocally, clearCache, clearLookupCache, clone, detach, equals, extend, finalize, getEnvironment, getFieldModifier, getFieldModifiers, getFields, getInstantiationTime, getStaticInstance, hasField, hashCode, ll, load, objectSize, print, save, asThis

Different preprocessing methods normalize ChrX & ChrY differently

Some preprocessing methods estimate copy numbers on sex chromosomes differently from the autosomal chromosomes. How this is done varies from method to method, and we cannot assume anything about which approach is used. This is the main reason why the normalization function is by default estimated from signals on autosomal chromosomes only; this protects the estimate from being biased by specially estimated sex-chromosome signals. Note that the normalization function is still applied to all chromosomes.

This means that if the transformation applied by a particular preprocessing method is not the same for the sex chromosomes as for the autosomal chromosomes, the normalization applied to the sex chromosomes is not optimal. This is why multi-source normalization sometimes fails to bring sex-chromosome signals to the same scale across sources. Unfortunately, there is no automatic way to handle this. The only way would be to fit a specific normalization function to each of the sex chromosomes, but that would require that copy-number aberrations exist on those chromosomes, which could be too strong an assumption.
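
The effect described above can be illustrated with a small base-R sketch. It is not the MSCN implementation: a straight-line fit with lm() stands in for the principal-curve fit, the signals are made up, and the chromosome coding (1-22 for autosomes, 23 for ChrX, 24 for ChrY) is an assumption. The function is fitted on autosomal loci only and then applied to all loci.

```r
# Hypothetical per-locus annotation
# (assumption: 1-22 = autosomes, 23 = ChrX, 24 = ChrY).
chromosome <- c(1, 1, 2, 22, 23, 23, 24)

# Toy signals: 'target' is the reference source, 'src' a distorted second source.
target <- c(2.0, 2.1, 1.5, 3.0, 1.0, 1.1, 0.1)
src <- 0.5 * target + 0.2

# Pretend the preprocessing method treated the sex chromosomes differently.
isSex <- chromosome %in% c(23, 24)
src[isSex] <- src[isSex] + 0.3

# Fit the (stand-in) normalization function on autosomal loci only.
subsetToFit <- which(chromosome %in% 1:22)
d <- data.frame(src = src, target = target)
fit <- lm(target ~ src, data = d[subsetToFit, ])

# Apply the fitted function to all loci, including ChrX and ChrY.
normalized <- unname(predict(fit, newdata = d))

# Remaining difference to the reference source, per locus.
bias <- normalized - target
print(round(bias, 3))
```

In this toy setup the autosomal loci are mapped onto the reference scale, whereas the sex-chromosome loci keep a constant offset, because the function fitted on the autosomes does not describe how they were preprocessed.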

A more conservative approach is to normalize the signals such that afterward the median of the smoothed copy-number levels is the same across sources for any particular chromosome. This is done by setting argument align="byChromosome".
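
As a rough illustration of the idea behind chromosome-wise alignment, the following base-R sketch shifts each source so that its median agrees with the other sources on every chromosome. The function name alignByChromosome() and the data are hypothetical, and using the mean of the per-source medians as the common level is an assumption made for this sketch; the package may choose the common level differently.

```r
# Toy smoothed signals: rows are loci, columns are sources (hypothetical data).
chromosome <- c(1, 1, 1, 23, 23, 23)
smoothed <- cbind(
  siteA = c(0.0, 0.1, -0.1, 0.5, 0.6, 0.4),
  siteB = c(0.0, 0.1, -0.1, 0.1, 0.2, 0.0)
)

# Hypothetical helper: shift each source, chromosome by chromosome.
alignByChromosome <- function(x, chromosome) {
  for (chr in unique(chromosome)) {
    idx <- which(chromosome == chr)
    # Median of each source on this chromosome.
    mu <- apply(x[idx, , drop = FALSE], MARGIN = 2, FUN = median, na.rm = TRUE)
    # Assumption: the common level is the mean of the per-source medians.
    delta <- mu - mean(mu)
    # Subtract each source's own offset from its column.
    x[idx, ] <- sweep(x[idx, , drop = FALSE], MARGIN = 2, STATS = delta, FUN = "-")
  }
  x
}

aligned <- alignByChromosome(smoothed, chromosome)
print(aligned)
```

Only a constant is added per source and chromosome, so the shape of the signal within a chromosome is left untouched; this is what makes the approach conservative compared with fitting a separate normalization function.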

Author

Henrik Bengtsson

Details

The multi-source normalization method is by nature a single-sample method, that is, it normalizes arrays for one sample at a time and independently of all other samples/arrays.

However, the current implementation first generates smoothed data for all samples/arrays. Then, it normalizes the samples one by one.
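
The two stages can be pictured with the following base-R sketch. It is only a schematic: a plain binned average stands in for whatever smoother the package uses, the helper binnedMean() and all data are hypothetical, and the per-sample step merely assembles the loci-by-sources matrix that a single-sample normalizer would work on.

```r
# Hypothetical helper: average the signals falling into each bin of a common grid.
binnedMean <- function(x, y, breaks) {
  bin <- cut(x, breaks = breaks, include.lowest = TRUE)
  as.vector(tapply(y, INDEX = bin, FUN = mean, na.rm = TRUE))
}

# Common grid (plays the role of the common set of loci).
breaks <- c(0, 10, 20, 30)

# Two sources measuring the same two samples at different positions.
sources <- list(
  siteA = list(x = c(1, 5, 12, 25),
               y = list(s1 = c(1, 3, 2, 4), s2 = c(0, 2, 1, 1))),
  siteB = list(x = c(3, 15, 18, 28),
               y = list(s1 = c(2, 4, 6, 8), s2 = c(1, 1, 3, 2)))
)

# Stage 1: smooth all samples of all sources onto the common grid.
smoothed <- lapply(sources, function(s) {
  sapply(s$y, function(y) binnedMean(s$x, y, breaks))
})

# Stage 2: go through the samples one by one; each sample only needs
# its own bins-by-sources matrix, independently of all other samples.
perSample <- lapply(c(s1 = "s1", s2 = "s2"), function(name) {
  sapply(smoothed, function(m) m[, name])
})

print(perSample$s1)
```

Because each element of perSample depends only on the arrays for that one sample, the second stage keeps the single-sample property even though the smoothing is done up front for all samples.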

References

[1] H. Bengtsson, A. Ray, P. Spellman & T.P. Speed, A single-sample method for normalizing and combining full-resolution copy numbers from multiple platforms, labs and analysis methods, Bioinformatics 2009.