Given a GRanges / GRangesList object, or BAM file paths, of reads for each
experimental condition, or an AffymetrixCelSet, or a numeric matrix of array
data, where the rows are probes and the columns are the different samples, and
an annotation of features of interest, scores at regularly spaced positions
around the features are calculated. In the case of sequencing data, the score is
the smoothed coverage of reads divided by the library size. In the case of array
data, it is the array intensity.
featureScores(x, anno, ...)
The ANY,GRanges method:
featureScores(x, anno, up = NULL, down = NULL, ...)
If x is a vector of paths or a GRangesList object, then names(x) should contain
the types of the experiments.
If anno is a data.frame, it must contain the columns chr, start, and end.
Optional columns are strand and name. If anno is a GRanges object, then the name
can be present as a column called name in the element metadata of the GRanges
object. If names are given, then the coverage matrices will use the names as
their row names.
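As a small illustration of the data.frame form described above, an annotation could be built as follows. The coordinates and gene names are made up for illustration and do not come from the package's example data.

```r
# Hypothetical annotation table; all values are invented for illustration.
anno <- data.frame(
  chr    = c("chr21", "chr21"),   # required
  start  = c(10000, 50000),       # required
  end    = c(12000, 53000),       # required
  strand = c("+", "-"),           # optional
  name   = c("geneA", "geneB"),   # optional; used as row names of the score matrices
  stringsAsFactors = FALSE
)
```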
An approximation to running mean smoothing of the coverage is used. Reads are
extended to the smoothing width, rather than to their fragment size, and
coverage is used directly. This method is faster than a running mean of the
calculated coverage, and the results are qualitatively almost identical.
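The idea can be sketched in base R as below. This is only an illustration of extending reads to the smoothing width and counting overlaps, not the package's internal code. The read starts, the width, and the helper name extended_coverage are hypothetical, and strand handling is ignored.

```r
# Hypothetical read start positions (all treated as + strand) and smoothing width.
read_starts <- c(100, 120, 130, 400)
s_width <- 50

# Count the reads whose extended interval [start, start + width - 1]
# covers each sampling position.
extended_coverage <- function(positions, starts, width) {
  vapply(positions,
         function(p) sum(starts <= p & p <= starts + width - 1),
         integer(1))
}

cov_at <- extended_coverage(c(110, 140, 200, 420), read_starts, s_width)
```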
If providing a matrix of array intensity values, the column names of this
matrix are used as the names of the samples.
The annotation can be stranded or not. If the annotation is stranded, then
the reference point is the start coordinate for features on the + strand,
and the end coordinate for features on the - strand. If the annotation is
unstranded (e.g. annotation of CpG islands), then the midpoint of the feature
is used for the reference point.
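The rule for choosing the reference point can be sketched as a small base R helper. The function name reference_point is hypothetical, unstranded features are assumed to be marked with "*", and rounding the midpoint is an assumption of this sketch.

```r
# Hypothetical helper: one reference point per feature.
reference_point <- function(start, end, strand) {
  ifelse(strand == "+", start,                     # + strand: start coordinate
         ifelse(strand == "-", end,                # - strand: end coordinate
                round((start + end) / 2)))         # unstranded: midpoint (rounded here)
}

ref <- reference_point(start  = c(1000, 5000, 9000),
                       end    = c(2000, 6000, 9500),
                       strand = c("+", "-", "*"))
```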
The up and down values give how far up and down from the reference point to
find scores. Their semantics depend on whether the annotation is stranded or
not. If the annotation is stranded, then they give how far upstream and
downstream will be sampled. If the annotation is unstranded, then up gives how
far towards the start of a chromosome to go, and down gives how far towards the
end of a chromosome to go.
If sequencing data is being analysed and dist is "percent", then up and down
give how many percent of each feature's width away from the reference point the
sampling boundaries are. If dist is "base", then the boundaries of the sampling
region are a fixed width for every feature, and the units of up and down are
bases. up and down must be identical if the features are unstranded. The units
of freq are percent when dist is "percent", and bases when dist is "base".
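To make the units concrete, the base R sketch below derives sampling positions for a single feature under both settings of dist. The helper sampling_positions is hypothetical and only mirrors the description above; it is not taken from the package source.

```r
# Hypothetical helper: sampling positions for one feature.
sampling_positions <- function(ref, width, strand, up, down, freq,
                               dist = c("base", "percent")) {
  dist <- match.arg(dist)
  offsets <- seq(-up, down, by = freq)            # offsets relative to the reference point
  if (dist == "percent") {
    offsets <- offsets / 100 * width              # percent of the feature's width
  }
  direction <- if (strand == "-") -1 else 1       # on the - strand, upstream has higher coordinates
  ref + direction * offsets
}

# Fixed distances in bases, + strand feature.
base_pos <- sampling_positions(10000, 4000, "+", up = 2000, down = 1000, freq = 500)

# Percent of feature width, - strand feature.
pct_pos <- sampling_positions(10000, 4000, "-", up = 50, down = 50, freq = 25,
                              dist = "percent")
```

With these inputs, base_pos runs from 8000 to 11000 in steps of 500, and pct_pos runs from 12000 down to 8000 in steps of 1000.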
In the case of array data, the sequence of positions described by up, down, and
freq actually describes the boundaries of windows, and the probe that is closest
to the midpoint of each window is chosen as the representative score of that
window. On the other hand, when analysing sequencing data, the sequence of
positions refers to the positions at which coverage is taken.
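The window-based selection for array data can be sketched as follows. The probe positions and intensities are invented, positions are expressed relative to a reference point at 0, and this sketch simply picks the nearest probe without checking that it lies inside the window, which may differ from what the package does.

```r
# Hypothetical probes, positioned relative to a reference point at 0.
probe_pos <- c(-1900, -1200, -700, -260, 240, 800)
probe_int <- c(5.1, 6.3, 7.8, 9.0, 8.2, 6.0)

# Window boundaries derived from up = 2000, down = 1000, freq = 500.
bounds <- seq(-2000, 1000, by = 500)
mids <- (head(bounds, -1) + tail(bounds, -1)) / 2   # midpoint of each window

# For each window, take the probe closest to its midpoint.
nearest <- vapply(mids, function(m) which.min(abs(probe_pos - m)), integer(1))
window_scores <- probe_int[nearest]
```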
Providing a mappability object for sequencing data is recommended. Otherwise, it is
not possible to know if a score of 0 is because the window around the sampling position
is unmappable, or if there were really no reads mapping there in the experiment.
Coverage is normalised by dividing the raw coverage by the total number of
reads in a sample. The coverage at a sampling position is multiplied by 1 / mappability.
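A minimal base R sketch of this normalisation, which also applies the mappability cutoff rule for NA scores. All numbers, including the cutoff value, are hypothetical.

```r
raw_cov     <- c(12, 0, 30, 8)          # raw coverage at four sampling positions
mappability <- c(1.0, 0.2, 0.75, 0.5)   # mappability around each position
lib_size    <- 1e6                      # total number of reads in the sample
cutoff      <- 0.4                      # hypothetical mappability cutoff

# Divide by library size, then scale by 1 / mappability.
scores <- raw_cov / lib_size * (1 / mappability)

# Positions below the cutoff cannot be interpreted, so they become NA.
scores[mappability < cutoff] <- NA
```

Here the second position ends up as NA rather than 0, which is the distinction the mappability information is meant to preserve.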
Any positions that have mappability below the mappability cutoff will have their
score set to NA.
The return value is a ScoresList
object, that holds a list of score
matrices, one for each experiment type, and the parameters that were used
to create the score matrices.
See mergeReplicates for merging sequencing data replicates of an
experiment type.
library(Repitools)
data(chr21genes)
data(samplesList) # Loads 'samples.list.subset'.
fs <- featureScores(samples.list.subset[1:2], chr21genes, up = 2000, down = 1000,
                    freq = 500, s.width = 500)