Construct a k-mer oligo profile of a nucleotide sequence and print such a profile or its reverse complement. There is also a plot function for producing plots of the profile or its reverse complement and for comparing primary and complementary strand profiles.
oligoProfile(x, k, content=c("dna", "rna"),
case=c("lower", "upper", "as is"), circular=TRUE, disambiguate=TRUE,
plot=TRUE, ...)
# S3 method for OligoProfile
plot(x, which=1L, units=c("percentage", "count", "proportion"),
main=NULL, xlab=NULL, ylab=NULL, ...)
# S3 method for OligoProfile
print(x, which=1L, units=c("percentage", "count", "proportion"),
digits=switch(units, percentage=3L, count=NULL, proportion=3L), ...)
A list with class “OligoProfile” containing the following components:
a name to identify the source of the profile.
the value of k used to derive the k-mer profile.
indicates if the profile pertains to a DNA or RNA sequence.
indicates how the case of letters was processed before producing the profile.
indicates whether or not the sequence was considered circular for the purpose of producing the profile.
indicates if the sequence was made unambiguous before producing the profile.
a vector containing the raw counts (frequencies) of all k-mers.
a character vector or an object that can be coerced to a character vector.
the k-mer profile to produce.
The content type (“dna” or “rna”) of the input sequence. oligoProfile can often detect this automatically based on the presence/absence of t's or u's, but if neither is present, the content argument is consulted. The default value is “dna”.
determines how labels for the array should be generated: in lowercase, in uppercase or left as is, in which case labels such as “b” and “B” will be seen as distinct symbols and counted separately.
Determines if the vector should be treated as circular or not. The default is TRUE, meaning that the start and end of the sequence will be joined together for the purpose of counting.
If set to the default of TRUE, makes the input sequence unambiguous before generating the profile. Otherwise, ambiguous symbols are treated like any other symbols and k-mer counts including them will be computed.
Should a plot of the profile be produced? The default is TRUE.
For print, specifies whether to display the profile for the sequence used to generate the OligoProfile object (1) or the profile of its reverse complement (2).
For the plot method, which determines what should be plotted. Values 1 and 2 cause the profile for the original sequence (primary strand) or its reverse complement (complementary strand) to be plotted, respectively. Specifying which=3 will plot a comparison of the two profiles, which can be used to assess compliance with Chargaff's second parity rule.
The which argument may also be specified when calling oligoProfile, in which case it will be passed on to the plot method if the plot argument is set to TRUE.
The oligo profiles can be scaled according to three different units for presentation on plots: “percentage”, “count” or “proportion”. The default is “percentage”.
The title of the plot. See plot.default. If not specified, an appropriate title is automatically generated.
a label for the x-axis of the plot. See plot.default. If not specified, an appropriate label is automatically generated.
a label for the y-axis of the plot. See plot.default. If not specified, an appropriate label is automatically generated.
The number of significant digits to print. The default is 0L when units is set to “count” and 3L otherwise.
arguments to be passed from or to other functions
Andrew Hart and Servet Martínez
This function returns the oligo profile for a sequence in an OligoProfile object, which is printed on screen if the plot parameter is FALSE.
An oligo profile is simply the counts of all k-mers in a sequence for some specified value of k.
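The definition above can be illustrated with a minimal base R sketch. The helper count_kmers is hypothetical and is not part of the package; it assumes a DNA alphabet of a, c, g and t, folds the input to lower case, and silently drops any k-mer containing other symbols, which is a simplification and not necessarily how oligoProfile handles ambiguous symbols.

```r
# Hypothetical helper for illustration only; not the package's implementation.
count_kmers <- function(x, k, circular = TRUE) {
  s <- tolower(paste(x, collapse = ""))  # accept a string or a vector of symbols
  stopifnot(nchar(s) >= k, k >= 1)
  if (circular && k > 1) {
    # join the end of the sequence to its start so that k-mers
    # spanning the junction are counted as well
    s <- paste0(s, substr(s, 1, k - 1))
  }
  starts <- seq_len(nchar(s) - k + 1)
  kmers <- substring(s, starts, starts + k - 1)
  # enumerate all 4^k possible k-mers so that absent ones get a count of zero
  alphabet <- c("a", "c", "g", "t")
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  all_kmers <- sort(do.call(paste0, grid))
  counts <- table(factor(kmers, levels = all_kmers))
  setNames(as.vector(counts), all_kmers)
}

profile <- count_kmers("acgtacgga", k = 2)
profile[profile > 0]  # show only the 2-mers that actually occur
```

With circular = TRUE, a sequence of length n contributes n k-mers; with circular = FALSE it contributes n - k + 1.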
By default, oligoProfile produces a plot of the oligo profile expressed in terms of percentages. The plot argument determines if the plot should be generated or not and plotting parameters such as main, sub, etc., may be passed as arguments to the function when plot is TRUE.
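As a rough illustration of how the three units relate, the sketch below rescales a small vector of made-up counts. It assumes that proportions and percentages are taken relative to the total number of k-mers counted; the counts themselves are toy values, not real data.

```r
# Toy 2-mer counts, invented for illustration only
counts <- c(aa = 4, ac = 1, ca = 2, cc = 3)

# Assumption: proportions are relative to the total number of k-mers counted
proportion <- counts / sum(counts)
percentage <- 100 * proportion

signif(proportion, 3)  # three significant digits, as for the non-count units
signif(percentage, 3)
```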
The plot method, either called directly or indirectly via the oligoProfile function, can produce the oligo profile of x (which = 1), the oligo profile of its reverse complement (which = 2), or an interstrand k-mer correlation plot comparing the k-oligo profile of x with that of its reverse complement (which = 3). Such correlation plots effectively show the relationship between k-mers on the primary and complementary strands in a DNA duplex and can be used to assess compliance with Chargaff's second parity rule (CSPR). More precisely, one would conclude that a genomic sequence complies with CSPR if all the plotted points lie on a diagonal line running from the bottom-left corner to the top-right corner of the graph.
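The idea behind the interstrand comparison can be sketched in base R as follows. The helpers revcomp and kmers_of and the short example sequence are hypothetical and only illustrate the principle; they are not the package's code, and the sketch counts k-mers linearly for simplicity.

```r
# Hypothetical helpers for illustration only; not the package's implementation.
revcomp <- function(s) {
  comp <- chartr("acgt", "tgca", tolower(s))          # complement each base
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")  # then reverse the string
}

kmers_of <- function(s, k) {
  starts <- seq_len(nchar(s) - k + 1)
  substring(s, starts, starts + k - 1)  # linear (non-circular) k-mers
}

s <- "acgtacggatccgtacgt"  # made-up sequence
k <- 2
fwd <- kmers_of(s, k)             # k-mers on the primary strand
rvc <- kmers_of(revcomp(s), k)    # k-mers on the complementary strand
lev <- sort(union(fwd, rvc))      # common set of k-mers for both strands

primary <- as.vector(table(factor(fwd, levels = lev)))
complementary <- as.vector(table(factor(rvc, levels = lev)))

# Points on the dashed diagonal have equal counts on both strands
plot(primary, complementary,
     xlab = "primary strand count", ylab = "complementary strand count")
abline(0, 1, lty = 2)
```

The count of a k-mer on the complementary strand equals the count of that k-mer's reverse complement on the primary strand, so points on the diagonal correspond to k-mers that are as frequent as their own reverse complements.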
Albrecht-Buehler, G. (2006) Asymptotically increasing compliance of genomes with Chargaff's second parity rules through inversions and inverted transpositions. PNAS 103(47), 17828–17833.
pair.counts, triple.counts, quadruple.counts, cylinder.counts, array2vector, table2vector, disambiguate
data(nanoarchaeum)
#Get the 3-oligo profile of Nanoarchaeum without plotting it
nano.prof <- oligoProfile(nanoarchaeum, 3, plot=FALSE)
nano.prof #print oligo profile as percentages
print(nano.prof, units="count") #print oligo profile as counts
plot(nano.prof) #oligo profile plotted as percentages
plot(nano.prof, units="count") #plot it as counts
#plot the 3-oligo profile of Nanoarchaeum as proportions
oligoProfile(nanoarchaeum, k=3, units="proportion")