Objects from the Class
Objects can be created by calls of the form descriptors(seqs, response=numeric(0), base.matrix=NA, do.var=TRUE, alags=c(1,2,3), do.mean=TRUE, do.counts=TRUE, do.position=TRUE, alphabet=seqs@alphabet, include.statistics=TRUE, accuracy=0.01)
Details
The descriptor calculation methods used here are not as sophisticated
as those provided in some of the more complete QSAR packages. Instead,
they derive sequence-level descriptors from descriptors calculated on
single amino acids. There are two reasons for this. First, the
descriptors can be calculated quickly, without relying on another
program. Second, it is easier to calculate the distribution of
the descriptors over the sequence space. The ability to calculate the
descriptors across the sequence space also depends on the number of
descriptors and the chain length of the sequence. The advantage of
knowing descriptors on the whole sequence space is that it is easy to
determine if a descriptor on the sequences is significant. For
example, if the number of hydrogen bond donors is three standard
deviations above the mean number of hydrogen bond donors over all
sequence space, then that is a significant descriptor. This is
expressed as a p-value, which is calculated with
wilcox.test, a non-parametric alternative to
Student's t-test. The calculations are based on the given base.matrix
parameter. Given that matrix, which contains the descriptors
calculated on all the individual amino acids, it is possible to
calculate many sequence-level descriptors. If the means are being
calculated (do.mean=TRUE), then the mean of each descriptor along
each sequence is calculated. This doubles the number of
descriptors. The same is true of do.var, which uses the
variance along the sequence. The autocorrelation function can also be
calculated along the chain, again increasing the number of resulting
descriptors. This may be interesting for describing alternating
patterns. The position-specific descriptors are simply the individual
descriptors at a certain position, for example the number of hydrogen
bond donors at position 2.
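The construction described above can be illustrated with a small base R sketch. It is not the package's implementation: the matrix base, the function seq_descriptors and the hydrogen bond donor counts are hypothetical and chosen only for illustration, and the autocorrelation is approximated by a plain lagged autocovariance, which may differ from the definition used by descriptors.

```r
# Hypothetical per-residue descriptor table (illustrative values only):
# rows are amino acids, columns are descriptors.
base <- matrix(c(0, 1, 3, 1,               # assumed hydrogen bond donor counts
                 1.8, -3.5, -4.5, -0.8),   # hydropathy-style values
               nrow = 4,
               dimnames = list(c("A", "D", "R", "S"), c("hbd", "hydropathy")))

# Hypothetical helper: sequence-level descriptors for one sequence string.
# Assumes the sequence is longer than the largest lag.
seq_descriptors <- function(s, base, lags = c(1, 2)) {
  residues <- strsplit(s, "")[[1]]
  x <- base[residues, , drop = FALSE]      # one row per position in the chain
  n <- nrow(x)
  centred <- sweep(x, 2, colMeans(x))      # subtract each descriptor's mean

  # Lagged autocovariance along the chain, one value per descriptor and lag.
  autocov <- unlist(lapply(lags, function(k) {
    a <- colMeans(centred[1:(n - k), , drop = FALSE] *
                  centred[(1 + k):n, , drop = FALSE])
    setNames(a, paste0("lag", k, ".", colnames(x)))
  }))

  # Position-specific descriptors: the per-residue value at each position.
  position <- setNames(as.vector(x),
                       paste0(rep(colnames(x), each = n), ".pos", seq_len(n)))

  c(mean = colMeans(x), var = apply(x, 2, var), autocov, position)
}

d <- seq_descriptors("RADS", base)
print(round(d, 3))
```

Each option adds its own block of columns (mean, variance, one block per lag, one block per position), which is why the number of descriptors grows quickly with the chain length.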
One is often more interested in understanding what is common amongst
the active sequences. This may be done by comparing a descriptor on the
active sequences to the same descriptor on the inactive sequences. Since inactive sequences are
rarely collected in peptide libraries, we may approximate the inactive
sequences by all sequences. This approximation only holds if the
number of active sequences is small relative to the size of the sequence
space. This is often the case but must be checked during the
experiment. With this assumption, p-values may be calculated for each
descriptor. These p-values do not assume normality and are a measure of
the overlap between the active sequences and inactive sequences. They
are calculated using a Wilcoxon test (wilcox.test). A low p-value is considered
significant, and such a descriptor may be considered to be related to
activity. Remember that a descriptor may be important in
connection with a motif. Thus it is important to perform both descriptor
analysis and motif discovery. Setting include.statistics=TRUE calculates the p-values
for each of the descriptors. This is only practical for shorter sequences,
with lengths less than 10.
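The comparison of active sequences against the whole sequence space can be sketched in base R as follows. This is an illustration under stated assumptions, not the code run by descriptors: the per-residue values in hbd, the four-letter alphabet, the chain length of 3 and the list of active sequences are all hypothetical.

```r
# Hypothetical hydrogen bond donor counts per residue (illustrative values).
hbd <- c(A = 0, D = 1, R = 3, S = 1)

# Enumerate the full sequence space for chains of length 3 (4^3 = 64 sequences).
space <- expand.grid(rep(list(names(hbd)), 3), stringsAsFactors = FALSE)
space_mean <- rowMeans(matrix(hbd[as.matrix(space)], nrow = nrow(space)))

# Hypothetical active sequences from a screen.
active <- c("RRS", "RDR", "RRR", "SRR", "RRD")
active_mean <- vapply(strsplit(active, ""),
                      function(r) mean(hbd[r]), numeric(1))

# Two-sample Wilcoxon rank-sum test: actives against the whole sequence space,
# which stands in for the inactive sequences.
# exact = FALSE requests the normal approximation because the data contain ties.
p <- wilcox.test(active_mean, space_mean, exact = FALSE)$p.value
print(p)
```

The size of the enumerated space is the alphabet size raised to the chain length, so with a 20-letter alphabet a full enumeration quickly becomes impractical as the chain grows, consistent with the length limit mentioned above.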
If base.matrix is NA, then the default, defaultBaseMatrix,
will be used. See the documentation on that dataset
for more information.
Extends
Class "data.frame", directly.
Class "list", by class "data.frame", distance 2.
Class "oldClass", by class "data.frame", distance 2.
Class "vector", by class "data.frame", distance 3.