training: Training dataset used to construct the profile scoring matrix in the SCORER 2.0 algorithm.
Description
A dataframe containing three columns that must be named "type", "sequence" and "register".
The order of the columns in the dataframe does not matter.
- column "type": contains the known oligomeric state
of the coiled-coil sequences in the training data. Acceptable oligomeric states
are "DIMER" and "TRIMER" only.
- column "sequence": contains the amino-acid sequences of the coiled coils in the
training data. Valid characters are all uppercase letters except B, J, O, U, X,
and Z; any other character will cause the program to fail.
- column "register": contains the register assignments specific to each coiled-coil
sequence in the training data. Each entry must therefore have the same length as the
matching amino-acid sequence in the "sequence" column. Valid characters are the
lowercase letters a to g only. Register assignments are not required to be in proper
order and may start with any of the seven letters.
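The column rules above can be checked before a custom training set is used. The following is a minimal sketch in base R; `validate_training` and the `toy` data frame are hypothetical names made up for illustration and are not part of the package.

```r
# Hypothetical helper (not part of the package): returns TRUE when a data
# frame satisfies the column rules described above, FALSE otherwise.
validate_training <- function(df) {
  required <- c("type", "sequence", "register")
  # Column order does not matter, only that all three names are present.
  if (!is.data.frame(df) || !all(required %in% names(df))) {
    return(FALSE)
  }
  type     <- as.character(df$type)
  sequence <- as.character(df$sequence)
  register <- as.character(df$register)

  # Only "DIMER" and "TRIMER" are acceptable oligomeric states.
  ok_type <- type %in% c("DIMER", "TRIMER")
  # Uppercase letters except B, J, O, U, X and Z.
  ok_sequence <- grepl("^[ACDEFGHIKLMNPQRSTVWY]+$", sequence)
  # Lowercase a to g only; any starting letter is allowed.
  ok_register <- grepl("^[a-g]+$", register)
  # Each register must be as long as its sequence.
  ok_length <- nchar(sequence) == nchar(register)

  all(ok_type & ok_sequence & ok_register & ok_length)
}

# Made-up toy data for illustration only (not real training sequences).
toy <- data.frame(
  register = c("abcdefgabcdefg", "defgabcdefgabc"),
  type     = c("DIMER", "TRIMER"),
  sequence = c("LKQLEDKVEELLSK", "IEELKKKIEELLKK"),
  stringsAsFactors = FALSE
)
validate_training(toy)
```

The sketch only mirrors the documented constraints; the package itself may apply additional or different checks.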
Format
A multi-dimensional array with 7 elements, each of dimension 2x21.
Source
DOI: 10.1093/bioinformatics/btr299
References
Craig T. Armstrong, Thomas L. Vincent, Peter J. Green and Dek N. Woolfson.
(2011) SCORER 2.0: an algorithm for distinguishing parallel dimeric and trimeric
coiled-coil sequences. Bioinformatics.
DOI: 10.1093/bioinformatics/btr299
Examples
data(training)
print(training)