The DNA shape prediction uses a sliding pentamer window in which structural
features unique to each of the 512 distinct pentamers define a vector of minor
groove width (MGW), Roll, propeller twist (ProT), and helix twist (HelT) at
each nucleotide position (Zhou, et al., 2013). MGW and ProT are
base-pair parameters, whereas Roll and HelT are base pair-step
parameters. The values for each DNA shape feature as a function of its
pentamer sequence were derived from all-atom Monte Carlo simulations in which
DNA structure is sampled in collective and internal degrees of freedom in
combination with explicit counterions (Zhang, et al., 2014). The Monte
Carlo simulations were analyzed with a modified Curves approach
(Zhou, et al., 2013). Through data mining, the value of each shape feature
was averaged over all occurrences of each pentamer (44 on average) in an
ensemble of Monte Carlo trajectories for 2,121 DNA fragments of 12-27 base
pairs in length. DNAshapeR predicts four DNA shape features, which have been
observed in various co-crystal structures to play an important role in
specific protein-DNA binding. The core prediction algorithm enables
ultra-fast, high-throughput predictions of shape features for thousands of
genomic sequences and is implemented in C++. Since it is likely that
features describing additional structural properties or equivalent features
derived from different experimental or computational sources will become
available, the package has a flexible modular design that easily allows
future expansions.
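
The sliding pentamer window described above can be illustrated with a minimal
Python sketch. It is not the package's C++ implementation. It assumes that the
512 distinct pentamers arise from treating a pentamer and its reverse
complement as equivalent (1,024 / 2 = 512), and that a base-pair feature such
as MGW is assigned to the central nucleotide of each window, which leaves the
two positions at either end of the sequence without a value. The function
names and the lookup table are hypothetical, and the table holds placeholder
numbers rather than simulation-derived values.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(pentamer):
    """Pick one representative of a pentamer and its reverse complement.

    Assumption: the two strands' readings of the same pentamer share one
    table entry, which is what reduces 1,024 pentamers to 512.
    """
    return min(pentamer, reverse_complement(pentamer))


# All distinct pentamers under reverse-complement equivalence.
ALL_CANONICAL = sorted({canonical("".join(p)) for p in product("ACGT", repeat=5)})

# Placeholder lookup table: the values are arbitrary, NOT real shape data.
TOY_TABLE = {p: float(i) for i, p in enumerate(ALL_CANONICAL)}


def predict_base_pair_feature(seq, table):
    """Slide a pentamer window over seq and look up one value per window.

    The value is stored at the window's central nucleotide, so the first
    two and last two positions stay None. Windows containing characters
    other than A, C, G, or T are not in the table and also yield None.
    """
    values = [None] * len(seq)
    for i in range(len(seq) - 4):
        pentamer = seq[i:i + 5]
        values[i + 2] = table.get(canonical(pentamer))
    return values


if __name__ == "__main__":
    print(len(ALL_CANONICAL))
    print(predict_base_pair_feature("ACGTACG", TOY_TABLE))
```

Because each position requires only a table lookup, the cost of a prediction
grows linearly with sequence length, which is consistent with the
high-throughput use described above. Base pair-step features such as Roll and
HelT would need additional handling of the steps within each window, which
this sketch omits.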