
Bios2cor (version 1.2)

bios2cor-package: Correlation Analysis in Biological Sequences and Simulations

Description

Bios2cor is dedicated to the computation and analysis of correlation/co-variation between positions in multiple sequence alignments and between side chain dihedral angles during molecular dynamics simulations. Features include the ability to compute correlation/co-variation using a variety of scoring functions and to analyze the correlation/co-variation matrix through a variety of tools, including network representation and principal component analysis. In addition, several utility functions rely on the R graphical environment to provide user-friendly tools that help with data interpretation.

The main functionalities of Bios2cor are summarized below:

(1) CORRELATION/COVARIATION METHODS:

Methods that can be used to analyze sequence alignments and molecular simulations and to calculate a correlation matrix containing a score for each pair of positions (sequence alignments) or each pair of side chain dihedral angles (molecular simulations).

Methods working with sequence alignments (a fasta or msf file is required):

  • omes: calculates the difference between the observed and expected occurrences of each possible pair of amino acids (x, y) at positions i and j of the alignment.

  • mip: calculates a mutual information (MI) score based on the probability of joint occurrence of events and corrects it with the average product correction (APC), which is subtracted from the MI score.

  • elsc: calculates a score based on rigorous statistics of covariation in a perturbation-based algorithm. It measures how many possible subsets of size n would have the composition found in column j.

  • mcbasc: relies on a substitution matrix giving a similarity score for each pair of amino acids.
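To make the OMES and MIp descriptions concrete, here is a minimal base R sketch on a toy alignment. It is an illustration under stated assumptions, not the Bios2cor implementation: gaps are ignored, OMES is taken as the sum of squared observed-minus-expected pair counts divided by the number of sequences, and the APC follows the common form mean_MI(i) * mean_MI(j) / mean_MI(all pairs). The helpers omes_pair, mi_pair and mip_matrix are hypothetical names.

```r
# Illustrative sketch only (not Bios2cor code): toy OMES and MIp scores in base R.
# Gaps are not handled specially; every symbol is treated as a residue.

# OMES for columns i and j: squared differences between observed and expected
# counts of each residue pair (x, y), summed and divided by the number of sequences.
omes_pair <- function(col_i, col_j) {
  n <- length(col_i)
  obs <- table(col_i, col_j)                      # observed pair counts
  expct <- outer(rowSums(obs), colSums(obs)) / n  # expected counts if i and j were independent
  sum((obs - expct)^2) / n
}

# Mutual information between two columns (natural logarithm).
mi_pair <- function(col_i, col_j) {
  p_xy <- table(col_i, col_j) / length(col_i)
  p_x <- rowSums(p_xy)
  p_y <- colSums(p_xy)
  nz <- p_xy > 0                                  # skip empty cells (0 * log 0 = 0)
  sum(p_xy[nz] * log(p_xy[nz] / outer(p_x, p_y)[nz]))
}

# MIp = MI - APC, with APC(i, j) = mean_MI(i) * mean_MI(j) / mean_MI(all pairs).
mip_matrix <- function(aln) {                     # aln: character matrix, rows = sequences
  L <- ncol(aln)
  mi <- matrix(0, L, L)
  for (i in seq_len(L - 1)) {
    for (j in (i + 1):L) {
      mi[i, j] <- mi[j, i] <- mi_pair(aln[, i], aln[, j])
    }
  }
  mean_i <- rowSums(mi) / (L - 1)                 # average MI of column i with the others
  mean_all <- mean(mi[row(mi) != col(mi)])        # average MI over all off-diagonal pairs
  mip <- mi - outer(mean_i, mean_i) / mean_all
  diag(mip) <- 0
  mip
}

seqs <- c("ARKL", "ARKV", "GDEL", "GDEV", "ARKL", "GDEI")
aln <- do.call(rbind, strsplit(seqs, ""))
omes_pair(aln[, 1], aln[, 2])   # 1.5
round(mip_matrix(aln), 3)
```

In this toy alignment, columns 1 to 3 co-vary perfectly, so their MIp scores stay positive after the APC is subtracted, while their scores with column 4 drop below zero.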

Methods working with molecular simulations (pdb and dcd files are required):

  • rotamer_circular: calculates a correlation score, based on a circular version of the Pearson correlation coefficient, between each pair of side chain dihedral angles in a trajectory obtained from molecular dynamics simulations.

  • rotamer_omes: calculates the difference between the observed and expected occurrences of each possible pair of rotamers (x, y) occurring at side chain dihedral angles i and j in a trajectory.

  • rotamer_mip: calculates a mutual information (MI) score based on the probability of joint occurrence of rotameric states and corrects it with the average product correction (APC), which is subtracted from the MI score.
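A circular correlation is needed because dihedral angles wrap around at ±180 degrees, so an ordinary Pearson coefficient would treat 179 and -179 degrees as far apart. The sketch below uses one common circular analogue of Pearson's r, the Jammalamadaka-SenGupta coefficient. It is an assumption that rotamer_circular uses this exact form; the helpers circ_mean, circ_cor and deg2rad are hypothetical names.

```r
# Illustrative sketch (not Bios2cor code): Jammalamadaka-SenGupta circular
# correlation between two angle series given in radians.
circ_mean <- function(theta) atan2(sum(sin(theta)), sum(cos(theta)))

circ_cor <- function(a, b) {
  sa <- sin(a - circ_mean(a))   # deviation of each frame from the circular mean
  sb <- sin(b - circ_mean(b))
  sum(sa * sb) / sqrt(sum(sa^2) * sum(sb^2))
}

deg2rad <- function(d) d * pi / 180
chi_a <- deg2rad(c(10, 50, 170, -170, -60, 80))   # toy dihedral values over 6 frames
chi_b <- deg2rad(c(15, 40, 175, -165, -70, 85))
circ_cor(chi_a, chi_b)
```

Because only sines of deviations from the circular mean enter the formula, adding a full turn (2 * pi) to every angle leaves the score unchanged.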

The methods working with molecular simulations require the following functions:

  • dynamic_struct: using the result of the xyz2torsion function, creates a unique structure that contains side chain dihedral angle information for each selected frame of the trajectory.

  • angle_conversion: using the result of the dynamic_struct function, creates a structure that associates a rotameric state with each side chain dihedral angle for each selected frame of the trajectory.
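The conversion from dihedral angles to rotameric states can be pictured as binning each angle. The sketch below assumes three 120-degree bins centred on +60, 180 and -60 degrees, labelled "g+", "t" and "g-". These boundaries and labels are hypothetical (naming conventions for gauche states vary between authors), and to_rotamer is not a Bios2cor function.

```r
# Illustrative sketch with hypothetical bin boundaries: three 120-degree bins
# centred on +60, 180 and -60 degrees. angle_conversion may use different bins.
to_rotamer <- function(chi_deg) {
  chi <- chi_deg %% 360                        # wrap angles into [0, 360)
  states <- cut(chi, breaks = c(0, 120, 240, 360), right = FALSE,
                include.lowest = TRUE, labels = c("g+", "t", "g-"))
  as.character(states)
}

# Toy trajectory: rows = frames, columns = side chain dihedral angles (degrees)
traj <- matrix(c( 62, -178, -65,
                  58,  175,  70,
                 -61,  179,  65), nrow = 3, byrow = TRUE)
rot <- matrix(to_rotamer(traj), nrow = nrow(traj))
rot
```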

(2) ADDITIONAL FUNCTIONS:

Functions that can be used to analyze the results of the correlation methods:

Entropy functions:

  • entropy: calculates an entropy score for each position of the alignment. This score is based on sequence conservation and uses a formula derived from Shannon's entropy.

  • rotamer_entropy: calculates a "dynamic entropy" score for each side chain dihedral angle of a protein during molecular simulations. This score is based on the number of rotameric changes of the dihedral angle during the simulation.
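The two ingredients named above, Shannon's entropy of an alignment column and the number of rotameric changes of a dihedral angle, can be sketched in a few lines. The package's score is described only as "derived from" Shannon's entropy, so any normalization it applies is not reproduced here; shannon_entropy and rotamer_changes are hypothetical helpers.

```r
# Illustrative sketches, not the package's exact formulas.
# Shannon entropy of one alignment column: H = -sum_x p_x * log(p_x).
shannon_entropy <- function(column) {
  p <- table(column) / length(column)   # only observed residues, so p > 0
  -sum(p * log(p))
}

# Number of rotameric transitions of one dihedral angle between consecutive frames.
rotamer_changes <- function(states) {
  if (length(states) < 2) return(0L)
  sum(states[-1] != states[-length(states)])
}

shannon_entropy(c("L", "L", "L", "L"))               # 0: fully conserved
shannon_entropy(c("L", "I", "V", "F"))               # log(4): four equally frequent residues
rotamer_changes(c("t", "t", "g+", "g+", "t", "t"))   # 2
```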

Filters:

  • gauss_weighting: given an entropy object, returns a "Gaussian weighting" for each position of the alignment or each side chain dihedral angle of the protein.

  • sigmoid_weighting: given an entropy object, returns a "sigmoidal weighting" for each position of the alignment or each side chain dihedral angle of the protein.

  • delta_weighting: given an entropy object, returns a "delta weighting" for each position of the alignment or each side chain dihedral angle of the protein.

PCA:

  • centered_pca: returns a principal component analysis of the correlation matrix passed as a parameter. A weighting filter can be specified, using the weighting functions described above.
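As a rough picture of what a PCA of a correlation matrix involves, the sketch below treats each row of a symmetric score matrix as an observation and runs base R's prcomp() with centred, unscaled columns. The weighting rule (multiplying entry (i, j) by w[i] * w[j]) is a hypothetical choice and may not match how centered_pca applies its filter; pca_of_corr is a hypothetical helper.

```r
# Illustrative sketch: PCA of a correlation matrix with base R's prcomp().
# The weighting rule (scale entry (i, j) by w[i] * w[j]) is a hypothetical choice,
# not necessarily what centered_pca does with its filter argument.
pca_of_corr <- function(corr, w = rep(1, nrow(corr))) {
  weighted <- corr * outer(w, w)
  p <- prcomp(weighted, center = TRUE, scale. = FALSE)   # columns centred, not scaled
  list(eigenvalues = p$sdev^2,   # variance along each component (scree plot input)
       coords = p$x)             # coordinates of each row (position or angle) on each PC
}

set.seed(1)
m <- matrix(runif(25), 5)
corr <- (m + t(m)) / 2            # toy symmetric score matrix
diag(corr) <- 0
res <- pca_of_corr(corr)
res$eigenvalues[res$eigenvalues > 1e-10]   # positive eigenvalues only
res$coords[, 1:2]                          # 2-D projection on PC1 and PC2
```

The eigenvalues sum to the total variance of the centred columns, which is what a scree plot distributes across components.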

(3) OUTPUT FILES :

Functions that can be used to produce output files (txt/csv files and graphs):

Some data structures can be stored in txt/csv files:

  • create_corrfile: Using the result of a correlation/covariation method, creates a file containing the score of each pair of positions (sequence alignment analysis) or of side chain dihedral angles (molecular simulations).

  • corr_contact: Using the result of a correlation/covariation method and an integer X, creates a file containing top positions/dihedral angles and their contact counts (the number of times the position/dihedral angle appears in the top X pairs with the highest scores).

  • create_pcafile: Using the result of the centered_pca function, creates a file that contains the coordinates of each position or of each dihedral angle in the principal components.

  • write_pdb: Using the result of the centered_pca function, creates a pdb file with the PCA coordinates on three principal components, along with a pml file for visualization with PyMOL.
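The contact count described for corr_contact can be computed by ranking each unordered pair once, keeping the top X, and counting how often each element appears. The sketch below is a minimal base R version under that reading; top_pair_contacts is a hypothetical helper, not part of Bios2cor, and it does not write a file.

```r
# Illustrative sketch of contact counting; top_pair_contacts is a hypothetical
# helper, not part of Bios2cor.
top_pair_contacts <- function(scores, top_x) {
  labels <- if (is.null(rownames(scores))) as.character(seq_len(nrow(scores))) else rownames(scores)
  idx <- which(upper.tri(scores), arr.ind = TRUE)        # each unordered pair (i < j) once
  keep <- order(scores[idx], decreasing = TRUE)[seq_len(min(top_x, nrow(idx)))]
  top <- idx[keep, , drop = FALSE]                       # the top_x highest-scoring pairs
  tab <- table(factor(labels[c(top)], levels = labels))  # appearances of each element
  counts <- as.vector(tab)
  names(counts) <- names(tab)
  counts <- counts[counts > 0]
  counts[order(counts, decreasing = TRUE)]
}

s <- matrix(0, 4, 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))
s["A", "B"] <- 0.9; s["A", "C"] <- 0.8; s["B", "C"] <- 0.1
s["A", "D"] <- 0.2; s["B", "D"] <- 0.7; s["C", "D"] <- 0.3
s <- s + t(s)                     # symmetric toy score matrix
top_pair_contacts(s, top_x = 3)   # A and B appear twice, C and D once
```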

Some data can be analyzed with dedicated graphs:

  • create_boxplot: Using the result of a correlation/covariation method, creates a boxplot to visualize the distribution of the Z-scores.

  • create_network: Using the result of the corr_contact function, creates the graph of a network representation of the data with links between correlated/covarying elements.

  • entropy_graph: Using the result of a correlation/covariation method and an entropy structure, creates a graph comparing correlation scores with entropy values. Each pair of elements (i, j) is placed in the graph with (entropy[i], entropy[j]) as coordinates. The color of each point is based on its correlation score (red/pink for top values, blue/sky blue for bottom values).

  • create_screeplot: Using the result of the centered_pca function, creates the graph of the eigenvalues (positive values only).

  • pca_2d: Using the result of the centered_pca function, creates a graph with the projection of the elements on two selected components.

  • ang_evo_graph: Using pdb and dcd files and the result of a correlation/covariation method, creates graphs to monitor the time evolution of each dihedral angle in the top X pairs.
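The Z-scores summarized by a boxplot are standardized pairwise scores. A minimal sketch, assuming each unordered pair is taken once from the upper triangle and standardized with the sample mean and standard deviation (the package may define its Z-scores differently); pair_zscores is a hypothetical helper.

```r
# Illustrative sketch: Z-scores of the pairwise scores in the upper triangle
# (each pair counted once). pair_zscores is a hypothetical helper.
pair_zscores <- function(scores) {
  v <- scores[upper.tri(scores)]
  (v - mean(v)) / sd(v)
}

set.seed(2)
m <- matrix(rnorm(36), 6)
scores <- (m + t(m)) / 2   # toy symmetric score matrix for 6 elements
z <- pair_zscores(scores)  # 15 pairwise Z-scores
summary(z)
```

Passing z to base R's boxplot() then draws the kind of distribution view described for create_boxplot.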

Details

Package: BioCor
Type: Package
Version: 1.5
Date: 2017-03-06
License: GPL

Examples

# NOT RUN {
  library(Bios2cor)

  # Importing the example alignment shipped with the package
  msf <- system.file("msa/toy_align.msf", package = "Bios2cor")
  align <- import.msf(msf)

  # Creating OMES object
  omes_obj <- omes(align, fileHelix = NULL, diag = 0, fileCSV = NULL, gap_val = 0.8, z_score = TRUE)

  # Creating ENTROPY object
  entropy_obj <- entropy(align)

  # Creating weighting filter
  filter <- gauss_weighting(entropy_obj, L = 0.1)

  # Creating PCA structure from the normalized OMES scores
  omes_scores <- omes_obj$normalized
  pca <- centered_pca(omes_scores, m = filter, pc = NULL, dec_val = 5, eigenvalues_csv = NULL)
# }
