Calculates a covariation score between alignment positions using a perturbation-based algorithm with rigorous statistics. For each pair of columns (i, j), it measures how many possible subsets of size n would have the composition found in column j, both in the subset alignment defined by the perturbation in column i and in the ideal subset (i.e., a subset with the same amino acid distribution as the total alignment).
elsc(
  align,
  fileHelix = NULL,
  diag = 0,
  fileCSV = NULL,
  gap_val = 0.8,
  double_passing = FALSE,
  z_score = TRUE
)
align: An object of class 'align' created by the import.msf or the import.fasta function from a sequence alignment.
fileHelix: A character string giving the file that contains the positions of the anchor residues in the sequence alignment. To be used for the analysis of GPCR sequences. Default is NULL.
diag: A numeric value indicating the score of the diagonal elements in the scoring matrix. Default is 0.
fileCSV: A character string giving the name of the csv file where the output matrix will be saved. Default is NULL.
gap_val: A numeric value indicating the maximum gap ratio allowed at a position for that position to be taken into account. This value must be between 0 and 0.8. Default is 0.8, which means that positions with more than 80 percent of gaps will not be taken into account.
double_passing: A boolean indicating whether to calculate the correlation score twice: once from the first position to the last, then from the last to the first. The two results are summed and divided by 2. Default is FALSE.
z_score: A boolean for Z-score normalisation of the covariation matrix. Default is TRUE.
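The gap filter described for gap_val can be illustrated with a minimal base R sketch. It assumes the alignment is held as a plain character matrix (one row per sequence, one column per position), which is not the structure of the 'align' object; the names aln, gap_fraction and keep are hypothetical and not part of the package.

```r
# Toy alignment as a character matrix (assumption: NOT the package's 'align' class)
aln <- rbind(
  c("A", "-", "G", "-"),
  c("A", "-", "G", "C"),
  c("T", "-", "-", "C"),
  c("A", "K", "G", "-"),
  c("A", "-", "G", "-")
)

# Fraction of gaps in each column
gap_fraction <- colMeans(aln == "-")

# Keep only the positions whose gap ratio does not exceed the threshold
gap_val <- 0.5
keep <- which(gap_fraction <= gap_val)
keep
```

With this toy input, columns 2 and 4 have more than 50 percent of gaps and are dropped, so only positions 1 and 3 would enter the pairwise calculation.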
A list of two elements: a matrix containing the ELSC scores for each pair of positions and, optionally, a matrix containing the Z-scores.
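As an illustration of what a Z-score normalisation of a symmetric score matrix can look like, here is a minimal base R sketch. The exact procedure used by the package is not described on this page, so the choices below (statistics taken over the upper triangle, diagonal reset to 0) are assumptions, and zscore_matrix is a hypothetical helper name.

```r
# Hypothetical helper: Z-score normalisation of a symmetric score matrix.
# Assumption: mean and standard deviation are taken over the off-diagonal
# pairs (upper triangle), and the diagonal is set to 0 afterwards.
zscore_matrix <- function(scores) {
  off_diag <- scores[upper.tri(scores)]
  z <- (scores - mean(off_diag)) / sd(off_diag)
  diag(z) <- 0
  z
}

# Toy symmetric score matrix for three positions
scores <- matrix(c(0, 1, 2,
                   1, 0, 3,
                   2, 3, 0), nrow = 3, byrow = TRUE)
z <- zscore_matrix(scores)
z
```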
The ELSC score at position [i,j] is computed with the following formula:
$$ELSC(i,j) = -\ln \prod_{y} \frac{\binom{N_{y(j)}}{n_{y(j)}}}{\binom{N_{y(j)}}{m_{y(j)}}}$$
As a reminder, a binomial coefficient \(\binom{N}{k}\) is computed as follows: $$\binom{N}{k} = \frac{N!}{k!(N-k)!}$$
where:
\(N_{y(j)}\) is the number of residues y at position j in the total (unperturbed) sequence alignment
\(n_{y(j)}\) is the number of residues y at position j in the subset alignment defined by the perturbation in column i
\(m_{y(j)}\) is the number of residues y at position j in the ideal subset (i.e., in a subset with the amino acid distribution equal to the total alignment)
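To make the formula concrete, the following base R sketch evaluates it for a single pair of positions from residue counts. It is not the package's implementation: elsc_pair is a hypothetical helper, the counts are toy values, and the ideal subset counts are obtained by simple rounding, which is an assumption (rounding does not always give counts that sum to the subset size). The sum of lchoose() differences is used instead of a product of choose() ratios because binomial coefficients overflow double precision for large alignments.

```r
# Hypothetical helper: ELSC value for one (i, j) pair from residue counts.
# N_y: counts of each residue y at column j in the full alignment
# n_y: counts at column j in the subset defined by the perturbation in column i
# m_y: counts at column j in the ideal subset
elsc_pair <- function(N_y, n_y, m_y) {
  # -ln prod_y [ choose(N_y, n_y) / choose(N_y, m_y) ], computed in log space
  -sum(lchoose(N_y, n_y) - lchoose(N_y, m_y))
}

# Toy counts at column j: 10 sequences in total, 4 in the perturbed subset
N_y <- c(A = 6, G = 4)
n_y <- c(A = 4, G = 0)

# Ideal subset: same composition as the full alignment, scaled to the subset
# size (assumption: plain rounding)
m_y <- round(N_y * sum(n_y) / sum(N_y))

score <- elsc_pair(N_y, n_y, m_y)
score  # ln(6), about 1.79
```

When the observed subset has exactly the ideal composition (n_y equal to m_y), every ratio is 1 and the score is 0; the score grows as the subset composition departs from that of the full alignment.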
Dekker JP, Fodor A, Aldrich RW, Yellen G. A perturbation-based method for calculating explicit likelihood of evolutionary covariance in multiple sequence alignments. Bioinformatics 2004;20:1565-1572.
# NOT RUN {
align <- import.msf(system.file("msa/toy_align.msf", package = "Bios2cor"))
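# Compute the ELSC scores; the result is stored under a new name so that
# the elsc() function is not masked in the workspace
elsc_res <- elsc(align)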
# }