seqinr (version 3.6-1)

EXP: Vectors of coefficients to compute linear forms.

Description

This dataset is used to compute linear forms on codon frequencies: if freq is a vector of codon frequencies, then drop(freq %*% EXP$CG3) returns, for instance, the G+C content in third codon positions. Base order is the lexical order: a, c, g, t (or u).
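As a minimal sketch of such a linear form, the block below rebuilds a stand-in for EXP$CG3 in base R so that it runs without the package installed; with seqinr available, data(EXP) and EXP$CG3 would be used instead. The names codons, cg3, counts and freq, and the toy codon counts, are illustrative assumptions.

```r
# Codons in lexical order (aaa, aac, aag, aat, aca, ...), built in base R.
# expand.grid varies its first argument fastest, so the third base goes first.
bases <- c("a", "c", "g", "t")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$first, grid$second, grid$third)

# Hand-built stand-in for EXP$CG3: 1 when the third base is c or g, else 0.
cg3 <- as.numeric(grid$third %in% c("c", "g"))

# Hypothetical toy counts for three codons, turned into frequencies.
counts <- setNames(numeric(64), codons)
counts[c("atg", "gcc", "aaa")] <- c(1, 2, 1)
freq <- counts / sum(counts)

# Linear form: G+C content in third codon positions.
gc3 <- drop(freq %*% cg3)
print(gc3)
```

Here three of the four observed codons end in g or c, so the linear form evaluates to 0.75.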

Usage

data(EXP)

Format

List of 24 vectors of coefficients

A

num [1:4] 1 0 0 0

A3

num [1:64] 1 0 0 0 1 0 0 0 1 0 ...

AGZ

num [1:64] 0 0 0 0 0 0 0 0 1 0 ...

ARG

num [1:64] 0 0 0 0 0 0 0 0 1 0 ...

AU3

num [1:64] 1 0 0 1 1 0 0 1 1 0 ...

BC

num [1:64] 0 1 0 0 0 0 0 0 0 0 ...

C

num [1:4] 0 1 0 0

C3

num [1:64] 0 1 0 0 0 1 0 0 0 1 ...

CAI

num [1:64] 0.00 0.00 -1.37 -2.98 -2.58 ...

CG

num [1:4] 0 1 1 0

CG1

num [1:64] 0 0 0 0 0 0 0 0 0 0 ...

CG12

num [1:64] 0 0 0 0 0.5 0.5 0.5 0.5 0.5 0.5 ...

CG2

num [1:64] 0 0 0 0 1 1 1 1 1 1 ...

CG3

num [1:64] 0 1 1 0 0 1 1 0 0 1 ...

CGN

num [1:64] 0 0 0 0 0 0 0 0 0 0 ...

F1

num [1:64] 1.026 0.239 1.026 0.239 -0.097 ...

G

num [1:4] 0 0 1 0

G3

num [1:64] 0 0 1 0 0 0 1 0 0 0 ...

KD

num [1:64] -3.9 -3.5 -3.9 -3.5 -0.7 -0.7 -0.7 -0.7 -4.5 -0.8 ...

Q

num [1:64] 0 0 0 0 1 1 1 1 0 0 ...

QA3

num [1:64] 0 0 0 0 1 0 0 0 0 0 ...

QC3

num [1:64] 0 0 0 0 0 1 0 0 0 0 ...

U

num [1:4] 0 0 0 1

U3

num [1:64] 0 0 0 1 0 0 0 1 0 0 ...

Details

It is better to work directly at the amino-acid level when computing linear forms on amino-acid frequencies, so as to have a single coefficient vector. For instance, EXP$KD, which computes the Kyte and Doolittle hydropathy index from codon frequencies, is valid only for the standard genetic code.
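The amino-acid-level approach can be sketched as follows. This is an illustration in base R, not part of the dataset: the vector kd holds the Kyte and Doolittle values per amino acid, and the peptide is a hypothetical toy sequence assumed to be already translated with whichever genetic code applies.

```r
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
kd <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5, Q = -3.5, E = -3.5,
        G = -0.4, H = -3.2, I = 4.5, L = 3.8, K = -3.9, M = 1.9, F = 2.8,
        P = -1.6, S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

# Hypothetical toy peptide (assumption for illustration only).
peptide <- strsplit("MKTK", "")[[1]]

# Amino-acid frequencies, in the same order as the coefficient vector.
aa_freq <- table(factor(peptide, levels = names(kd))) / length(peptide)

# as.numeric() drops the table class so %*% sees a plain numeric vector.
mean_kd <- drop(as.numeric(aa_freq) %*% kd)
print(mean_kd)
```

Because the coefficients are indexed by amino acid rather than by codon, the same vector applies whatever genetic code produced the protein.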

An alternative to drop(freq %*% EXP$CG3) is sum( freq * EXP$CG3 ), but this is less efficient in terms of CPU time. The advantage of the latter, however, is that thanks to the recycling rules you can use either sum( freq * EXP$A ) or sum( freq * EXP$A3 ). To do the same with the %*% operator you have to make the recycling explicit, as in drop( freq %*% rep(EXP$A, 16)).
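The equivalence can be checked with a small sketch. The vectors a4 and a3 are hand-built stand-ins for EXP$A and EXP$A3 (assumptions, so that the block runs without the package), and freq is an arbitrary made-up frequency vector of length 64.

```r
# Stand-ins for EXP$A (length 4) and EXP$A3 (length 64).
a4 <- c(1, 0, 0, 0)
a3 <- rep(a4, 16)

# Arbitrary frequencies over the 64 codons, summing to 1.
freq <- (1:64) / sum(1:64)

# With sum() and *, the length-4 vector is recycled 16 times automatically.
via_recycling <- sum(freq * a4)
via_full      <- sum(freq * a3)

# With %*%, the recycling has to be written out with rep().
via_matprod <- drop(freq %*% rep(a4, 16))

# Without rep(), %*% refuses vectors of different lengths.
unrecycled <- tryCatch(drop(freq %*% a4), error = function(e) "error")

print(c(via_recycling, via_full, via_matprod))
print(unrecycled)
```

All three numeric results agree, while the unrecycled %*% call is caught as an error.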

References

citation("seqinr")

A

content in A nucleotide

A3

content in A nucleotide in third position of codon

AGZ

Arg content (aga and agg codons)

ARG

Arg content

AU3

content in A and U nucleotides in third position of codon

BC

Good choice (Bon choix). Gouy, M., Gautier, C. (1982) Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Research, 10(22):7055-7074.

C

content in C nucleotides

C3

content in C nucleotides in third position of codon

CAI

Codon adaptation index for E. coli. Sharp, P.M., Li, W.-H. (1987) The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Research, 15:1281-1295.

CG

content in G + C nucleotides

CG1

content in G + C nucleotides in first position of codon

CG12

content in G + C nucleotides in first and second position of codon

CG2

content in G + C nucleotides in second position of codon

CG3

content in G + C nucleotides in third position of codon

CGN

content in CGA + CGC + CGG + CGU

F1

From Table 2 in Lobry, J.R., Gautier, C. (1994) Hydrophobicity, expressivity and aromaticity are the major trends of amino-acid usage in 999 Escherichia coli chromosome-encoded genes. Nucleic Acids Research, 22:3174-3180.

G

content in G nucleotide

G3

content in G nucleotides in third position of codon

KD

Kyte, J., Doolittle, R.F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol., 157:105-132.

Q

content in quartet

QA3

content in quartet with the A nucleotide in third position

QC3

content in quartet with the C nucleotide in third position

U

content in U nucleotide

U3

content in U nucleotides in third position of codon

Examples

# NOT RUN {
data(EXP)
# }
