Data Formats

Description

Data formats used in cubfits.

Format

All formats are simple S3 objects such as default lists or data frames.

Details

Format b: A named list A of amino acids. Each element A[[i]] is a list with the elements coefficients (coefficients of log(mu) and Delta.t), coef.mat (the coefficients in matrix form), and R (covariance matrix of the coefficients). Note that coefficients and R are typically as in the output of vglm() of the VGAM package. Also, coef.mat and R may be missing in some cases. e.g. A[[i]]$coef.mat is the regression beta matrix of the i-th amino acid.
Format bVec: A vector containing all coefficients of a b object A. Note that this is probably only used inside MCMC or with the output of vglm() of the VGAM package. e.g. do.call("c", lapply(A, function(x) x$coefficients)).
Format n: A named list A of amino acids. Each element A[[i]] is a vector of total codon counts. e.g. A[[i]][j] is the total count in the j-th ORF for the i-th amino acid, names(A)[i].
Format n.list: A named list A of ORFs. Each element A[[i]] is a named list of amino acids, each holding a total count. e.g. A[[i]][[j]] is the total count of the j-th amino acid in the i-th ORF.
Format phi.df: A data frame A with two columns, ORF and phi.value. e.g. A[i,] is the row for the i-th ORF.
Format reu13.df: A named list A of amino acids. Each element is a data frame summarizing ORF and expression. The data frame has four to five columns: ORF, phi (expression), Pos (amino acid position), Codon (synonymous codon), and Codon.id (synonymous codon id, for computing only). Note that Codon.id may be missing in some cases. e.g. A[[i]][17,] is the 17th record of the i-th amino acid.
Format reu13.list: A named list A of ORFs. Each element A[[i]] is a named list of amino acids. Each element of the nested list, A[[i]][[j]], is a vector of synonymous codon positions. e.g. A[[i]][[j]][k] is the k-th synonymous codon position of the j-th amino acid in the i-th ORF.
Format scuo: A data frame with 8 named columns: AA (amino acid), ORF, and C1, ..., C6, where the C* columns hold codon counts.
Format seq.string: Default output of read.fasta() of the seqinr package. A named list A of ORFs. Each element of the list is a long string for an ORF. e.g. A[[i]][1] or A[[i]] is the sequence of the i-th ORF.
Format seq.data: Converted from the seq.string format. A named list A of ORFs. Each element A[[i]] is a string vector, and each element of the vector is a codon string. e.g. A[[i]][j] is the j-th codon of the i-th ORF.
Format phi.Obs: A named vector A of observed expression values, possibly with measurement errors. e.g. A[i] is the observed phi value of the i-th ORF.
Format y: A named list A of amino acids. Each element A[[i]] is a matrix with ORFs in rows and synonymous codons in columns. The matrix entries are codon counts. e.g. A[[i]][j, k] is the count for the i-th amino acid, j-th ORF, and k-th synonymous codon.
Format y.list: A named list A of ORFs. Each element A[[i]] is a named list of amino acids, A[[i]][[j]]. Each element of the amino acid list is a codon count vector. e.g. A[[i]][[j]][k] is the count for the i-th ORF, j-th amino acid, and k-th synonymous codon.
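
The sequence and count formats above can be illustrated with a small base R sketch. It builds toy objects shaped like seq.string, seq.data, y, n, y.list, and n.list by hand. The ORF names, the sequences, the restriction to two amino acids, and the codon ordering within each matrix are all assumptions made for illustration, and no cubfits function is called, so the package's own objects may differ in detail.

```r
# Hypothetical toy input shaped like seq.string: a named list of ORFs,
# each element one long string (names and sequences are made up).
seq.string <- list(orf1 = "GCTGCCGCTAAA", orf2 = "GCAGCTAAGAAA")

# seq.data: each ORF becomes a character vector of codon strings.
seq.data <- lapply(seq.string, function(s) {
  starts <- seq(1, nchar(s), by = 3)
  substring(s, starts, starts + 2)
})

# Synonymous codons of two amino acids (standard genetic code).
# The ordering of codons here is an assumption for this sketch.
codons <- list(A = c("GCA", "GCC", "GCG", "GCT"), K = c("AAA", "AAG"))

# y: per amino acid, a matrix of codon counts with ORFs in rows
# and synonymous codons in columns.
y <- lapply(codons, function(cd) {
  counts <- vapply(seq.data,
                   function(orf) as.integer(table(factor(orf, levels = cd))),
                   integer(length(cd)))
  m <- t(counts)          # rows: ORFs, columns: codons
  colnames(m) <- cd
  m
})

# n: per amino acid, a vector of total codon counts, one entry per ORF.
n <- lapply(y, rowSums)

# y.list: the same counts indexed by ORF first, then amino acid.
y.list <- lapply(names(seq.data), function(orf) {
  lapply(y, function(m) m[orf, ])
})
names(y.list) <- names(seq.data)

# n.list: per ORF, a named list of total counts per amino acid.
n.list <- lapply(y.list, function(aa) lapply(aa, sum))

y$A["orf1", ]    # codon counts of amino acid A in orf1
n.list$orf2$K    # total count of amino acid K in orf2
```

The amino-acid-major objects (y, n) and the ORF-major objects (y.list, n.list) carry the same counts and differ only in which index comes first.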

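As a rough illustration of the expression and coefficient formats described in Details, the following base R sketch builds toy objects shaped like phi.df, phi.Obs, b, and bVec. All ORF names and numeric values are made up, the number of coefficients per amino acid is an arbitrary assumption, and the b object here carries only the coefficients element (coef.mat and R are omitted, which the format allows).

```r
# Hypothetical phi.df: one row per ORF with its expression value.
phi.df <- data.frame(ORF = c("orf1", "orf2", "orf3"),
                     phi.value = c(0.5, 1.2, 2.3),
                     stringsAsFactors = FALSE)

# phi.Obs-shaped object: the same values as a vector named by ORF.
phi.Obs <- setNames(phi.df$phi.value, phi.df$ORF)

# Hypothetical b-shaped object: a named list of amino acids, each a list
# with a coefficients vector. Values and lengths are made up.
b <- list(
  A = list(coefficients = c(0.1, -0.2, 0.3, 0.4, 0.5, -0.6)),
  K = list(coefficients = c(0.7, -0.8))
)

# bVec: all coefficients of the b object concatenated into one vector,
# using the expression given in the format description.
bVec <- do.call("c", lapply(b, function(x) x$coefficients))

phi.Obs["orf2"]   # observed phi value of orf2
length(bVec)      # total number of coefficients across amino acids
```
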
References

https://github.com/snoweye/cubfits/