Create a list object from mutation and reference data for use with the BaySIC fitting and testing functions.

Usage
baysic.data(dat, ref.dat, plot = FALSE, N = NULL, silent = TRUE)

Arguments
dat: mutation data in the 7-column format described under Details.
ref.dat: matrix, or list of data frames. ref.dat represents the sequence content of each gene of interest across 32 unique trinucleotide sequence contexts, yielding a $G \times 34$ matrix (the 32 context columns plus the gene name and chromosome columns), where $G$ is the total number of genes. If ref.dat is a matrix, all subjects are assumed to share the same reference data. Reference data may, however, vary from subject to subject (for example, because of different platforms or coverages). In that case ref.dat can instead be a list of N reference data matrices, where N is the number of subjects; the name of each list element should match the corresponding subject id used in dat.
plot: logical. If TRUE, a plot summarizing the mutation data on an overall and per-subject basis is generated. Defaults to FALSE.
N: number of subjects in dat. If N = NULL and is.list(ref.dat) is FALSE, N is taken to be the number of unique subject ids in dat. If is.list(ref.dat) is TRUE, then N = length(ref.dat).
silent: logical. If FALSE, mutations defined as 'Synonymous' or 'Silent' are removed from the dataset and from all subsequent analyses. Defaults to TRUE.

Value
A list with the following components:
dat
ref.dat

Details
dat is a 7-column matrix similar in style to other popular mutation file formats. The first three columns ("chr", "start", "end") give the position of each somatic mutation. The "id" column holds the subject id for each documented mutation. The "type" column gives the type of each mutation. This column is relatively flexible for point mutations and only requires some form of "silent" or "synonymous" for such mutations when silent = FALSE, but insertion/deletion events should be designated "INDEL". The "gene" column gives the name of the gene in which the mutation occurs and must match the gene names used in ref.dat.
The "context" column gives the trinucleotide sequence context of each point mutation (NA for INDELs).

The first two columns of the data matrix (or matrices) in ref.dat should give the gene name and the corresponding chromosome, and the names of the remaining 32 columns should be the trinucleotide motifs (e.g. "ACA"). The sequence content entries should be integers giving the number of nucleotides in the coding sequence of a given gene that match each trinucleotide motif (a central base with its flanking 5' and 3' bases). Each base should be counted exactly once, so that the 32 counts sum to the total length, in base pairs, of the gene's coding sequence.
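The layouts described above can be illustrated with a small pair of invented inputs. This is a sketch only: the gene names, subject ids, mutation types, counts and the helper default_N() are hypothetical, dat is stored as a data frame rather than a matrix, and default_N() merely restates the documented rule for N rather than reproducing the package's own code. Because a data frame is itself a list in R (is.list() returns TRUE for it), the sketch assumes that only a plain list of tables counts as per-subject reference data.

```r
# Hypothetical toy inputs in the layouts described above.
# Gene names, subject ids, types and counts are invented for illustration.
set.seed(1)

# The 32 trinucleotide motifs with C or T as the central base
bases <- c("A", "C", "G", "T")
grid <- expand.grid(five = bases, mid = c("C", "T"), three = bases,
                    stringsAsFactors = FALSE)
motifs <- paste0(grid$five, grid$mid, grid$three)

# ref.dat: gene name, chromosome, then one integer count column per motif
genes <- c("GENE1", "GENE2", "GENE3")
counts <- matrix(sample(0:50, length(genes) * 32, replace = TRUE),
                 nrow = length(genes), dimnames = list(NULL, motifs))
ref.dat <- data.frame(gene = genes, chr = c("1", "7", "17"), counts,
                      check.names = FALSE, stringsAsFactors = FALSE)

# dat: one row per mutation; context is NA for INDELs
dat <- data.frame(
  chr     = c("1", "7", "17"),
  start   = c(1000L, 2000L, 3000L),
  end     = c(1000L, 2000L, 3002L),
  id      = c("S1", "S2", "S2"),
  type    = c("Missense", "Silent", "INDEL"),
  gene    = c("GENE1", "GENE2", "GENE3"),
  context = c("ACG", "TTA", NA),
  stringsAsFactors = FALSE
)

# Consistency checks implied by the format description
stopifnot(ncol(dat) == 7, ncol(ref.dat) == 34,
          all(dat$gene %in% ref.dat$gene),
          all(is.na(dat$context[dat$type == "INDEL"])))

# Coding sequence length (bp) implied by each gene's 32 context counts
coding.length <- rowSums(ref.dat[, motifs])

# Hypothetical restatement of the documented default for N
default_N <- function(dat, ref.dat, N = NULL) {
  if (!is.null(N)) return(N)
  # a data frame is also a list in R, so only a plain list is per-subject data
  if (is.list(ref.dat) && !is.data.frame(ref.dat)) return(length(ref.dat))
  length(unique(dat$id))
}

default_N(dat, ref.dat)                           # unique ids in dat (S1, S2)
default_N(dat, list(S1 = ref.dat, S2 = ref.dat))  # one reference table per subject
```

In the list form, the element names (S1, S2 here) play the role of the subject ids in dat.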
The baysic.data function uses its own trinucleotide naming convention: all motifs are uppercase and have either "T" or "C" as the central base. Column names of ref.dat and "context" entries in dat that deviate from this convention are adjusted to match it.
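One common way to map an arbitrary motif onto this convention is to replace any motif whose central base is A or G with its reverse complement, i.e. the same context read 5' to 3' on the opposite strand. The sketch below assumes the adjustment works this way; normalize_context() is a hypothetical helper, not a function exported by the package.

```r
# Hypothetical helper (not part of BaySIC): rewrite trinucleotide motifs so that
# they are uppercase and have C or T as the central base.
normalize_context <- function(motif) {
  motif <- toupper(motif)
  mid <- substr(motif, 2, 2)
  # motifs centred on a purine (A/G) are replaced by their reverse complement;
  # NA contexts (e.g. INDELs) are left untouched
  flip <- !is.na(mid) & mid %in% c("A", "G")
  comp <- chartr("ACGT", "TGCA", motif[flip])
  motif[flip] <- vapply(strsplit(comp, ""),
                        function(b) paste(rev(b), collapse = ""),
                        character(1))
  motif
}

normalize_context(c("agc", "CAT", "ACA", NA))
# -> "GCT" "ATG" "ACA" NA
```

The same helper could be applied to dat$context and to the motif column names of ref.dat (all columns after the first two).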
See Also
baysic.fit, baysic.test
Examples
## Not run:
data(example.dat)
data(ccds.19)
baysic.dat.ex <- baysic.data(example.dat, ccds.19)
## End(Not run)