In addition to reducing the size of the data, the argument keep_maf has practical applications. In family-based studies, common SNVs are generally filtered out prior to analysis. Users who intend to study common variants in addition to rare variants may need to run chromosome-specific analyses so that the resulting large data sets can be held in memory in R.
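To show how a frequency cutoff shrinks the data, the Python sketch below keeps only SNVs whose derived allele frequency does not exceed a cutoff. It assumes, for illustration only, that keep_maf acts as an upper bound on the derived allele frequency of retained SNVs. The SNV record and the helper filter_by_maf are hypothetical and are not part of read_slim.

```python
from typing import NamedTuple


class SNV(NamedTuple):
    """Hypothetical minimal SNV record (not the package's data structure)."""
    marker: str
    afreq: float  # derived allele frequency


def filter_by_maf(snvs: list[SNV], keep_maf: float = 0.01) -> list[SNV]:
    """Keep SNVs whose derived allele frequency is at most keep_maf.

    Assumption: keep_maf is treated as an upper bound on the derived
    allele frequency of retained SNVs.
    """
    return [s for s in snvs if s.afreq <= keep_maf]


snvs = [SNV("1_100", 0.002), SNV("1_250", 0.15), SNV("1_400", 0.01)]
rare = filter_by_maf(snvs, keep_maf=0.01)
print([s.marker for s in rare])  # ['1_100', '1_400']
```

Raising the cutoff (for example to 1) retains common variants as well, which is the situation in which chromosome-specific runs may become necessary.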
The argument recomb_map is used to remap mutations to their actual locations and chromosomes. This is necessary when data have been simulated over non-contiguous regions, such as exon-only data. If create_slimMap was used to create the recombination map for SLiM, simply supply the output of create_slimMap to recomb_map. If recomb_map is not provided, we assume that the SNV data have been simulated over a contiguous segment starting with the first base pair on chromosome 1.
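The idea behind remapping can be sketched as follows: segments that were concatenated for the simulation are laid end to end, and each simulated position is translated back to a chromosome and base pair. This Python sketch uses a simplified, hypothetical map of (chrom, start, end) tuples and treats simulated positions as 0-based offsets; it does not reproduce the format returned by create_slimMap or the logic of read_slim.

```python
from bisect import bisect_right
from itertools import accumulate


def remap_positions(sim_pos, segments):
    """Map 0-based positions along concatenated segments to (chrom, bp).

    segments: list of (chrom, start, end) tuples with 1-based, inclusive
    real coordinates, listed in the order they were concatenated for the
    simulation. This simplified layout is hypothetical; it is not the
    format returned by create_slimMap.
    """
    lengths = [end - start + 1 for _, start, end in segments]
    # offsets[i] is the first simulated position belonging to segment i
    offsets = [0] + list(accumulate(lengths))[:-1]
    total = sum(lengths)
    remapped = []
    for p in sim_pos:
        if not 0 <= p < total:
            raise ValueError(f"simulated position {p} is outside the map")
        i = bisect_right(offsets, p) - 1
        chrom, start, _ = segments[i]
        remapped.append((chrom, start + (p - offsets[i])))
    return remapped


# Two exons on chromosome 1 and one on chromosome 2.
exons = [(1, 1000, 1099), (1, 5000, 5049), (2, 300, 399)]
print(remap_positions([0, 99, 100, 149, 150], exons))
# [(1, 1000), (1, 1099), (1, 5000), (1, 5049), (2, 300)]
```

A binary search over the segment offsets keeps each lookup logarithmic in the number of segments, which matters when the map contains many exons.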
The data frame pathway_df allows users to identify SNVs located within a pathway of interest. When pathway_df is supplied, we expect that it does not contain any overlapping segments; all overlapping exons in pathway_df MUST be combined into a single observation. Users may combine overlapping exons with the combine_exons function.
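The Python sketch below illustrates the two steps involved: merging overlapping segments into single observations, then flagging SNVs that fall inside a merged segment. It is a generic interval-merge illustration with hypothetical helper names (combine_overlaps, in_pathway); it is not the package's combine_exons implementation, and it assumes inclusive (chrom, start, end) coordinates.

```python
def combine_overlaps(segments):
    """Merge overlapping (chrom, start, end) segments; coordinates inclusive.

    Illustrative only; this is not the package's combine_exons function.
    """
    merged = []
    for chrom, start, end in sorted(segments):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous segment on the same chromosome: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def in_pathway(chrom, position, merged):
    """Return True if the SNV falls inside any merged pathway segment."""
    return any(c == chrom and s <= position <= e for c, s, e in merged)


pathway = [(1, 100, 200), (1, 150, 250), (1, 400, 450), (2, 100, 120)]
merged = combine_overlaps(pathway)
print(merged)  # [(1, 100, 250), (1, 400, 450), (2, 100, 120)]
print(in_pathway(1, 240, merged), in_pathway(2, 130, merged))  # True False
```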
When TRUE, the logical argument recode_recurrent indicates that recurrent SNVs should be recoded as a single observation. SLiM can model many types of mutations, e.g. neutral, beneficial, and deleterious mutations. When different types of mutations occur at the same position, carriers will experience different fitness effects depending on the mutation they carry. However, when mutations at the same location have the same fitness effects, they represent a recurrent mutation. Even so, SLiM stores recurrent mutations separately and calculates their prevalence independently. When recode_recurrent = TRUE, we store recurrent mutations as a single observation and calculate the derived allele frequency from their combined prevalence. This convention both reduces storage and yields a correct estimate of the mutation's derived allele frequency. Users who prefer to store recurrent mutations from independent lineages as unique entries should set recode_recurrent = FALSE.
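A small Python sketch of the recoding idea: mutations that share chromosome, position, and mutation type are collapsed into one column, and the derived allele frequency is recomputed from the combined column. In the toy data, each of the two copies of the m1 mutation at position 500 has frequency 0.25 on its own, while the combined mutation has frequency 0.5. Dense 0/1 lists stand in for the sparse haplotype matrix, and counting a haplotype once if it carries any copy is an assumption of this sketch, not a description of read_slim's internals.

```python
from collections import defaultdict


def recode_recurrent(mutations, haplotypes):
    """Collapse recurrent mutations into one column each.

    mutations: list of (chrom, position, mut_type) tuples, one per column
        of haplotypes.
    haplotypes: list of 0/1 rows, one per haplotype.
    Mutations sharing chromosome, position, and type are treated as
    recurrent. A haplotype carries the combined mutation if it carries any
    of the collapsed copies (an assumption made for this sketch).
    Returns (keys, new_haplotypes, afreq).
    """
    groups = defaultdict(list)  # key -> list of column indices
    for col, key in enumerate(mutations):
        groups[key].append(col)
    keys = list(groups)  # insertion order = first occurrence
    new_haps = [
        [int(any(row[c] for c in groups[k])) for k in keys]
        for row in haplotypes
    ]
    n_haps = len(haplotypes)
    afreq = [sum(row[j] for row in new_haps) / n_haps for j in range(len(keys))]
    return keys, new_haps, afreq


muts = [(1, 500, "m1"), (1, 500, "m1"), (1, 500, "m2"), (1, 900, "m1")]
haps = [
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
keys, new_haps, afreq = recode_recurrent(muts, haps)
print(keys)   # [(1, 500, 'm1'), (1, 500, 'm2'), (1, 900, 'm1')]
print(afreq)  # [0.5, 0.25, 0.25]
```

The m2 mutation at position 500 stays separate because it has a different type, mirroring the distinction drawn above between recurrent mutations and different mutations at the same position.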
The read_slim function returns an object of class SNVdata, which inherits from a list and contains the following two items:
Haplotypes: A sparse matrix of class dgCMatrix (see dgCMatrix-class). The columns of Haplotypes represent distinct SNVs, while the rows represent individual haplotypes. Note that this matrix contains two rows of data for each diploid individual in the population: one row for the maternally inherited haplotype and one for the paternally inherited haplotype.
Mutations: A data frame cataloging the SNVs in Haplotypes. The variables in the Mutations data set are described as follows:
colID: Associates the rows (i.e. SNVs) of Mutations with the columns of Haplotypes.
chrom: The chromosome on which the SNV resides.
position: The position of the SNV, in base pairs.
afreq: The derived allele frequency of the SNV.
marker: A unique character identifier for the SNV.
type: The mutation type, as specified in the user's SLiM simulation.
pathwaySNV: A logical variable that identifies SNVs located within the pathway of interest as TRUE.
Please note: the variable pathwaySNV will be omitted when pathway_df is not supplied to read_slim.
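To make the layout of the returned data concrete, the Python sketch below pairs haplotype rows into diploid genotypes and recomputes a derived allele frequency from one column. Dense lists stand in for the dgCMatrix, the helper names are hypothetical, and the assumption that an individual's two haplotypes occupy consecutive rows is made for illustration only; the row order actually used by read_slim is not stated here.

```python
def genotypes_from_haplotypes(haplotypes):
    """Sum consecutive pairs of haplotype rows into per-individual counts.

    Assumes rows 2*i and 2*i + 1 (0-based) belong to the same diploid
    individual; this row ordering is an assumption of the sketch.
    """
    if len(haplotypes) % 2:
        raise ValueError("expected two haplotype rows per individual")
    return [
        [a + b for a, b in zip(haplotypes[i], haplotypes[i + 1])]
        for i in range(0, len(haplotypes), 2)
    ]


def derived_allele_freq(haplotypes, col):
    """Fraction of haplotypes carrying the derived allele in column col.

    col plays the role of colID here, but is 0-based as usual in Python.
    """
    return sum(row[col] for row in haplotypes) / len(haplotypes)


# Four haplotypes (two individuals) and three SNV columns.
haps = [
    [1, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
]
print(genotypes_from_haplotypes(haps))  # [[2, 0, 1], [0, 1, 0]]
print(derived_allele_freq(haps, col=0))  # 0.5
```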