In addition to reducing the size of the data, the argument keep_maf has practical applications. In family-based studies, common SNVs are generally filtered out prior to analysis. Users who intend to study common variants in addition to rare variants may need to run chromosome-specific analyses to allow for allocation of large data sets in R.
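The sketch below illustrates frequency-based filtering in plain Python. It assumes, which this section does not state, that keep_maf is an upper bound on the derived allele frequency of retained SNVs. The function name filter_by_frequency is hypothetical, and it works on plain lists rather than the sparse matrix that read_slim returns.

```python
def filter_by_frequency(afreq, haplotypes, keep_maf):
    """Keep only SNV columns whose derived allele frequency is <= keep_maf.

    afreq: derived allele frequency per SNV (one entry per column).
    haplotypes: list of rows, one per haplotype, each a list of 0/1 values.
    Returns the retained column indices and the filtered haplotype rows.
    (Hypothetical helper; assumes keep_maf bounds the derived allele frequency.)
    """
    keep = [j for j, f in enumerate(afreq) if f <= keep_maf]
    # Dropping columns early is what reduces the memory footprint.
    filtered = [[row[j] for j in keep] for row in haplotypes]
    return keep, filtered


keep, filtered = filter_by_frequency([0.005, 0.2, 0.01], [[1, 1, 0], [0, 1, 1]], 0.01)
print(keep, filtered)
```

With a threshold of 0.01, the common SNV in the second column is dropped and only the two rare columns remain.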
The argument recomb_map is used to remap mutations to their actual locations and chromosomes. This is necessary when the data have been simulated over non-contiguous regions, such as exon-only data. If create_slimMap was used to create the recombination map for SLiM, simply supply the output of create_slimMap to recomb_map. If recomb_map is not provided, we assume that the SNV data have been simulated over a contiguous segment starting with the first base pair on chromosome 1.
The data frame pathway_df allows users to identify SNVs located within a pathway of interest. When supplied, we expect that pathway_df does not contain any overlapping segments. All overlapping exons in pathway_df MUST be combined into a single observation. Users may combine overlapping exons with the combine_exons function.
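As a rough illustration of what "combining overlapping exons" and "identifying pathway SNVs" mean, the sketch below merges overlapping (chrom, start, end) intervals and then flags positions that fall inside any merged interval. It is not the implementation of combine_exons; the tuple layout, the inclusive end coordinates, and both function names are assumptions made for this example.

```python
def combine_segments(segments):
    """Merge overlapping (chrom, start, end) intervals; ends are treated as inclusive."""
    merged = []
    for chrom, start, end in sorted(segments):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous interval on the same chromosome: extend it.
            c, s, e = merged[-1]
            merged[-1] = (c, s, max(e, end))
        else:
            merged.append((chrom, start, end))
    return merged


def flag_pathway(snvs, segments):
    """Return True for each (chrom, position) that lies inside any segment."""
    return [
        any(c == chrom and s <= pos <= e for c, s, e in segments)
        for chrom, pos in snvs
    ]


merged = combine_segments([(1, 10, 20), (1, 15, 30), (1, 40, 50), (2, 5, 8)])
print(merged)
print(flag_pathway([(1, 25), (1, 35), (2, 5)], merged))
```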
When TRUE, the logical argument recode_recurrent indicates that recurrent SNVs should be recoded as a single observation. SLiM can model many types of mutations, e.g. neutral, beneficial, and deleterious mutations. When different types of mutations occur at the same position, carriers will experience different fitness effects depending on the carried mutation. However, when mutations at the same location have the same fitness effects, they represent a recurrent mutation. Even so, SLiM stores recurrent mutations separately and calculates their prevalence independently. When the argument recode_recurrent = TRUE, we store recurrent mutations as a single observation and calculate the derived allele frequency based on their combined prevalence. This convention both reduces storage and allows correct estimation of the derived allele frequency of the mutation. Users who prefer to store recurrent mutations from independent lineages as unique entries should set recode_recurrent = FALSE.
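The recoding described above can be sketched as grouping mutations by (position, type), collapsing each group into one haplotype column, and recomputing the derived allele frequency from the combined carriers. This is a hypothetical illustration on plain lists, not read_slim's code. It assumes that mutations of the same type at the same position are recurrent (same fitness effect), while different types at one position stay separate.

```python
def recode_recurrent(mutations, haplotypes):
    """Collapse recurrent mutations into single observations.

    mutations: list of dicts with 'position' and 'type' keys, one per column.
    haplotypes: list of rows (one per haplotype), each a list of 0/1 values.
    Returns (new_mutations, new_haplotypes).
    """
    groups = {}  # (position, type) -> column indices; dicts keep insertion order
    for j, m in enumerate(mutations):
        groups.setdefault((m["position"], m["type"]), []).append(j)

    n_hap = len(haplotypes)
    new_haps = [[0] * len(groups) for _ in range(n_hap)]
    new_muts = []
    for k, ((pos, mtype), cols) in enumerate(groups.items()):
        for i, row in enumerate(haplotypes):
            # A haplotype carries the derived allele if it carries any recurrent copy.
            new_haps[i][k] = int(any(row[j] for j in cols))
        carriers = sum(new_haps[i][k] for i in range(n_hap))
        new_muts.append({"position": pos, "type": mtype, "afreq": carriers / n_hap})
    return new_muts, new_haps


muts = [{"position": 5, "type": "m1"}, {"position": 5, "type": "m1"},
        {"position": 9, "type": "m2"}]
haps = [[1, 0, 0], [0, 1, 1], [0, 0, 0], [0, 1, 0]]
new_muts, new_haps = recode_recurrent(muts, haps)
print(new_muts)
print(new_haps)
```

Here the two "m1" mutations at position 5 arose on different haplotypes. Stored separately, they would each report a frequency of 0.25 and 0.5. Combined, they give a single derived allele frequency of 0.75.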
The read_slim function returns an object of class SNVdata, which inherits from a list and contains the following two items:
Haplotypes
A sparse matrix of class dgCMatrix (see dgCMatrix-class). The columns in Haplotypes represent distinct SNVs, while the rows represent individual haplotypes. We note that this matrix contains two rows of data for each diploid individual in the population: one row for the maternally inherited haplotype and the other for the paternally inherited haplotype.
Mutations
A data frame cataloging SNVs in Haplotypes. The variables in the Mutations data set are described as follows:
colID
Associates the rows (i.e. SNVs) in Mutations with the columns of Haplotypes.
chrom
The chromosome that the SNV resides on.
position
The position of the SNV in base pairs.
afreq
The derived allele frequency of the SNV.
marker
A unique character identifier for the SNV.
type
The mutation type, as specified in the user's SLiM simulation.
pathwaySNV
Identifies SNVs located within the pathway of interest as TRUE.
Please note: the variable pathwaySNV will be omitted when pathway_df is not supplied to read_slim.
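The following Python sketch is purely illustrative and is not part of the package. It mimics, with plain lists, the compressed sparse column layout that a dgCMatrix stores (nonzero values x, 0-based row indices i, and column pointers p). It then sums pairs of haplotype rows into per-individual allele counts and uses colID and pathwaySNV to count carriers of pathway SNVs. Three things are assumptions not confirmed by this documentation: that an individual's two haplotypes occupy adjacent rows (2k and 2k+1), that colID is a 1-based column index as in R, and the marker names, which are made up.

```python
def to_csc(dense):
    """Return (x, i, p) for a dense matrix in compressed sparse column form:
    x holds the nonzero values, i their 0-based row indices, and p[j]:p[j+1]
    delimits the entries of column j (the same slots a dgCMatrix stores)."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    x, i, p = [], [], [0]
    for j in range(n_cols):
        for r in range(n_rows):
            if dense[r][j] != 0:
                x.append(dense[r][j])
                i.append(r)
        p.append(len(x))
    return x, i, p


def genotypes(dense):
    """Sum haplotype rows 2k and 2k+1 into allele counts (0, 1, or 2) for individual k.
    Assumes (hypothetically) that an individual's two haplotypes are adjacent rows."""
    return [
        [a + b for a, b in zip(dense[2 * k], dense[2 * k + 1])]
        for k in range(len(dense) // 2)
    ]


def pathway_carrier_counts(mutations, dense):
    """Count haplotypes carrying each SNV flagged by pathwaySNV.
    Assumes colID is a 1-based column index, as it would be in R.
    Returns an empty dict when pathwaySNV is absent (pathway_df not supplied)."""
    counts = {}
    for m in mutations:
        if m.get("pathwaySNV", False):
            col = m["colID"] - 1  # convert to a 0-based Python index
            counts[m["marker"]] = sum(row[col] for row in dense)
    return counts


haps = [[1, 0, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]]  # 2 individuals x 2 haplotypes
muts = [{"colID": 1, "marker": "snv_a", "pathwaySNV": True},
        {"colID": 2, "marker": "snv_b", "pathwaySNV": False},
        {"colID": 3, "marker": "snv_c", "pathwaySNV": True}]
print(to_csc(haps))
print(genotypes(haps))
print(pathway_carrier_counts(muts, haps))
```

Only the four nonzero entries are stored in the compressed form, which is why a sparse matrix suits rare-variant data where most haplotypes carry the ancestral allele.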