hapFabia (version 1.14.0)

simulateIBDsegments: Generates simulated genotyping data with IBD segments

Description

simulateIBDsegments: R implementation of simulateIBDsegments.

Genotype data with rare variants is simulated. IBD segments are then implanted into these data. All data sets and information are written to files.

Usage

simulateIBDsegments(fileprefix="dataSim", minruns=1, maxruns=100,
   snvs=10000, individualsN=100, avDistSnvs=100, avDistMinor=25,
   noImplanted=1, implanted=10, length=100, minors=20,
   mismatches=0, mismatchImplanted=0.5, overlap=50,
   noOverwrite=FALSE)

Arguments

fileprefix
prefix of file names containing data generated in this simulation.
minruns
start index for generating multiple data sets.
maxruns
end index for generating multiple data sets.
snvs
number of SNVs in this simulation.
individualsN
number of individuals in this simulation.
avDistSnvs
average genomic distance in bases between SNVs.
avDistMinor
average distance between minor alleles, thus 1/avDistMinor is the average minor allele frequency (MAF).
noImplanted
number of IBD segments that are implanted.
implanted
number of individuals belonging to a specific IBD segment.
length
length of the IBD segments in number of SNVs.
minors
number of tagSNVs for each IBD segment.
mismatches
number of minor allele tagSNV mismatches for individuals belonging to the IBD segment.
mismatchImplanted
percentage of individuals of an IBD segment that have mismatches.
overlap
minimal overlap of the founder interval between individuals belonging to a specific IBD segment (the interval may be broken at the ends).
noOverwrite
noOverwrite=TRUE ensures that an IBD segment is not superimposed by another IBD segment.
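
As a rough illustration of how the distance arguments relate to each other, the sketch below uses the default values from Usage. The MAF relation comes from the avDistMinor description; the genomic span is only an approximation that assumes SNVs lie avDistSnvs bases apart on average. The variable names avg_maf and approx_span_bases are made up for this sketch.

```r
# Default argument values from the Usage section.
snvs        <- 10000
avDistSnvs  <- 100
avDistMinor <- 25

# 1/avDistMinor is the average minor allele frequency (MAF).
avg_maf <- 1 / avDistMinor

# Approximate genomic span in bases (assumption: SNVs are spaced
# avDistSnvs bases apart on average).
approx_span_bases <- snvs * avDistSnvs

cat("average MAF:", avg_maf, "\n")
cat("approximate span (bases):", approx_span_bases, "\n")
```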

Details

Data simulation focuses on rare variants, but common variants are possible, too. Linkage disequilibrium and haplotype blocks are not simulated except by implanting IBD segments.

Simulated data is written to files. For BEAGLE the data is written to "...beagle.txt". For PLINK the data is written to "...plink.ped", "...plink.map", and "...plink.fam". For the MCMC method the data is written to "...mcmc.genotype", "...mcmc.posmaf", and "...mcmc.initz". For RELATE the data is written to "...relate.geno", "...relate.pos", and "...relate.chr". For fabia the data is written to "...fabia_individuals.txt", "...fabia_annot.txt", and "...fabia_mat.txt".

Information on parameters for data simulation is written to "...Parameters.txt" while information on implanted IBD segments is written to "...Impl.txt".
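
To check which of these files a simulation run produced, one option is to match on the file name endings listed above. This is a minimal sketch in base R: find_sim_files is a hypothetical helper, not part of hapFabia, and it deliberately makes no assumption about the part of the name that precedes each ending.

```r
# File name endings taken from the Details section.
suffixes <- c("beagle.txt", "plink.ped", "plink.map", "plink.fam",
              "mcmc.genotype", "mcmc.posmaf", "mcmc.initz",
              "relate.geno", "relate.pos", "relate.chr",
              "fabia_individuals.txt", "fabia_annot.txt", "fabia_mat.txt",
              "Parameters.txt", "Impl.txt")

# Hypothetical helper: group the files in `dir` by the ending they match.
# Returns a named list with one character vector per ending.
find_sim_files <- function(dir = ".") {
  files <- list.files(dir)
  found <- lapply(suffixes, function(s) files[endsWith(files, s)])
  names(found) <- suffixes
  found
}

# Example: inspect the current working directory.
str(find_sim_files("."))
```

Endings with no matching file come back as empty character vectors, which makes missing outputs easy to spot.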

Most information is also written to R binary ".Rda" files.
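
The ".Rda" files can be read back with base R's load(). The sketch below loads each file into its own environment so that the stored objects do not overwrite anything in the workspace. load_rda_files is a hypothetical helper, and the names of the objects stored in the files are not assumed here.

```r
# Hypothetical helper: load every ".Rda" file in `dir` into a separate
# environment and return the environments as a list named by file name.
load_rda_files <- function(dir = ".") {
  paths <- list.files(dir, pattern = "\\.Rda$", full.names = TRUE)
  envs <- lapply(paths, function(p) {
    e <- new.env()
    load(p, envir = e)   # objects from the file end up in `e`
    e
  })
  names(envs) <- basename(paths)
  envs
}

# Example: show which objects each file in the working directory contains.
loaded <- load_rda_files(".")
lapply(loaded, ls)
```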

Implementation in R.

References

S. Hochreiter et al., ‘FABIA: Factor Analysis for Bicluster Acquisition’, Bioinformatics 26(12):1520-1527, 2010.

See Also

IBDsegment-class, IBDsegmentList-class, analyzeIBDsegments, compareIBDsegmentLists, extractIBDsegments, findDenseRegions, hapFabia, hapFabiaVersion, hapRes, chr1ASW1000G, IBDsegmentList2excel, identifyDuplicates, iterateIntervals, makePipelineFile, matrixPlot, mergeIBDsegmentLists, mergedIBDsegmentList, plotIBDsegment, res, setAnnotation, setStatistics, sim, simu, simulateIBDsegmentsFabia, simulateIBDsegments, split_sparse_matrix, toolsFactorizationClass, vcftoFABIA

Examples

## Not run: 
old_dir <- getwd()
setwd(tempdir())

simulateIBDsegments(minruns=1, maxruns=1, snvs=1000, individualsN=10,
   avDistSnvs=100, avDistMinor=15, noImplanted=1, implanted=10,
   length=100, minors=10, mismatches=0, mismatchImplanted=0.5,
   overlap=50, noOverwrite=FALSE)

setwd(old_dir)

## End(Not run)
