haplosim: Haplotype data simulator

Description

Generates a random sample of haplotypes, given an haplotype object created from a data set

Usage

haplosim(n, hap, which.snp = NULL, seed = NULL, force.polym = TRUE, summary = TRUE)

Arguments

Number of haplotypes to generate

hap

Haplotype object created with haplodata

which.snp

A vector specifying which SNPs to include

seed

Seed for the random number generator

force.polym

if TRUE, all loci are polymorphic

summary

if TRUE, additional summary statistics are returned

Value

A list containing:

data

Simulated sample

freqs

Allele frequency vector

cor

Correlation matrix

div

Locus-specific diversity scores

mse.freqs

MSE of allele frequencies

mse.cor

MSE of correlations

References

Montana, G. HapSim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. 2005.

Examples

Run this code


#
# Example 1
#

data(ACEdata)

# create the haplotype object 
x <- haplodata(ACEdata) 

# simulates a first sample of 100 haplotypes using all markers
y1 <- haplosim(100, x) 

# compares allele frequencies in real and simulated samples
plot(x$freqs, y1$freqs, title=paste("MSE:",y1$mse.freqs)); abline(a=0, b=1)

# compares LD coefficients in real and simulated samples
ldplot(mergemats(x$cor, y1$cor), ld.type='r') 

# simulates a second sample of 1000 haplotypes using the first 20 markers only
y2 <- haplosim(1000, which.snp=seq(20), x) 

#
# Example 2
#

# simulate a sample of 500 haplotypes based on the ACE data set
set.seed(100)
data(ACEdata)
n <- 500
x <- haplodata(ACEdata)
y <- haplosim(n, x)

# compute the haplotype frequencies
# an haplotype starts at markers 17 and ends at marker 22
freq1 <- haplofreqs(ACEdata, 17, 22)
freq2 <- haplofreqs(y$data, 17, 22)

# extract the set of haplotypic configurations that are shared
# by real and simulated data and their frequencies
commonhapls <- intersect(names(freq1),names(freq2)) 
cfreq1 <- freq1[commonhapls]
cfreq2 <- freq2[commonhapls]

# compare real vs simulated haplotype frequencies
par(mar=c(10.1, 4.1, 4.1, 2.1), xpd=TRUE)
legend.text <- names(cfreq1)
bp <- barplot(cbind(cfreq1,cfreq2), main="Haplotype Frequencies",
       names.arg=c("Real","Simulated"), col=heat.colors(length(legend.text)))
legend(mean(range(bp)), -0.3, legend.text, xjust = 0.5,
       fill=heat.colors(length(legend.text)), horiz = TRUE)
chisq.test(x=n*cfreq2, p=cfreq1, simulate.p.value = TRUE, rescale.p = TRUE)