poolify: Create a (pseudo-)pooled sample from a set of samples (using random-allele sampling)

Description

Create a (pseudo-)pooled sample from a set of samples stored in a `pooldata` or `countdata` object. Read counts are combined by random-allele sampling; allele counts are summed.

Usage

poolify(x, sample.index = NULL, out.samplename = "PoolSample")

Value

A `countdata` object containing the "poolified" sample.

Arguments

x: A `pooldata` or `countdata` object containing the samples to pool
sample.index: Indexes of the pools or pops (at least two) to combine into the pooled sample (default: all)
out.samplename: Name of the poolified sample (default: "PoolSample")

Details

This function generates a `countdata` object with a single sample that combines read or allele counts from different samples stored in either a `pooldata` or a `countdata` object. To avoid introducing bias in downstream analyses, the strategy applied depends on the type of input:

1. The input is a `countdata` object (allele counts): The output is the sum of allele counts across the selected samples (specified via `sample.index`). The resulting "poolified" `countdata` object can be used to simulate Pool-Seq data with the function `sim.readcounts`, or merged directly with other samples using `data.merge`.

2. The input is a `pooldata` object (read counts): A random-allele approach is used. For each SNP, one read is drawn at random from each of the samples to be pooled. Although this downsampling discards information, it ensures that all reads in the pooled sample originate from different chromosomes (i.e., no two reads come from the same individual chromosome). The resulting `countdata` object can then be merged with other samples stored in either `countdata` or `pooldata` objects using the function `data.merge`.
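
The two strategies can be illustrated with a minimal base-R sketch on toy count matrices. This is not the package's internal implementation: the matrices `ref` and `tot`, the per-read Bernoulli draw, and the handling of zero-coverage SNPs (skipped here) are all assumptions made for the sake of illustration.

```r
# Toy counts: 4 SNPs (rows) x 3 samples (columns). Plain matrices for
# illustration only; these are not poolfstat objects.
ref <- matrix(c(10, 0,  5, 7,    # sample 1: reference-allele counts
                 3, 8,  5, 0,    # sample 2
                 6, 2,  0, 9),   # sample 3
              nrow = 4)
tot <- matrix(c(12, 9, 10, 7,    # sample 1: total counts
                 8, 8, 10, 4,    # sample 2
                 6, 5,  0, 9),   # sample 3 (no coverage at SNP 3)
              nrow = 4)

# Strategy 1 (allele counts): sum the counts across samples.
sum_ref <- rowSums(ref)   # 19 10 10 16
sum_tot <- rowSums(tot)   # 26 22 20 20

# Strategy 2 (read counts): draw one read per sample and per SNP.
# A randomly drawn read carries the reference allele with probability
# ref / tot. Assumption: samples without coverage contribute nothing.
set.seed(42)
covered <- tot > 0
p_ref   <- ifelse(covered, ref / tot, 0)
draws   <- matrix(rbinom(length(tot), size = 1, prob = p_ref),
                  nrow = nrow(tot))
ra_ref  <- rowSums(draws)     # reference alleles among the sampled reads
ra_tot  <- rowSums(covered)   # reads sampled per SNP: 3 3 2 3
```

Under the second strategy the pooled sample holds at most one read per contributing sample at each SNP, which is the loss of information mentioned above.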

Examples

 make.example.files(writing.dir=tempdir())
 pooldata=popsync2pooldata(sync.file=paste0(tempdir(),"/ex.sync.gz"),poolsizes=rep(50,15))
 #random-allele pooling of samples P2 to P10
 P2toP10pseudopool=poolify(pooldata,sample.index=2:10,out.samplename="P2toP10")
 #merge with the remaining pools (indexes 1 and 11 to 15)
 newpooldata=data.merge(pooldata.subset(pooldata,pool.index=c(1,11:15)),P2toP10pseudopool)
 newpooldata
 #Working with allele count data
 #create a countdata object (NOTE: This example is just for the sake of illustration)
 pooldata2genobaypass(pooldata,writing.dir=tempdir())
 countdata=genobaypass2countdata(genobaypass.file=paste0(tempdir(),"/genobaypass")) 
 countdata@snp.info=pooldata@snp.info
 #sum allele counts (no random-allele sampling) over samples P2 to P10
 P2toP10mergecounts=poolify(countdata,sample.index=2:10,out.samplename="P2toP10")
 #merge with the remaining samples from the original countdata
 newcountdata=data.merge(countdata.subset(countdata,pop.index=c(1,11:15)),P2toP10mergecounts)
 newcountdata
 #simulate a Pool-Seq sample from the summed counts of samples P2 to P10
 #and merge it with the original Pool-Seq data
 poolP2toP10=sim.readcounts(P2toP10mergecounts,min.rc=0,seq.eps=0)
 #merge with the other Pool-Seq samples
 #(first ensure that the same SNPs with the same alleles are merged)
 #the ":" separator keeps chromosome/position keys unambiguous (e.g., "1"+"23" vs "12"+"3")
 snp.sel.idx=which(paste0(pooldata@snp.info$Chromosome,":",pooldata@snp.info$Position)%in%
                   paste0(poolP2toP10@snp.info$Chromosome,":",poolP2toP10@snp.info$Position))
 newpooldata=data.merge(pooldata.subset(pooldata,pool.index=c(1,11:15),snp.index=snp.sel.idx),
                       poolP2toP10)
 newpooldata