data.merge: Merge two pooldata or countdata objects

Description

Merge two pooldata or countdata objects

Usage

data.merge(x1, x2, fake.pool.size = 1e+06, verbose = TRUE)

Value

A new `pooldata` or `countdata` object, depending on the input types.

Arguments

x1: First pooldata or countdata object to merge
x2: Second pooldata or countdata object to merge
fake.pool.size: Haploid sample size assigned to the samples of a `countdata` object when it is merged with a `pooldata` object, producing a pseudo `pooldata` object that contains all samples (default = 1e6); see Details.
verbose: If TRUE, report some information while merging

Details

This function merges two objects of class `pooldata` and/or `countdata`, automatically checking their structure for consistency. The merging behavior depends on the relationship between sample names and SNP identifiers:

1. Merging different samples (same SNPs): If SNP names are identical but (pool or population) sample names differ, the function merges data from the distinct samples into a single `pooldata` or `countdata` object that includes all samples.

2. Merging different SNPs (same samples): If sample names are identical but SNP names differ, the SNP data from each object are merged for each shared sample, effectively combining the variant information into one object.

3. Merging a `countdata` object with a `pooldata` object: In this case, the function returns a `pooldata` object. Allele counts from the `countdata` object are converted into pseudo read counts. To ensure compatibility, the haploid sample size of the samples originally contained in the `countdata` object is set to the value of the `fake.pool.size` argument (default = 1e6). Setting this value to a very large number (as in the default) ensures that each read count is treated as originating from a distinct haploid individual, mimicking Pool-Seq data in which read coverage is much lower than the haploid sample size. This effectively disables Pool-Seq-specific bias corrections in downstream statistical analyses. Importantly, when merging objects of different types, only merging by SNP (same SNPs, different samples, as in case 1) is permitted: the population samples of the two objects are expected to be distinct, at least in terms of effective haploid sample sizes.
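
The first two cases can be told apart from the SNP identifiers and sample names alone. The sketch below illustrates that decision on plain character vectors; it is not the package's implementation. The helper `classify_merge` and the example identifiers are hypothetical, and comparing names as sets (ignoring order) is an assumption.

```r
# Hypothetical helper (not part of the package): decide which merging case
# applies, given the SNP identifiers and sample names of two objects.
# Assumption: names are compared as sets, ignoring their order.
classify_merge <- function(snp1, snp2, names1, names2) {
  same.snps    <- setequal(snp1, snp2)
  same.names   <- setequal(names1, names2)
  shared.snps  <- length(intersect(snp1, snp2)) > 0
  shared.names <- length(intersect(names1, names2)) > 0
  # Case 1: identical SNPs, no sample name in common
  if (same.snps && !shared.names) return("different samples, same SNPs")
  # Case 2: identical samples, no SNP in common
  if (same.names && !shared.snps) return("different SNPs, same samples")
  # Anything else does not match either case cleanly
  "ambiguous"
}

snps  <- c("chr1_100", "chr1_250", "chr2_75")  # made-up SNP identifiers
pools <- c("P1", "P2", "P3")                   # made-up sample names

classify_merge(snps, snps, pools, paste0(pools, "_2"))  # "different samples, same SNPs"
classify_merge(snps, paste0(snps, "_2"), pools, pools)  # "different SNPs, same samples"
classify_merge(snps, snps, pools, pools)                # "ambiguous"
```

This is also why the examples rename either the pools or the SNPs before merging an object with a copy of itself: without renaming, both sets of names overlap and neither case applies.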

Examples

 make.example.files(writing.dir=tempdir())
 pooldata1=popsync2pooldata(sync.file=paste0(tempdir(),"/ex.sync.gz"),poolsizes=rep(50,15))
 pooldata2=pooldata1
 #Merge pooldata1 and pooldata2 by SNP (same SNPs, different pools)
 pooldata2@poolnames=paste0(pooldata2@poolnames,"_2") #pool names must be different
 data.merged=data.merge(pooldata1,pooldata2)
 #Merge pooldata1 and pooldata2 by POP (same pools, different SNPs)
 pooldata2=pooldata1
 pooldata2@snp.info[,1]=paste0(pooldata2@snp.info[,1],"_2") #SNP info must be different
 data.merged=data.merge(pooldata1,pooldata2)  
 #Merge pooldata1 with a countdata object
 #create a countdata object (NOTE: This example is just for the sake of illustration)
 pooldata2genobaypass(pooldata=pooldata1,writing.dir=tempdir())
 countdata=genobaypass2countdata(genobaypass.file=paste0(tempdir(),"/genobaypass")) 
 countdata@snp.info=pooldata1@snp.info
 countdata@popnames=paste0(countdata@popnames,"_2") #pop names must be different
 data.merged=data.merge(pooldata1,countdata)