Converts input data files to an object of class haplohh.
data2haplohh(hap_file, map_file, min_maf = 0, min_perc_geno.hap = 100,
min_perc_geno.snp = 100, chr.name = NA, popsel = NA,
recode.allele = FALSE, haplotype.in.columns = FALSE)
Path to the file containing haplotype data (see details section below for information about input file format)
Path to the file containing map information (see details section below for information about input file format)
Threshold on Minor Allele Frequency (SNPs displaying a MAF lower than min_maf are discarded)
Minimum percentage of genotyped SNPs required for a haplotype (haplotypes with less than min_perc_geno.hap percent of SNPs genotyped are discarded). By default, min_perc_geno.hap = 100, hence only fully genotyped haplotypes are retained
Minimum percentage of genotyped haplotypes required for a SNP (SNPs genotyped on less than min_perc_geno.snp percent of haplotypes are discarded). By default, min_perc_geno.snp = 100, hence only fully genotyped SNPs are retained
Name of the chromosome considered (relevant if several chromosomes are represented in the map file)
Code of the population considered in the fastPHASE output haplotype file (relevant if hap_file is a fastPHASE output and haplotypes originate from different populations)
If TRUE, alleles in the haplotypes are recoded according to the map file information. If FALSE, a rough verification is performed to check that only 0 (code for missing data), 1 (code for ancestral allele) or 2 (code for derived allele) are present in the haplotype file
If TRUE, phased input haplotypes are assumed to be in columns, as produced by the SHAPEIT2 program (O'Connell et al., 2014).
The returned value is an object of class haplohh
Three haplotype input formats are supported:
a standard format with haplotypes in rows and SNPs in columns (with no header, but a haplotype id)
a (transposed) format similar to the one produced by the phasing program SHAPEIT2 (O'Connell et al., 2014), in which haplotypes are in columns and SNPs in rows (with no header and no SNP id)
output files from the fastPHASE program (Scheet and Stephens, 2006). If the input haplotypes are not in transposed format (i.e., haplotype.in.columns is FALSE, as by default), the function automatically checks whether the file is in fastPHASE output format. If it is, and haplotypes from several populations were phased simultaneously (the -u fastPHASE option was used), the function asks interactively which population should be considered (a list of population numbers is proposed), unless one is specified with the popsel argument.
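The relation between the standard and the transposed layout can be illustrated with a small base R sketch. It does not call the package; the toy haplotypes, the haplotype ids and the temporary file names are hypothetical, and the file layouts simply follow the descriptions above.

```r
# Hypothetical toy haplotypes: 3 haplotypes x 4 SNPs, coded 0/1/2.
haplo <- matrix(c(1, 2, 1, 1,
                  2, 2, 0, 1,
                  1, 1, 2, 2),
                nrow = 3, byrow = TRUE)
hap_ids <- c("hap1", "hap2", "hap3")

# Standard layout: one haplotype per row, haplotype id first, no header.
std_file <- tempfile(fileext = ".hap")
write.table(data.frame(hap_ids, haplo), std_file,
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Transposed layout: one SNP per row, no header and no SNP id.
tr_file <- tempfile(fileext = ".thap")
write.table(t(haplo), tr_file,
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Reading both files back shows that they hold the same alleles.
std <- as.matrix(read.table(std_file, row.names = 1))
tr <- as.matrix(read.table(tr_file))
same <- all(unname(std) == t(unname(tr)))
```

A file written in the second layout would be the one to read with haplotype.in.columns = TRUE.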
The map file contains SNP information in five columns:
SNP name/id
chromosome
position (physical or genetic)
ancestral allele encoding
derived allele encoding
The SNPs must be in the same order as in the haplotype file for the chromosome considered. If several chromosomes are represented in the map file, one can provide the name of the chromosome of interest (corresponding to the haplotypes under study) with the chr.name argument. If the recode.allele option is activated, haplotypes are recoded according to the ancestral and derived allele definitions available in the map file (fourth and fifth columns) as: 0 = missing data, 1 = ancestral allele, 2 = derived allele. If this encoding is already detected in the haplotype data, no recoding is performed. Note that cross-population statistics such as Rsb and XP-EHH do not need information about ancestral and derived allele status.
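As an illustration of the 0/1/2 encoding, the following base R sketch recodes a toy allele matrix from ancestral and derived allele definitions. It is not the package's implementation: the helper recode_alleles and the toy data are hypothetical, and any allele matching neither definition is treated as missing.

```r
# Hypothetical toy data: 3 haplotypes (rows) x 2 SNPs (columns).
# matrix() fills column by column, so snp1 = A, G, N and snp2 = T, T, C.
haplo <- matrix(c("A", "G", "N",
                  "T", "T", "C"),
                nrow = 3,
                dimnames = list(c("hap1", "hap2", "hap3"), c("snp1", "snp2")))

# Ancestral and derived alleles, as given in columns 4 and 5 of a map file.
ancestral <- c(snp1 = "A", snp2 = "C")
derived   <- c(snp1 = "G", snp2 = "T")

# Illustrative helper: 1 = ancestral, 2 = derived, 0 = anything else (missing).
recode_alleles <- function(haplo, ancestral, derived) {
  out <- matrix(0L, nrow = nrow(haplo), ncol = ncol(haplo),
                dimnames = dimnames(haplo))
  for (j in seq_len(ncol(haplo))) {
    out[which(haplo[, j] == ancestral[j]), j] <- 1L
    out[which(haplo[, j] == derived[j]), j] <- 2L
  }
  out
}

recoded <- recode_alleles(haplo, ancestral, derived)
# snp1 becomes 1, 2, 0 and snp2 becomes 2, 2, 1
```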
Finally, the arguments min_perc_geno.hap, min_perc_geno.snp and min_maf are evaluated in this order.
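The effect of this evaluation order can be sketched in base R on a toy 0/1/2 matrix. This is a minimal illustration of the three filters as described above, not the package's code: the helper filter_haplo and the toy data are hypothetical, and the MAF is computed here over non-missing alleles only, which is an assumption.

```r
# Hypothetical toy data: 5 haplotypes x 3 SNPs, 0 = missing.
haplo <- matrix(c(1, 2, 1,
                  1, 0, 0,
                  2, 2, 1,
                  1, 1, 1,
                  1, 2, 0),
                nrow = 5, byrow = TRUE,
                dimnames = list(paste0("h", 1:5), paste0("snp", 1:3)))

filter_haplo <- function(haplo, min_perc_geno.hap = 100,
                         min_perc_geno.snp = 100, min_maf = 0) {
  # 1. Discard haplotypes with too few genotyped SNPs.
  perc_hap <- 100 * rowMeans(haplo != 0)
  haplo <- haplo[perc_hap >= min_perc_geno.hap, , drop = FALSE]
  # 2. Discard SNPs genotyped on too few of the remaining haplotypes.
  perc_snp <- 100 * colMeans(haplo != 0)
  haplo <- haplo[, perc_snp >= min_perc_geno.snp, drop = FALSE]
  # 3. Discard SNPs whose minor allele frequency is below min_maf.
  freq_anc <- colSums(haplo == 1) / colSums(haplo != 0)
  maf <- pmin(freq_anc, 1 - freq_anc)
  haplo[, !is.na(maf) & maf >= min_maf, drop = FALSE]
}

kept <- filter_haplo(haplo, min_perc_geno.hap = 50,
                     min_perc_geno.snp = 100, min_maf = 0.2)
# h2 is dropped first (1 of 3 SNPs genotyped), then snp3 (missing on h5)
```

Because haplotypes are filtered first, the per-SNP genotyping percentages and allele frequencies are computed only on the haplotypes that survived the first step.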
Scheet P, Stephens M (2006) A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet, 78, 629-644.
O'Connell J, Gurdasani D, Delaneau O, et al (2014) A general approach for haplotype phasing across the full spectrum of relatedness. PLoS Genet, 10, e1004234.
calc_ehh, calc_ehhs, scan_hh, make.example.files
# NOT RUN {
library(rehh)
# Copy example files into the current working directory.
make.example.files()
# Using the fastPHASE output haplotype example file
hap <- data2haplohh(hap_file = "bta12_hapguess_switch.out", map_file = "map.inp",
                    min_maf = 0.05, popsel = 7, chr.name = 12, recode.allele = TRUE)
# Using the standard haplotype example file
hap <- data2haplohh(hap_file = "bta12_cgu.hap", map_file = "map.inp",
                    min_maf = 0.05, chr.name = 12, recode.allele = TRUE)
# }