This function reads in the content from a genetic map file to translate physical distance to genetic units (i.e. cM). Regardless of the source, the input file must be sex-averaged and in a tab-separated "Plink" format (documentation) with the following four columns and no header (i.e. no column names):
Chromosome
Identifier (ignored in read_map()
)
Length (genetic length within the physical position boundary)
Position (physical position boundary)
The columns must be in the order above. Note that only the first, third, and fourth columns are used in the function.
read_map(file)
A tibble containing 3 columns:
chr (chromosome)
value (genetic length within the physical position boundary)
bp (physical position boundary)
Input file path
The genetic map could come from different sources. One source is the HapMap map distributed by the Browning Lab (documentation). If this map file is used, the non-sex chromosomes can be downloaded and concatenated to a single file as follows:
wget https://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/plink.GRCh37.map.zip
unzip plink.GRCh37.map.zip
cat *chr[0-9]*GRCh37.map | sort -k1,1 -k4,4 --numeric-sort > plink.allchr.GRCh37.map
Another source is a sex-specific map ("bherer") originally published by Bherer et al and recommended by the developers of ped-sim
for simulating IBD segments (documentation). To retrieve and prep this map file for simulation:
# Get the refined genetic map and extract
wget --no-check-certificate https://github.com/cbherer/Bherer_etal_SexualDimorphismRecombination/raw/master/Refined_genetic_map_b37.tar.gz
tar xvfpz Refined_genetic_map_b37.tar.gz# Format for ped-sim as per https://github.com/williamslab/ped-sim#map-file-
printf "#chr\tpos\tmale_cM\tfemale_cM\n" > sexspec.pedsim.map
for chr in {1..22}; do
paste Refined_genetic_map_b37/male_chr$chr.txt Refined_genetic_map_b37/female_chr$chr.txt \
| awk -v OFS="\t" 'NR > 1 && $2 == $6 {print $1,$2,$4,$8}' \
| sed 's/^chr//' >> sexspec.pedsim.map;
done
# Clean up
rm -rf Refined_genetic_map_b37*
After this, the sexspec.pedsim.map
file is ready for use in simulation. However, it must be averaged and reformatted to "Plink format" to use here:
cat sexspec.pedsim.map | grep -v "^#" | awk -v OFS="\t" '{print $1,".",($3+$4)/2,$2}' > sexspec-avg.plink.map
#' The genetic maps created above are in the tens of megabytes size range. This is trivial to store for most systems but a reduced version would increase portability and ease testing. This "minimum viable genetic map" could be used for testing and as installed package data in an R package for example analysis. Read more about minimum viable genetic maps at:
Blog post: https://hapi-dna.org/2020/11/minimal-viable-genetic-maps/
Github repo with python code: https://github.com/williamslab/min_map
The code as written below reduces the averaged sex-specific genetic map from 833776 to 28726 positions (~30X reduction!).
# Get minmap script from github
wget https://raw.githubusercontent.com/williamslab/min_map/main/min_map.py# Create empty minmap
echo -n > sexspec-avg-min.plink.map
# For each autosome...
for chr in {1..22}; do
echo "Working on chromosome $chr..."
# First pull out just one chromosome
grep "^${chr}[[:space:]]" sexspec-avg.plink.map > tmp.${chr}
# Run the python script on that chromosome.
# The genetic map column is 3rd column (2nd in 0-start). Physical position is last column (3 in 0-based)
python3 min_map.py -mapfile tmp.${chr} -chr ${chr} -genetcol 2 -physcol 3 -noheader -error 0.05
# Strip out the header and reformat back to plink format, and append to minmap file
cat min_viable_map${chr}.txt | grep -v "^#" | awk -v OFS="\t" '{print $1,".",$4,$2}' >> sexspec-avg-min.plink.map
# Clean up
rm -f min_viable_map${chr}.txt tmp.${chr}
done
This averaged version of the Bherer sex-specific map, reduced to a minimum viable genetic map with at most 5% error, in Plink format, is available as installed package data (see examples). This is useful for testing code, but the full genetic map should be used for most analysis operations.
https://www.cog-genomics.org/plink/1.9/formats#map
https://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/
https://github.com/williamslab/ped-sim#map-file
https://www.nature.com/articles/ncomms14994
https://www.nature.com/articles/ncomms14994
https://github.com/cbherer/Bherer_etal_SexualDimorphismRecombination
gmapfile <- system.file("extdata", "sexspec-avg-min.plink.map", package="skater", mustWork=TRUE)
gmap <- read_map(gmapfile)
Run the code above in your browser using DataLab