knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
# Introduction

VCF (Variant Call Format) is a text file format. It contains meta-information lines, a header line, and then data lines, each containing information about a position in the genome. This example shows how to draw an LD heat map with LDheatmap from data in VCF format.
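To make the three kinds of lines concrete, here is a minimal base-R sketch that separates them in a tiny, made-up VCF held in memory. The sample names `S1` and `S2`, the variant IDs, and all values are invented for illustration; real files should be read with a dedicated parser such as `read.vcfR()`.

```r
# A tiny, made-up VCF held in memory (all values are illustrative only)
vcf_text <- c(
  "##fileformat=VCFv4.2",
  "##source=toy_example",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
  "9\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0|0\t0|1",
  "9\t250\trs2\tC\tT\t.\tPASS\t.\tGT\t1|1\t0|1"
)

# Meta-information lines start with "##"
meta_lines <- vcf_text[startsWith(vcf_text, "##")]
# The single header line starts with "#" but not "##"
header_line <- vcf_text[startsWith(vcf_text, "#") & !startsWith(vcf_text, "##")]
# Everything else is a data line, one per variant position
data_lines <- vcf_text[!startsWith(vcf_text, "#")]

# Split the tab-delimited header and data lines into a data frame
col_names <- strsplit(sub("^#", "", header_line), "\t")[[1]]
fields <- do.call(rbind, strsplit(data_lines, "\t"))
variants <- as.data.frame(fields, stringsAsFactors = FALSE)
names(variants) <- col_names
variants$POS <- as.integer(variants$POS)

print(variants[, c("CHROM", "POS", "ID", "S1", "S2")])
```

The first nine columns are fixed by the format; every column after `FORMAT` holds the genotypes of one sample, which is what allows a subset of subjects to be selected by ID.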
1KG_sample_info.csv is the data file that contains the sample populations and their corresponding super-population codes. We extract the samples whose population codes belong to the European (EUR) and East Asian (EAS) super populations for later use.
    # Super populations: EUR and EAS
    # Get the samples with the corresponding population codes for EUR and EAS
    sample_info <- read.csv("1KG_sample_info.csv")
    eur <- sample_info[sample_info$Population %in% c("CEU", "TSI", "FIN", "GBR", "IBS"), -c(2, 4)]
    eas <- sample_info[sample_info$Population %in% c("CHB", "JPT", "CHS", "CDX", "KHV"), -c(2, 4)]
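The subsetting step can be tried without the real file. The sketch below uses a hypothetical stand-in for 1KG_sample_info.csv: the column names `Sample`, `Family`, and `Gender` and all rows are assumptions made for illustration, and only the `Population` column is taken from the code in this example.

```r
# Hypothetical stand-in for 1KG_sample_info.csv; the real column layout may differ
sample_info <- data.frame(
  Sample = c("S1", "S2", "S3", "S4", "S5"),
  Family = c("F1", "F2", "F3", "F4", "F5"),
  Population = c("CEU", "CHB", "YRI", "FIN", "JPT"),
  Gender = c("male", "female", "female", "male", "female"),
  stringsAsFactors = FALSE
)

eur_codes <- c("CEU", "TSI", "FIN", "GBR", "IBS")
eas_codes <- c("CHB", "JPT", "CHS", "CDX", "KHV")

# Row filter: keep samples whose population code is in the set.
# Column filter: -c(2, 4) drops the 2nd and 4th columns.
eur <- sample_info[sample_info$Population %in% eur_codes, -c(2, 4)]
eas <- sample_info[sample_info$Population %in% eas_codes, -c(2, 4)]

print(eur)
print(eas)
```

With this toy layout the first remaining column holds the sample IDs, which is why the later calls pass `eur[,1]` and `eas[,1]` as the subject lists.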
snp_in_vcf.vcf is a VCF data file that contains common SNPs (SNPs with a frequency of 5% or more in the worldwide population) in the MLLT3 gene. We draw two LD heat maps, one per group, from the genotype data in this file and the population codes we just extracted.
    library(vcfR)
    # Read the VCF file and store it as a vcfR object
    vcf <- read.vcfR("snp_in_vcf.vcf")

    library(LDheatmap)
    # Extract genetic distances, subject IDs and genotypes for the
    # samples of European descent from the vcfR object
    list_eur <- vcfR2SnpMatrix(vcf, subjects = eur[,1])
    # Draw the heat map
    LDheatmap(list_eur$data, list_eur$genetic.distance, title='Europeans', add.map=FALSE)

    # Same procedure as above for the samples of East Asian descent
    list_eas <- vcfR2SnpMatrix(vcf, subjects = eas[,1])
    LDheatmap(list_eas$data, list_eas$genetic.distance, title='Asians', add.map=FALSE)
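Before subsetting by subject, it can be useful to check that the requested IDs actually occur among the samples in the VCF file. The following is a minimal sketch using a hypothetical helper, `check_subjects()`, which is not part of vcfR or LDheatmap, and toy ID vectors. With a real vcfR object the available sample names could be taken from `colnames(vcf@gt)[-1]`, since the first column of the `gt` slot is `FORMAT`.

```r
# Hypothetical helper (not part of vcfR or LDheatmap): split requested
# subject IDs into those found in the VCF and those missing from it.
check_subjects <- function(requested, available) {
  list(
    found = intersect(requested, available),
    missing = setdiff(requested, available)
  )
}

# Toy IDs standing in for eur[, 1] and for the sample columns of the VCF
requested_ids <- c("S1", "S4", "S9")
vcf_samples <- c("S1", "S2", "S3", "S4")

res <- check_subjects(requested_ids, vcf_samples)
print(res$found)
print(res$missing)
```

A non-empty `missing` element would suggest that the sample information file and the VCF file do not describe exactly the same set of individuals.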