Learn R Programming

methylKit

#Introduction methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods such as Agilent SureSelect methyl-seq. In addition, methylKit can deal with base-pair resolution data for 5hmC obtained from Tab-seq or oxBS-seq. It can also handle whole-genome bisulfite sequencing data if proper input format is provided.

##Current Features

  • Coverage statistics
  • Methylation statistics
  • Sample correlation and clustering
  • Differential methylation analysis
  • Feature annotation and accessor/coercion functions
  • Multiple visualization options
  • Regional and tiling windows analysis
  • (Almost) proper documentation
  • Reading methylation calls directly from Bismark(Bowtie/Bowtie2 alignment files
  • Batch effect control
  • Multithreading support (for faster differential methylation calculations)
  • Coercion to objects from Bioconductor package GenomicRanges
  • Reading methylation percentage data from generic text files

##Staying up-to-date You can subscribe to our googlegroups page to get the latest information about new releases and features (low-frequency, only updates are posted)

To ask questions please use methylKit_discussion forum

You can also check out the blogposts we make on using methylKit


##Installation

in R console,

library(devtools)
install_github("al2na/methylKit", build_vignettes=FALSE, 
  repos=BiocInstaller::biocinstallRepos(),
  dependencies=TRUE)

if this doesn't work, you might need to add type="source" argument.

Install the development version

library(devtools)
install_github("al2na/methylKit", build_vignettes=FALSE, 
  repos=BiocInstaller::biocinstallRepos(),ref="development",
  dependencies=TRUE)

if this doesn't work, you might need to add type="source" argument.


#How to Use Typically, bisulfite converted reads are aligned to the genome and % methylation value per base is calculated by processing alignments. methylKit takes that % methylation value per base information as input. Such input file may be obtained from AMP pipeline for aligning RRBS reads. A typical input file looks like this:

chrBase	chr	base	strand	coverage	freqC	freqT
chr21.9764539	chr21	9764539	R	12	25.00	75.00
chr21.9764513	chr21	9764513	R	12	0.00	100.00
chr21.9820622	chr21	9820622	F	13	0.00	100.00
chr21.9837545	chr21	9837545	F	11	0.00	100.00
chr21.9849022	chr21	9849022	F	124	72.58	27.42
chr21.9853326	chr21	9853326	F	17	70.59	29.41

methylKit reads in those files and performs basic statistical analysis and annotation for differentially methylated regions/bases. Also a tab separated text file with a generic format can be read in, such as methylation ratio files from BSMAP, see here for an example. Alternatively, read.bismark function can read SAM file(s) output by Bismark(using bowtie/bowtie2) aligner (the SAM file must be sorted based on chromosome and read start). The sorting must be done by unix sort or samtools, sorting using other tools may change the column order of the SAM file and that will cause an error.

Below, there are several options showing how to do basic analysis with methylKit.

##Documentation##

  • You can look at the vignette here
  • You can check out the slides for a tutorial at EpiWorkshop 2013
  • You can check out the tutorial prepared for EpiWorkshop 2012
  • You can see the code snippet below

##Example analysis##

###Read methylation files###

library(methylKit)


# this is a list of example files, ships with the package
# for your own analysis you will just need to provide set of paths to files
# you will not need the "system.file(..."  part
file.list=list( system.file("extdata", "test1.myCpG.txt", package = "methylKit"),
                system.file("extdata", "test2.myCpG.txt", package = "methylKit"),
                system.file("extdata", "control1.myCpG.txt", package = "methylKit"),
                system.file("extdata", "control2.myCpG.txt", package = "methylKit") )


# read the files to a methylRawList object: myobj
myobj=read( file.list,
           sample.id=list("test1","test2","ctrl1","ctrl2"),assembly="hg18",treatment=c(1,1,0,0))

###Get descriptive stats on methylation###

# get methylation statistics on second file "test2" in myobj which is a class of methylRawList
getMethylationStats(myobj[[2]],plot=F,both.strands=F)

# plot methylation statistics on second file "test2" in myobj which is a class of methylRawList
getMethylationStats(myobj[[2]],plot=T,both.strands=F)

###Get bases covered by all samples and cluster samples###

# see what the data looks like for sample 2 in myobj methylRawList
head(myobj[[2]])

# merge all samples to one table by using base-pair locations that are covered in all samples
# setting destrand=TRUE, will merge reads on both strans of a CpG dinucleotide. This provides better 
# coverage, but only advised when looking at CpG methylation
meth=unite(myobj,destrand=TRUE)

# merge all samples to one table by using base-pair locations that are covered in all samples
# 
meth=unite(myobj)

# cluster all samples using correlation distance and return a tree object for plclust
hc = clusterSamples(meth, dist="correlation", method="ward", plot=FALSE)

# cluster all samples using correlation distance and plot hiarachical clustering
clusterSamples(meth, dist="correlation", method="ward", plot=TRUE)

# screeplot of principal component analysis.
PCASamples(meth, screeplot=TRUE)

# principal component anlaysis of all samples.
PCASamples(meth)

###Calculate differential methylation### Before differential methylation calculation, consider filtering high coverage bases to remove potential PCR bias using filterByCoverage(). In addition, consider normalizing read coverages between samples to avoid bias introduced by systematically more sequenced samples, using normalizeCoverage().

# calculate differential methylation p-values and q-values
myDiff=calculateDiffMeth(meth)

# check how data part of methylDiff object looks like
head( myDiff )

# get differentially methylated regions with 25% difference and qvalue<0.01
myDiff25p=get.methylDiff(myDiff,difference=25,qvalue=0.01)

# get differentially hypo methylated regions with 25% difference and qvalue<0.01
myDiff25pHypo =get.methylDiff(myDiff,difference=25,qvalue=0.01,type="hypo") 

# get differentially hyper methylated regions with 25% difference and qvalue<0.01
myDiff25pHyper=get.methylDiff(myDiff,difference=25,qvalue=0.01,type="hyper")

###Annotate differentially methylated bases/regions###

# read-in transcript locations to be used in annotation
# IMPORTANT: annotation files that come with the package (version >=0.5) are a subset of full annotation
# files. Download appropriate annotation files from UCSC (or other sources) in BED format
gene.obj=read.transcript.features(system.file("extdata", "refseq.hg18.bed.txt", package = "methylKit"))

# annotate differentially methylated Cs with promoter/exon/intron using annotation data
annotate.WithGenicParts(myDiff25p,gene.obj)

SEE PACKAGE VIGNETTE and TUTORIAL (both hyper-linked above) FOR MORE

##Downloading Annotation Files## You can download annotation files from UCSC table browser for your genome of interest. Go to [http://genome.ucsc.edu/cgi-bin/hgGateway]. On the top menu click on "tools" then "table browser". Select your "genome" of interest and "assembly" of interest from the drop down menus. Make sure you select the correct genome and assembly. Selecting wrong genome and/or assembly will return unintelligible results in downstream analysis.

From here on you can either download gene annotation or CpG island annotation.

  1. For gene annotation, select "Genes and Gene prediction tracks" from the "group" drop-down menu. Following that, select "Refseq Genes" from the "track" drop-down menu. Select "BED- browser extensible data" for the "output format". Click "get output" and on the following page click "get BED" without changing any options. save the output as a text file.
  2. For CpG island annotation, select "Regulation" from the "group" drop-down menu. Following that, select "CpG islands" from the "track" drop-down menu. Select "BED- browser extensible data" for the "output format". Click "get output" and on the following page click "get BED" without changing any options. save the output as a text file.

In addition, you can check this tutorial to learn how to download any track from UCSC in BED format (http://www.openhelix.com/cgi/tutorialInfo.cgi?id=28)


#R script for Genome Biology publication The most recent version of the R script in the Genome Biology manuscript is here.


#Citing methylKit If you used methylKit please cite:

and also consider citing the following publication as a use-case with specific cutoffs:


#Contact & Questions e-mail to methylkit_discussion@googlegroups.com or post a question using the web interface.

if you are going to submit bug reports or ask questions, please send sessionInfo() output from R console as well.

Questions are very welcome, although we suggest you read the paper, documentation(function help pages and the vignette) and blog entries first. The answer to your question might be there already.


#Contribute to the development See the trello board for methylKit development. You can contribute to the methylKit development via github ([http://github.com/al2na/methylKit/]) by checking out "development" branch, making your changes and doing a pull request (all of these should be done on the "development" branch NOT on the "master" branch). In addition, you should:

  • Bump up the version in the DESCRIPTION file on the 4th number. For example, the master branch has the version numbering as in "X.Y.Z". If you make a change to the development branch you should bump up the version in the DESCRIPTION file to "X.Y.Z.1". If there are already changes done in the development just bump up the fourth number.

  • Add your changes to the NEWS file as well under the correct version and appropriate section. Attribute the changes to yourself, such as "Contributed by X"

License

Artistic License/GPL

Copy Link

Version

Version

0.99.2

License

Artistic-2.0

Maintainer

Last Published

January 22nd, 2018

Functions in methylKit (0.99.2)

adjustMethylC

Adjust measured 5mC levels using 5hmC levels
bedgraph

Get bedgraph from methylRaw, methylRawList and methylDiff objects
calculateDiffMethDSS

calculate Differential Methylation with DSS
annotationByFeature-class

An S4 class for overlap of target features with a generic annotation
getAssociationWithTSS

Get distance to nearest TSS and gene id from annotationByGenicParts
calculateDiffMeth

Calculate differential methylation statistics
annotateWithFeature

Annotate object with a set of genomic features
annotateWithGenicParts

Annotate given object with gene annotation
annotationByGenicParts-class

An S4 class for overlap of target features with gene annotation
annotateWithFeatureFlank

Annotate an object with two sets of genomic features
filterByCoverage

Filter methylRaw, methylRawDB, methylRawList and methylRawListDB object based on read coverage
dataSim

Simulate DNA methylation data
getData

get the data slot from the methylKit objects
diffMethPerChr

Get and plot the number of hyper/hypo methylated regions/bases per chromosome
getContext

get the context of methylation
getCorrelation

get correlation between samples in methylBase or methylBaseDB object
extract

extract parts of methylRaw,methylRawDB,methylBase,methylBaseDB and methylDiff data
getAssembly

get assembly of the genome
clusterSamples

Hierarchical Clustering using methylation data The function clusters samples using hclust function and various distance metrics derived from percent methylation per base or per region for each sample.
getCoverageStats

get coverage stats from methylRaw object
getFeatsWithTargetsStats

Get the percentage/count of annotation features overlapping with target features
getMethylationStats

get Methylation stats from methylRaw or methylRawDB object
getDBPath

Get path to database of the methylDB objects
getMethylDiff

get differentially methylated regions/bases based on cutoffs
getTargetAnnotationStats

Get the percentage of target features overlapping with annotation from annotationByFeature
getTreatment

Get or Set treatment vector of methylKit object
getMembers

Get the membership slot of annotationByFeature
getSampleID

Get or Set Sample-IDs of the methylKit objects
makeMethylDB

coerce methylKit objects from memory to flat file database objects
getFlanks

Get upstream and downstream adjacent regions to a genomic feature
methylDiffDB-class

An S4 class that holds differential methylation information as flat file database
methylRaw-class

An S4 class for holding raw methylation data from an alignment pipeline.
methRead

read file(s) to methylRaw or methylRawList objects
methSeg

Segment methylation or differential methylation profile
methSeg2bed

Export segments to BED files
methylBase.obj

Example methylBase object.
methylDiff-class

An S4 class that holds differential methylation information
methylBaseDB-class

An S4 class for storing methylation events sampled in multiple experiments as flat file database
methylBase-class

An S4 class for methylation events sampled in multiple experiments
methylDiff.obj

Example methylKit objects.
methylRawDB-class

An S4 class for storing raw methylation data as flat file database.
methylRawList-class

An S4 class for holding a list of methylRaw objects.
percMethylation

get percent methylation scores from methylBase or methylBaseDB object
methylRawList.obj

Example methylRawList object.
methylRawListDB-class

An S4 class for holding a list of methylRawDB objects.
plotTargetAnnotation

Plot annotation categories from annotationByGenicParts or annotationByFeature
pool

Pool replicates within groups to a single sample per group
processBismarkAln

Get methylation percentage from sorted Bismark alignments
normalizeCoverage

normalize read coverage between samples
PCASamples

Principal Components Analysis of Methylation data
reorganize

Reorganize methylKit objects by creating new objects from subset of samples
readTranscriptFeatures

Read transcript features from a BED file
selectByOverlap

selects records of methylDB objects lying inside a GRanges range
regionCounts

Get regional counts for given GRanges or GRangesList object
removeComp

Remove principal components from a methylBase object
show,methylBase-method

show method for methylKit classes
readBed

read a bed file and convert it to GRanges
readFeatureFlank

a function to read-in genomic features and their upstream and downstream adjecent regions such as CpG islands and their shores
reconstruct

Reconstruct methylBase or methylBaseDB object based on a new methylation percentage matrix
select

selects rows from of methylKit objects
tileMethylCounts

Get methylated/unmethylated base counts for tilling windows
unite

unite methylRawList to a single table
updateMethObject

update methylKit objects The method updates object from earlier versions (<v0.9.1) to latest object.
assocComp

Associate principal components with sample annotations