CAGEset object. After the CAGEset object has been created, the data can be further manipulated and visualized using other functions available in the CAGEr package and integrated with other analyses in R. Available resources include:
- FANTOM5 datasets (Forrest et al. Nature 2014) for numerous human and mouse samples (primary cells, cell lines and tissues), which are fetched directly from the FANTOM5 online resource.
- FANTOM3 and FANTOM4 datasets (Carninci et al. Science 2005, Faulkner et al. Nature Genetics 2009, Suzuki et al. Nature Genetics 2009) from the FANTOM3and4CAGE data package, which is available from Bioconductor.
- ENCODE datasets (Djebali et al. Nature 2012) for numerous human cell lines from the ENCODEprojectCAGE data package, which is available for download from http://promshift.genereg.net/CAGEr/.
- Zebrafish developmental timecourse datasets (Nepal et al. Genome Research 2013) from the ZebrafishDevelopmentalCAGE data package, which is available for download from http://promshift.genereg.net/CAGEr/.
importPublicData(source, dataset, group, sample)
"FANTOM5": for fetching and importing CAGE data for various human or mouse primary cells, cell lines and tissues from the online FANTOM5 resource (http://fantom.gsc.riken.jp/5/data/). All data published in the main FANTOM5 publication by Forrest et al. are available.
"FANTOM3and4": for importing CAGE data for various human or mouse tissues produced within the FANTOM3 and FANTOM4 projects. Requires the data package FANTOM3and4CAGE to be installed. This data package is available from Bioconductor.
"ENCODE": for importing CAGE data for human cell lines from the ENCODE project published by Djebali et al. Requires the data package ENCODEprojectCAGE to be installed. This data package is available for download from http://promshift.genereg.net/CAGEr/.
"ZebrafishDevelopment": for importing CAGE data from the developmental timecourse of zebrafish (Danio rerio) published by Nepal et al. Requires the data package ZebrafishDevelopmentalCAGE to be installed. This data package is available for download from http://promshift.genereg.net/CAGEr/.
See Details for further explanation of individual resources.
FANTOM5 it can be either "human" or "mouse", and only one of them can be specified at a time. For other resources, please refer to the vignette of the corresponding data package for the list of available datasets. Multiple datasets mapped to the same genome can be specified to combine selected samples from each.
group argument is used only when importing TSSs from data packages and is ignored when source="FANTOM5". For the groups available in each dataset, please refer to the vignette of the corresponding data package. Specify either a single group (if all selected samples belong to the same group) or one group per sample (if samples belong to different groups). In the latter case, the number of elements in group must match the number of elements in sample.
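The length rule for group and sample can be sketched in base R as follows. The helper expand_group is hypothetical and is not part of CAGEr; it only illustrates the rule described above (one group shared by all samples, or exactly one group per sample) and makes no claim about how importPublicData checks its arguments internally.

```r
# Hypothetical helper (not a CAGEr function): return one group label per sample.
expand_group <- function(group, sample) {
  # A single group is shared by all selected samples.
  if (length(group) == 1L) {
    return(rep(group, length(sample)))
  }
  # Otherwise exactly one group per sample is required.
  if (length(group) != length(sample)) {
    stop("'group' must have length 1 or the same length as 'sample' (",
         length(group), " vs ", length(sample), ")")
  }
  group
}

# One group for two samples, then one group per sample
# ("s1" and "s2" are placeholder sample names).
expand_group("brain", c("s1", "s2"))
expand_group(c("brain", "adipogenic_induction"), c("s1", "s2"))
```

Running such a check before the import call gives a clear error message when the two vectors disagree in length.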
data(FANTOM5humanSamples) and data(FANTOM5mouseSamples), respectively. Use the names from the sample column to specify which samples should be imported.
CAGEset object is returned. Slots librarySizes, CTSScoordinates and tagCountMatrix are occupied by the single base-pair resolution TSS data imported from the selected resource.
CAGEset object for further manipulation with CAGEr.
The FANTOM consortium provides single base-pair resolution TSS data for numerous human and mouse primary cells, cell lines and tissues produced within the FANTOM5 project (Forrest et al. Nature 2014). These are fetched directly from the online resource at http://fantom.gsc.riken.jp/5/data and imported into a CAGEset object. To use this resource, specify source="FANTOM5". The dataset argument can be either "human" or "mouse", but not both at the same time. The lists of all human and mouse samples can be obtained by loading data(FANTOM5humanSamples) and data(FANTOM5mouseSamples), respectively. The sample column gives the names of individual samples, which should be provided as the sample argument. See the FANTOM5 import example at the end of this page.
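Because FANTOM5 samples are fetched from an online resource, it can help to confirm the requested names against the sample table before starting the import. The sketch below uses a small mock data frame in place of FANTOM5humanSamples so that it runs without CAGEr installed; only the sample and type columns are assumed, the last row is a made-up placeholder, and check_sample_names is a hypothetical helper rather than a CAGEr function.

```r
# Mock stand-in for FANTOM5humanSamples (assumed columns: sample, type).
# The first three names are taken from the import example on this page;
# the last row is a hypothetical placeholder.
fantom5_mock <- data.frame(
  sample = c("adipose_tissue__adult__pool1",
             "adrenal_gland__adult__pool1",
             "aorta__adult__pool1",
             "hypothetical_cell_line_sample"),
  type = c("tissue", "tissue", "tissue", "cell line"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stop if any requested name is absent from the table.
check_sample_names <- function(requested, sample_table) {
  unknown <- setdiff(requested, sample_table$sample)
  if (length(unknown) > 0L) {
    stop("Unknown sample name(s): ", paste(unknown, collapse = ", "))
  }
  invisible(requested)
}

# Take the first two tissue samples and validate them.
wanted <- subset(fantom5_mock, type == "tissue")$sample[1:2]
check_sample_names(wanted, fantom5_mock)
wanted
```

With the real table loaded via data(FANTOM5humanSamples), the same pattern would apply to FANTOM5humanSamples, and the validated vector would be passed as the sample argument.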
TSS data from the previous FANTOM3 and FANTOM4 projects (Carninci et al., Faulkner et al., Suzuki et al.) are also available through the FANTOM3and4CAGE data package. This data package can be installed directly from Bioconductor. To use this resource, install and load the FANTOM3and4CAGE package and specify source="FANTOM3and4". The dataset argument can be the name of any of the datasets available in this package. Load data(FANTOMhumanSamples) or data(FANTOMmouseSamples) for the list of available datasets with group and sample labels for specific human or mouse samples. These have to be provided as the dataset, group and sample arguments to import selected samples. If all samples belong to the same group, that group needs to be given only once; otherwise a corresponding group has to be specified for each sample, i.e. the number of elements in group must match the number of elements in sample.
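One way to keep dataset, group and sample aligned is to look up the labels for each requested sample in the sample table. The sketch below does this with base R on a mock table whose two rows reuse the labels from the FANTOM3/4 example on this page; the column names dataset, group and sample are assumed to match the real table, and lookup_labels is a hypothetical helper, not part of CAGEr or FANTOM3and4CAGE.

```r
# Mock stand-in for FANTOMmouseSamples (assumed columns: dataset, group, sample).
samples_mock <- data.frame(
  dataset = c("FANTOMtissueCAGEmouse", "FANTOMtimecourseCAGEmouse"),
  group   = c("brain", "adipogenic_induction"),
  sample  = c("CCL-131_Neuro-2a_treatment_for_6hr_with_MPP+",
              "DFAT-D1_preadipocytes_2days"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return one dataset and one group per requested sample.
# match() uses the first matching row, so sample names are assumed to be unique.
lookup_labels <- function(sample, sample_table) {
  idx <- match(sample, sample_table$sample)
  if (anyNA(idx)) {
    stop("Sample(s) not found: ", paste(sample[is.na(idx)], collapse = ", "))
  }
  list(dataset = sample_table$dataset[idx],
       group   = sample_table$group[idx],
       sample  = sample)
}

import_args <- lookup_labels(
  c("DFAT-D1_preadipocytes_2days",
    "CCL-131_Neuro-2a_treatment_for_6hr_with_MPP+"),
  samples_mock
)
str(import_args)
```

The three resulting vectors have the same length by construction, so they satisfy the length requirement regardless of the order in which samples are requested.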
The ENCODE consortium produced CAGE data for numerous human cell lines (Djebali et al. Nature 2012). We have used these data to derive single base-pair resolution TSSs and collected them into an R data package, ENCODEprojectCAGE. This data package is available for download from http://promshift.genereg.net/CAGEr/. To use this resource, install and load the ENCODEprojectCAGE data package and specify source="ENCODE". The dataset argument can be the name of any of the datasets available in this package. Load data(ENCODEhumanCellLinesSamples) for the list of available datasets with group and sample labels for specific samples. These have to be provided as the dataset, group and sample arguments to import selected samples. Multiple datasets can be combined by specifying them together in the dataset argument. If all samples belong to the same dataset and the same group, the dataset and group need to be specified only once; otherwise a corresponding dataset and group have to be specified for each sample, i.e. the number of elements in dataset and group must match the number of elements in sample.
Precise TSSs are also available for zebrafish (Danio rerio) from CAGE data published by Nepal et al. for 12 developmental stages. These have been collected into a data package, ZebrafishDevelopmentalCAGE, which is available for download from http://promshift.genereg.net/CAGEr/. To use this resource, install and load the ZebrafishDevelopmentalCAGE data package and specify source="ZebrafishDevelopment". Load data(ZebrafishSamples) for the list of available datasets and the group and sample labels, which have to be specified to import these data.
getCTSS
### importing FANTOM5 data
# list of FANTOM5 human tissue samples
data(FANTOM5humanSamples)
head(subset(FANTOM5humanSamples, type == "tissue"))
# import selected samples
exampleCAGEset <- importPublicData(
    source = "FANTOM5", dataset = "human",
    sample = c("adipose_tissue__adult__pool1", "adrenal_gland__adult__pool1",
               "aorta__adult__pool1"))
exampleCAGEset
### importing FANTOM3/4 data from a data package
library(FANTOM3and4CAGE)
# list of mouse datasets available in this package
data(FANTOMmouseSamples)
unique(FANTOMmouseSamples$dataset)
head(subset(FANTOMmouseSamples, dataset == "FANTOMtissueCAGEmouse"))
head(subset(FANTOMmouseSamples, dataset == "FANTOMtimecourseCAGEmouse"))
# import selected samples from two different mouse datasets
exampleCAGEset <- importPublicData(
    source = "FANTOM3and4",
    dataset = c("FANTOMtissueCAGEmouse", "FANTOMtimecourseCAGEmouse"),
    group = c("brain", "adipogenic_induction"),
    sample = c("CCL-131_Neuro-2a_treatment_for_6hr_with_MPP+",
               "DFAT-D1_preadipocytes_2days"))
exampleCAGEset