TCGA2STAT (version 1.0)

getTCGA: Get TCGA Data.

Description

Obtain TCGA data from the Broad GDAC Firehose and process the data into a format ready for statistical analysis.

Usage

getTCGA(disease = "GBM", data.type = "RNASeq2", type = "", filter = "Y",
        p = getOption("mc.cores", 2L), clinical = TRUE, cvars = "OS")

Arguments

disease
abbreviation for the cancer type; defaults to "GBM" for glioblastoma multiforme.
data.type
genomic data profiling platform; defaults to "RNASeq2" for gene-level RNA-Seq data from the second pipeline (RNASeqV2).
type
specific data type produced by certain platforms.
filter
chromosome to be filtered out during data import; only applicable to data.type="CNV_SNP".
p
maximum number of processing cores used in parallel processing; defaults to the value of the "mc.cores" global option, or 2 if that option is not set.
clinical
logical value indicating whether clinical data should be imported; defaults to TRUE.
cvars
clinical covariates to be merged with the genomic data; defaults to "OS" for overall survival.
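
The default for p is resolved with base R's getOption(). A minimal sketch of how that lookup behaves, independent of TCGA2STAT (the variable names p_unset and p_set are illustrative only):

```r
# Temporarily clear the option, remembering its previous value.
old <- options(mc.cores = NULL)

# With "mc.cores" unset, getOption() falls back to the supplied default.
p_unset <- getOption("mc.cores", 2L)

# Once the option is set, its value takes precedence over the fallback.
options(mc.cores = 4L)
p_set <- getOption("mc.cores", 2L)

# Restore whatever the session had before.
options(old)
```

Setting options(mc.cores = ...) once per session therefore changes the default used by getTCGA without passing p on every call.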

Value

  • If clinical data is imported, a list containing three elements is returned:
      • dat: a matrix or a list of matrices. Each matrix is of dimension gene x sample.
      • clinical: a matrix of dimension sample x clinical covariates.
      • merged.dat: a matrix or a list of matrices. Each matrix merges dat with the clinical data specified by cvars, so each matrix is of dimension sample x (cvars + gene).
  • If clinical data is not imported, only a matrix or a list of matrices of genomic data, each of dimension gene x sample, is returned.

    Note: for methylation data, each row in dat is a probe for a CpG island, and the first three columns are gene symbol, chromosome, and genome coordinate.
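
To make the dimensions concrete, the following sketch builds a toy list with the same three element names. It does not call getTCGA and contains no TCGA data; the gene names, sample names, values, and the exact column layout of the merged matrix are assumptions made for illustration.

```r
# Toy stand-in for the list returned when clinical = TRUE (not real TCGA data).
set.seed(1)
genes   <- c("TP53", "EGFR", "PTEN")
samples <- c("S1", "S2", "S3", "S4")

# Genomic matrix: gene x sample.
dat <- matrix(rnorm(12), nrow = 3, dimnames = list(genes, samples))

# Clinical matrix: sample x clinical covariate.
clinical <- matrix(c("350", "720", "410", "980"), nrow = 4,
                   dimnames = list(samples, "OS"))

# Merged matrix: sample x (cvars + gene); here cvars is just "OS".
merged <- cbind(OS = as.numeric(clinical[, "OS"]), t(dat))

res <- list(dat = dat, clinical = clinical, merged.dat = merged)

dim(res$dat)         # genes in rows, samples in columns
dim(res$merged.dat)  # samples in rows, covariate plus genes in columns
```

Note that the genomic matrix is transposed in the merged element, so samples are rows there and columns in dat.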

CNTools

Details

Values for disease include "ACC", "BLCA", "BRCA", "CESC", "CHOL", "COAD", "COADREAD", "DLBC", "ESCA", "FPPP", "GBM", "GBMLGG", "HNSC", "KICH", "KIPAN", "KIRC", "KIRP", "LAML", "LGG", "LIHC", "LUAD", "LUSC", "MESO", "OV", "PAAD", "PCPG", "PRAD", "READ", "SARC", "SKCM", "STAD", "TGCT", "THCA", "THYM", "UCEC", "UCS", and "UVM". Values for data.type include "RNASeq2", "RNASeq", "miRNASeq", "CNA_SNP", "CNV_SNP", "CNA_CGH", "Methylation", "Mutation", "mRNA_Array", and "miRNA_Array". Note that not all combinations are permitted; Appendix A of the package vignette outlines all values of disease and data.type accommodated by TCGA2STAT.

The type parameter should only be used with these four data.type values:

  • RNASeq - "raw_counts" for raw read counts (default); "RPKM" for normalized read counts (reads per kilobase per million mapped reads).
  • miRNASeq - "raw_counts" for raw read counts (default); "reads_per_million_miRNA_mapped" for normalized read counts.
  • Mutation - "somatic" for non-silent somatic mutations (default); "all" for all mutations.
  • Methylation - "27K" platform (default), "450K" platform, and "all" for both platforms.
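
The combinations above can be written down as a small lookup table. The sketch below is a hypothetical helper (valid_types and resolve_type are not part of TCGA2STAT) that picks the documented default when type is empty and rejects values not listed for the given data.type. How getTCGA itself reacts to an unsupported type is not covered here.

```r
# Hypothetical lookup of the documented type values; first entry is the default.
valid_types <- list(
  RNASeq      = c("raw_counts", "RPKM"),
  miRNASeq    = c("raw_counts", "reads_per_million_miRNA_mapped"),
  Mutation    = c("somatic", "all"),
  Methylation = c("27K", "450K", "all")
)

# Hypothetical helper: return the type to use for a data.type, or stop.
resolve_type <- function(data.type, type = "") {
  choices <- valid_types[[data.type]]
  if (is.null(choices)) {
    # data.type does not take a type argument according to the list above.
    if (nzchar(type)) stop("type is not used with data.type = ", data.type)
    return("")
  }
  if (!nzchar(type)) return(choices[1])  # fall back to the documented default
  if (!type %in% choices) {
    stop("type must be one of: ", paste(choices, collapse = ", "))
  }
  type
}

resolve_type("RNASeq")               # documented default for RNASeq
resolve_type("Methylation", "450K")  # explicit, listed value
```

Checking arguments this way before calling getTCGA avoids starting a download with a combination the documentation does not list.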

Examples

library(TCGA2STAT)
ov.rnaseq2 <- getTCGA(disease="OV", data.type="RNASeq2")
ov.rnaseq <- getTCGA(disease="OV", data.type="RNASeq", type="RPKM")