
ShrinkageTrees (version 1.0.0)

pdac: Processed TCGA PAAD dataset

Description

A reduced and cleaned subset of the pancreatic ductal adenocarcinoma data from The Cancer Genome Atlas (TCGA) pancreatic adenocarcinoma (PAAD) cohort. The pdac data frame is smaller and simpler than the full cohort data, which makes it practical for analyses and package examples.

Usage

pdac
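A minimal sketch of loading pdac and setting up the overall survival outcome from the time and status columns described under Details. It assumes ShrinkageTrees is installed and uses the survival package (a recommended package that ships with R) for Surv() and survfit(). The factor labels are only a readability convenience based on the documented codes and are not part of the dataset itself.

```r
# Assumes ShrinkageTrees is installed; survival ships with R.
library(survival)
data("pdac", package = "ShrinkageTrees")

dim(pdac)  # one row per patient

# Kaplan-Meier estimates of overall survival (months) by treatment.
# Surv() reads status as 1 = event, 0 = censored, matching pdac$status.
km <- survfit(Surv(time, status) ~ treatment, data = pdac)
print(km)

# Illustrative only: attach readable labels to two coded covariates,
# using the codes documented under Details.
pdac_labeled <- transform(
  pdac,
  treatment = factor(treatment, levels = c(0, 1),
                     labels = c("control", "radiation")),
  moffitt.cluster = factor(moffitt.cluster, levels = c(0, 1),
                           labels = c("classical", "basal-like"))
)
table(pdac_labeled$treatment, pdac_labeled$moffitt.cluster)
```

Keeping the original 0/1 columns intact and labelling a copy means modelling functions that expect numeric codes can still use pdac directly.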


Format

A data frame with one row per patient and the columns described under Details.

Details

This dataset was originally compiled and curated in the open-source pdacR package by Torre-Healy et al. (2023), which harmonized and integrated the TCGA PAAD gene expression and clinical data. The pdac data frame further reduces and simplifies these data for efficient modeling demonstrations and survival analyses.

The data frame includes:

  • time: Overall survival time in months.

  • status: Event indicator; 1 = event (death) occurred, 0 = censored.

  • treatment: Binary treatment indicator; 1 = radiation therapy, 0 = control.

  • age: Age at initial pathologic diagnosis (numeric).

  • sex: Binary sex indicator; 1 = male, 0 = female.

  • grade: Tumor differentiation grade (ordinal; 1 = well, 2 = moderate, 3 = poor, 4 = undifferentiated).

  • tumor.cellularity: Tumor cellularity estimate (numeric).

  • tumor.purity: Tumor purity class (binary; 1 = high, 0 = low).

  • absolute.purity: Absolute purity estimate (numeric).

  • moffitt.cluster: Moffitt transcriptional subtype (binary; 1 = basal-like, 0 = classical).

  • meth.leukocyte.percent: Leukocyte percentage estimated from DNA methylation data (numeric).

  • meth.purity.mode: DNA methylation purity mode (numeric).

  • stage: Nodal stage indicator (binary; 1 = N1, 0 = N0).

  • lymph.nodes: Number of lymph nodes examined (numeric).

  • Driver gene columns: Expression values of key driver genes (e.g., KRAS, TP53, CDKN2A, SMAD4, BRCA1, BRCA2).

  • Other gene columns: Expression values of the ~3,000 most variable non-driver genes, ranked by median absolute deviation (MAD).
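The gene filter above ranks genes by median absolute deviation and keeps the most variable ones. The sketch below illustrates that general technique on simulated data. The helper select_top_mad() is hypothetical, and the exact preprocessing used to build pdac (for example, any transformation applied before computing MAD) is not documented here, so this should not be read as the package's own code.

```r
# Hypothetical illustration of MAD-based gene filtering on simulated data.
set.seed(1)
n_patients <- 50
n_genes <- 200
expr <- matrix(rnorm(n_patients * n_genes), nrow = n_patients,
               dimnames = list(NULL, paste0("gene", seq_len(n_genes))))

# Give five genes much larger spread so they should rank highest.
expr[, 1:5] <- expr[, 1:5] * 10

# Keep the k columns (genes) with the largest median absolute deviation.
select_top_mad <- function(x, k) {
  mads <- apply(x, 2, mad)  # per-gene MAD across patients
  top <- order(mads, decreasing = TRUE)[seq_len(min(k, ncol(x)))]
  x[, top, drop = FALSE]
}

top_expr <- select_top_mad(expr, k = 20)
dim(top_expr)  # 50 20
head(colnames(top_expr), 5)
```

MAD is based on medians, so a single extreme expression value in a few patients inflates it far less than it would inflate the variance. This makes it a common choice for selecting variable genes.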

References

  • Raphael BJ, et al. "Integrated genomic characterization of pancreatic ductal adenocarcinoma." Cancer Cell. 2017 Aug 14;32(2):185–203.e13. PMID: 28810144.

  • Torre-Healy LA, Kawalerski RR, Oh K, et al. "Open-source curation of a pancreatic ductal adenocarcinoma gene expression analysis platform (pdacR) supports a two-subtype model." Communications Biology. 2023; https://doi.org/10.1038/s42003-023-04461-6.

  • The Cancer Genome Atlas (TCGA), PAAD project, dbGaP: phs000178.