breastdata: Breast cancer gene expression and DNA copy number data set from
Chin et al., used in Witten et al. See the references below.
Description
This data set consists of gene expression and DNA copy number
measurements on a set of 89 samples. The data set can be used to
perform integrative analysis of gene expression and DNA copy number
data, as in Witten et al. (2009). That is,
we can look for sets of genes that are associated with regions of
chromosomal gain/loss.
The data must be downloaded from https://statweb.stanford.edu/~tibs/PMA/
and placed in the folder PMA/data.
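
A minimal sketch of loading the downloaded file follows. It assumes the file is saved as `breastdata.rda` and contains an object named `breastdata`; both names are assumptions, so adjust them to match the actual download. The helper `load_breastdata` is hypothetical and not part of PMA.

```r
# Hypothetical helper: load an .rda file and return the object named
# `breastdata` stored in it. File name and object name are assumptions.
load_breastdata <- function(path = file.path("PMA", "data", "breastdata.rda")) {
  if (!file.exists(path)) {
    stop("File not found: ", path,
         "; download it from https://statweb.stanford.edu/~tibs/PMA/ first.")
  }
  env <- new.env()
  loaded <- load(path, envir = env)  # names of the objects restored from the file
  if (!"breastdata" %in% loaded) {
    stop("No object named 'breastdata' in ", path)
  }
  get("breastdata", envir = env)
}
```

Loading into a separate environment keeps the global workspace untouched until the contents have been checked.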
Format
The format is a list containing the following elements:
- dna: a 2149x89 matrix of CGH spots x Samples
- rna: a 19672x89 matrix of Genes x Samples
- chrom: a 2149-vector of chromosomal location of each CGH spot
- nuc: a 2149-vector of nucleotide position for each CGH spot
- gene: a 19672-vector with an accession number for each gene
- genenames: a 19672-vector with a name for each gene
- genechr: a 19672-vector with a chromosomal location for each gene
- genedesc: a 19672-vector with a description for each gene
- genepos: a 19672-vector with a nucleotide position for each gene
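
The layout described above can be checked programmatically. The sketch below is illustrative only: `check_breastdata` is a hypothetical helper, and `toy` is a small synthetic stand-in with the same element names, not the real data.

```r
# Hypothetical helper: compare a list against the documented layout.
# Default sizes are the ones given in the Format section.
check_breastdata <- function(x, n_spots = 2149, n_genes = 19672, n_samples = 89) {
  stopifnot(is.list(x))
  spot_vecs <- c("chrom", "nuc")
  gene_vecs <- c("gene", "genenames", "genechr", "genedesc", "genepos")
  absent <- setdiff(c("dna", "rna", spot_vecs, gene_vecs), names(x))
  if (length(absent) > 0) {
    stop("Missing elements: ", paste(absent, collapse = ", "))
  }
  # One TRUE/FALSE per element: does its size match the documented size?
  c(
    dna = identical(dim(x$dna), as.integer(c(n_spots, n_samples))),
    rna = identical(dim(x$rna), as.integer(c(n_genes, n_samples))),
    vapply(x[spot_vecs], function(v) length(v) == n_spots, logical(1)),
    vapply(x[gene_vecs], function(v) length(v) == n_genes, logical(1))
  )
}

# Synthetic stand-in: 4 CGH spots, 5 genes, 3 samples.
toy <- list(
  dna = matrix(0, 4, 3), rna = matrix(0, 5, 3),
  chrom = c(1, 1, 2, 2), nuc = 1:4,
  gene = paste0("acc", 1:5), genenames = paste0("g", 1:5),
  genechr = c(1, 1, 2, 2, 2), genedesc = rep("", 5), genepos = 1:5
)
toy_ok <- check_breastdata(toy, n_spots = 4, n_genes = 5, n_samples = 3)

# Rows of dna are CGH spots, so chrom can index them directly.
dna_chr1 <- toy$dna[toy$chrom == 1, , drop = FALSE]
```

With the real data loaded, `check_breastdata(breastdata)` would use the documented sizes as defaults.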
Details
Missing values were imputed using 5-nearest neighbors (see library
pamr).
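
The exact imputation procedure is not described here beyond the use of 5 nearest neighbors. As an illustration of the general idea only, the base R sketch below fills each missing entry with the mean of that column over the k nearest complete rows. It is not the pamr implementation; `knn_impute_rows` is a hypothetical name, and treating rows as the neighbors is an assumption.

```r
# Hypothetical sketch of k-nearest-neighbor imputation over matrix rows.
# Neighbors are drawn from rows with no missing values; distance is
# Euclidean, computed over the columns observed in the incomplete row.
knn_impute_rows <- function(x, k = 5) {
  stopifnot(is.matrix(x), is.numeric(x))
  out <- x
  complete <- which(rowSums(is.na(x)) == 0)
  for (i in which(rowSums(is.na(x)) > 0)) {
    miss <- is.na(x[i, ])
    # Skip rows that cannot be imputed: no donors, or nothing observed.
    if (length(complete) == 0 || all(miss)) next
    target <- matrix(x[i, !miss], nrow = length(complete),
                     ncol = sum(!miss), byrow = TRUE)
    d <- sqrt(rowSums((x[complete, !miss, drop = FALSE] - target)^2))
    nn <- complete[order(d)[seq_len(min(k, length(complete)))]]
    # Replace each missing entry with the neighbors' mean in that column.
    out[i, miss] <- colMeans(x[nn, miss, drop = FALSE])
  }
  out
}

m <- rbind(c(1, 2, 3), c(1, 2, 5), c(10, 20, 30), c(1, 2, NA))
imputed <- knn_impute_rows(m, k = 2)
```

In the toy matrix `m`, the two rows closest to the incomplete row have third-column values 3 and 5, so the missing entry is filled with their mean.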
References
Chin K., et al. (2006) doi:10.1016/j.ccr.2006.10.009.
Witten D. M., Tibshirani R., and Hastie T. (2009) doi:10.1093/biostatistics/kxp008.