pdat_: Get Protein Data

Description

Get data on protein expression and chemical composition.

Usage

pdat_CRC(dataset = NULL, basis = "QEC")

Arguments

dataset

character, specifies which dataset to retrieve

basis

character, keyword for basis species to use

Value

A list with the following components: dataset (the name of the dataset); basis (the basis species used for the calculations); description (descriptive text); pcomp (compositional data generated by protcomp); up2 (a logical vector with one element per protein: TRUE if the protein is up-expressed in group 2 compared to group 1, i.e. cancer compared to normal, and FALSE otherwise); and names (gene names for the proteins, if available).
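As a rough illustration of how the up2 and names components fit together, the sketch below builds a mock list by hand rather than calling a pdat_ function. The dataset name, description, and gene names are placeholders invented for this example; the pcomp component is omitted because its structure is not described here.

```r
# Mock return value; all entries are illustrative placeholders, not real data
pdat <- list(
  dataset = "EXAMPLE15",
  basis = "QEC",
  description = "mock dataset for illustration",
  up2 = c(TRUE, FALSE, TRUE, FALSE, FALSE),
  names = c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5")
)

# Proteins up-expressed in group 2 relative to group 1
up_names <- pdat$names[pdat$up2]
# Proteins down-expressed in group 2 relative to group 1
down_names <- pdat$names[!pdat$up2]

# Counts of up- and down-expressed proteins
n_up <- sum(pdat$up2)
n_down <- sum(!pdat$up2)
```

Because up2 has one element per protein, it can be used directly as a logical index into any component of the same length.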

Details

The pdat_ functions calculate chemical compositional metrics (using protcomp) for relatively up- and down-expressed proteins reported in proteomic experiments.

Use pdat_CRC to retrieve data for protein expression in colorectal cancer, pdat_pancreatic for data on pancreatic cancer, pdat_hypoxia for data on hypoxia or 3D culture, and pdat_osmotic for data on hyperosmotic stress. The functions get relative expression data from the CSV files stored in extdata/expression/, with subdirectories corresponding to the names of the functions. Some of the functions also retrieve amino acid compositions from the files in extdata/aa/ (for non-human proteins).

If dataset is NULL, the return value gives the names of all datasets that can be retrieved using the function. Provide one of these names as the dataset argument to retrieve the data. Each dataset name identifies the study (publication) where the data were reported, and is constructed by combining the first characters of the family names of the first three or four authors with the 2-digit year of publication. This coincides with the key-generation scheme used in some bibliography manager software. The same abbreviation is also used to name the CSV file containing the data. If more than one dataset is available from a single study (for example, for relative protein expression in different stages of cancer), the dataset name is suffixed by an underscore followed by a short abbreviation indicating the particular dataset.
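The naming scheme described above can be sketched in a few lines of base R. The function make_dataset_key, its arguments, and the author names are hypothetical and are not part of the package; the cutoff of four authors is an assumption chosen to match "three or four".

```r
# Hypothetical helper: build a dataset name from author family names and year
make_dataset_key <- function(authors, year, suffix = NULL, max_authors = 4) {
  # Keep at most the first max_authors family names (assumed cutoff)
  authors <- head(authors, max_authors)
  # First character of each family name, upper-cased
  initials <- toupper(substr(authors, 1, 1))
  # Two-digit year, zero-padded (e.g. 2008 becomes "08")
  yy <- sprintf("%02d", as.integer(year) %% 100L)
  key <- paste0(paste(initials, collapse = ""), yy)
  # Optional suffix distinguishes several datasets from one study
  if (!is.null(suffix)) key <- paste(key, suffix, sep = "_")
  key
}

# Placeholder author names, not a real publication
key_plain <- make_dataset_key(c("Smith", "Jones", "Lee"), 2015)
key_suffixed <- make_dataset_key(c("Smith", "Jones", "Lee"), 2015, suffix = "stage2")
```

A name produced this way would be passed as the dataset argument; the authoritative list of valid names is still the one returned when dataset is NULL.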

Tables listing mean compositional differences between up- and down-expressed proteins for each dataset are saved in extdata/summary/. These files were created using the second example below.

Examples

# NOT RUN {
library(CHNOSZ)
pdat_CRC()
pdat_CRC("JKMF10")  # same result as get_pdat("JKMF10")

# }
# NOT RUN {
# how the extdata/summary/summary_*.csv files were made
for(what in c("CRC", "pancreatic", "hypoxia", "osmotic")) {
  pdat_fun <- paste0("pdat_", what)
  datasets <- get(pdat_fun)()
  comptab <- lapply_canprot(datasets, function(dataset) {
    pdat <- get_pdat(dataset, pdat_fun)
    get_comptab(pdat)
  }, varlist = "pdat_fun")
  # write summary table
  comptab <- do.call(rbind, comptab)
  comptab <- cbind(set = c(letters, LETTERS)[1:nrow(comptab)], comptab)
  comptab[, 6:15] <- signif(comptab[, 6:15], 4)
  filename <- paste0("summary_", what, ".csv")
  write.csv(comptab, filename, row.names = FALSE, quote = 3)
}
# }