Get data on protein expression and chemical composition.
pdat_CRC(dataset = NULL, basis = "QEC")
dataset: character, specifies which dataset to retrieve
basis: character, keyword for basis species to use
A list consisting of dataset (the name of the dataset), basis (basis species used for the calculations), description (descriptive text), pcomp (compositional data generated by protcomp), up2 (logical vector with length equal to the number of proteins; TRUE if the protein is up-expressed in group 2 compared to group 1 (i.e. cancer compared to normal), FALSE otherwise), and names (gene names for the proteins, if available).
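As a rough illustration of how the returned list might be used, the sketch below builds a mock object with the same element names and splits the proteins by the up2 flag. The object is hypothetical: the dataset name, gene names, and flags are invented, and pcomp is left empty instead of holding real protcomp output.

```r
# Hypothetical mock of the returned list; all values are invented for illustration
pdat <- list(
  dataset = "ABCD21",
  basis = "QEC",
  description = "mock dataset, cancer vs. normal",
  pcomp = NULL,  # would hold the compositional data generated by protcomp
  up2 = c(TRUE, FALSE, TRUE, FALSE, FALSE),
  names = c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5")
)

# Count up- and down-expressed proteins (group 2 relative to group 1)
n_up <- sum(pdat$up2)
n_down <- sum(!pdat$up2)

# Gene names of the up-expressed proteins
up_names <- pdat$names[pdat$up2]
```

Because up2 has one entry per protein, it can index names (or any other per-protein vector) directly.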
The pdat_ functions calculate chemical compositional metrics (using protcomp) for relatively up- and down-expressed proteins reported in proteomic experiments.
Use pdat_CRC to retrieve data for protein expression in colorectal cancer, pdat_pancreatic for data on pancreatic cancer, pdat_hypoxia for data on hypoxia or 3D culture, and pdat_osmotic for data on hyperosmotic stress.
The functions get relative expression data from the CSV files stored in extdata/expression/, with subdirectories corresponding to the names of the functions.
Some of the functions also retrieve amino acid compositions from the files in extdata/aa/ (for non-human proteins).
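The following sketch shows one way to locate these files in an installed package using base R's system.file. The package name "canprot" is an assumption (it is not stated in the text above), and the code only reports what it finds without assuming particular subdirectory names.

```r
# Assumption: the pdat_ functions and their data ship in a package named "canprot"
expr_dir <- system.file("extdata", "expression", package = "canprot")

if (nzchar(expr_dir)) {
  # List each subdirectory and count the CSV files it contains
  subdirs <- list.dirs(expr_dir, full.names = FALSE, recursive = FALSE)
  for (subdir in subdirs) {
    csv_files <- list.files(file.path(expr_dir, subdir), pattern = "csv")
    cat(subdir, ":", length(csv_files), "CSV file(s)\n")
  }
} else {
  # system.file returns "" when the package or directory is not found
  message("extdata/expression/ not found; is the package installed?")
}
```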
If dataset is NULL, the return value gives the names of all datasets that can be retrieved using the function.
Provide one of these names as the dataset argument to retrieve the data.
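The two-step calling pattern described here can be sketched with a stand-in function. pdat_mock and its dataset names are hypothetical and only mimic the interface; the error raised for an unknown name is a choice made for this sketch, not documented behavior of the real functions.

```r
# Hypothetical stand-in for a pdat_ function; dataset names are invented
pdat_mock <- function(dataset = NULL) {
  available <- c("ABCD21", "EFG19_early", "EFG19_late")
  # With no dataset given, return the names that can be requested
  if (is.null(dataset)) return(available)
  # Choice made for this sketch: fail loudly on an unknown name
  if (!dataset %in% available) stop("unknown dataset: ", dataset)
  list(dataset = dataset)
}

# Step 1: list the available dataset names
all_names <- pdat_mock()

# Step 2: request one of them by name
first <- pdat_mock(all_names[1])
```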
Each dataset name indicates the study (publication) where the data were reported. It is constructed by combining the first characters of the family names of the first three or four authors with the 2-digit year of publication.
This coincides with the key-generation scheme used in some bibliography manager software.
The same abbreviation is used to name the CSV file containing the data.
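A minimal sketch of this naming scheme in base R follows. The helper make_dataset_key and the author names are hypothetical, and capping at four authors is an assumption based on the "three or four" wording above.

```r
# Hypothetical helper: build a dataset key from author family names and year
make_dataset_key <- function(family_names, year, max_authors = 4) {
  # Keep at most the first max_authors family names
  authors <- head(family_names, max_authors)
  # First character of each family name, upper-cased
  initials <- toupper(substr(authors, 1, 1))
  # Append the zero-padded 2-digit year
  paste0(paste(initials, collapse = ""), sprintf("%02d", as.integer(year) %% 100L))
}

# Invented author names, for illustration only
key <- make_dataset_key(c("Adams", "Baker", "Clark", "Davis", "Evans"), 2021)
```

With these invented inputs the fifth author is dropped and the key is built from the first four initials and the year.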
If more than one dataset is available from a single study (for example, for relative protein expression in different stages of cancer), dataset is suffixed by an underscore followed by a short abbreviation indicating the particular dataset.
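To show how such a name could be taken apart, the sketch below separates the study key from the optional suffix. split_dataset_name is a hypothetical helper and the dataset names are invented.

```r
# Hypothetical helper: split a dataset name into study key and optional suffix
split_dataset_name <- function(dataset) {
  parts <- strsplit(dataset, "_", fixed = TRUE)[[1]]
  list(
    study = parts[1],
    # Everything after the first underscore, or NA when there is no suffix
    suffix = if (length(parts) > 1) paste(parts[-1], collapse = "_") else NA_character_
  )
}

# Invented names, for illustration only
with_suffix <- split_dataset_name("ABCD21_stage2")
without_suffix <- split_dataset_name("ABCD21")
```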
Tables listing mean compositional differences between up- and down-expressed proteins for each dataset are saved in extdata/summary/.
These files were created using the second example below.
# NOT RUN {
library(CHNOSZ)
pdat_CRC()
pdat_CRC("JKMF10") # same result as get_pdat("JKMF10")
# }
# NOT RUN {
# how the extdata/summary/summary_*.csv files were made
for(what in c("CRC", "pancreatic", "hypoxia", "osmotic")) {
  # get the names of all datasets available from this function
  pdat_fun <- paste0("pdat_", what)
  datasets <- get(pdat_fun)()
  # calculate the compositional table for each dataset
  comptab <- lapply_canprot(datasets, function(dataset) {
    pdat <- get_pdat(dataset, pdat_fun)
    get_comptab(pdat)
  }, varlist = "pdat_fun")
  # write summary table
  comptab <- do.call(rbind, comptab)
  comptab <- cbind(set = c(letters, LETTERS)[1:nrow(comptab)], comptab)
  comptab[, 6:15] <- signif(comptab[, 6:15], 4)
  filename <- paste0("summary_", what, ".csv")
  write.csv(comptab, filename, row.names = FALSE, quote = 3)
}
# }