Parses a text file output by the DAVID web application (https://david.ncifcrf.gov/) (see details).
read_david_data_file(file, output = "flat", redundant = FALSE, sep = "\t")
The function returns either a data.frame containing DAVID data, a list of data.frames, or a list of gene sets (see the documentation for the output parameter).
A file path pointing to a DAVID "Functional Annotation Cluster" or "Functional Annotation Chart" text file.
(optional) Specifies the type of output. (default "flat") This parameter can take one of three values:
If "flat" is specified, a single data.frame containing the standard DAVID output fields is returned. For "Functional Annotation Cluster" data, an additional column named `Cluster (ES)` is included, containing, for each gene set, the comma-separated DAVID `Annotation Cluster` assignments with the DAVID Enrichment Scores in parentheses.
For "hierarchic" output, a list containing a set of data.frames for each `Annotation Cluster` is returned. This only works with "Functional Annotation Cluster" output.
DAVID data sets contain nested gene sets in their `Genes` column. The gene sets can be extracted as a list of gene set vectors by specifying this option.
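The idea of extracting nested gene sets from a `Genes` column can be sketched in base R. This is only an illustration and not the function's actual implementation: the toy data.frame, the term names, and the gene identifiers are made up, and the sketch assumes that each `Genes` cell holds a comma-separated list of gene identifiers.

```r
# Toy stand-in for a parsed DAVID table; all values are hypothetical.
david <- data.frame(
  Term  = c("TERM_A", "TERM_B"),
  Genes = c("GENE1, GENE2, GENE3", "GENE4, GENE5"),
  stringsAsFactors = FALSE
)

# Split each comma-separated Genes string into a character vector,
# trim surrounding whitespace, and name each vector by its Term.
gene_sets <- lapply(strsplit(david$Genes, ",", fixed = TRUE), trimws)
names(gene_sets) <- david$Term

str(gene_sets)
```

The result is a named list with one character vector of gene identifiers per gene set.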
(optional) The "Functional Annotation Cluster" output of DAVID contains fuzzy DAVID clusters in which a given gene set may be assigned to multiple clusters. As a result, some gene sets can have multiple lines in a "Functional Annotation Cluster" output file, resulting in redundant data.frame rows. If this value is FALSE, the returned "flat" data.frame will have gene set duplicates removed and the DAVID `Annotation Cluster` identities of each gene set listed as comma-separated values in the `Cluster (ES)` column. If TRUE, then the redundancies are tolerated and replicate gene set rows are not collapsed. (default: FALSE)
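The effect of collapsing redundant rows can be sketched in base R as follows. This is a hedged illustration, not the function's own code: the column names `Cluster` and `ES`, the toy values, and the exact "cluster (score)" cell format are assumptions made for the example.

```r
# Toy table in which gene set "TERM_A" was assigned to two annotation clusters.
# Column names Cluster and ES are hypothetical.
david <- data.frame(
  Term    = c("TERM_A", "TERM_B", "TERM_A"),
  Cluster = c(1, 1, 2),
  ES      = c(4.2, 4.2, 1.7),
  stringsAsFactors = FALSE
)

# Label each row as "<cluster> (<enrichment score>)".
david$label <- sprintf("%d (%.1f)", as.integer(david$Cluster), david$ES)

# Concatenate the labels of all rows that share a Term.
cluster_es <- vapply(split(david$label, david$Term),
                     paste, character(1), collapse = ", ")

# Keep one row per Term and attach the combined labels.
collapsed <- david[!duplicated(david$Term), "Term", drop = FALSE]
collapsed[["Cluster (ES)"]] <- unname(cluster_es[collapsed$Term])

print(collapsed)
```

In this sketch each gene set ends up on a single row, with all of its cluster memberships kept in one column rather than spread across replicate rows.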
(optional) Specifies the separator used in the DAVID output file. This rarely needs to be changed. (default "\t")
This function parses tab-separated text files from the DAVID web application (https://david.ncifcrf.gov/). Two variants of DAVID output are supported, specifically the data format generated by selecting "Functional Annotation Chart" or "Functional Annotation Cluster" and downloading the resulting data as a text file.
The parser expects the following fields in the data: "Category", "Term", "Count", "%", "PValue", "Genes", "List Total", "Pop Hits", "Pop Total", "Fold Enrichment", "Bonferroni", "Benjamini", and "FDR".
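Before parsing, it can be useful to confirm that a file carries these fields. The following base R sketch does so on a small made-up file written to a temporary path; the file contents and variable names are hypothetical, and the check is independent of how the parser itself validates input.

```r
expected_fields <- c("Category", "Term", "Count", "%", "PValue", "Genes",
                     "List Total", "Pop Hits", "Pop Total",
                     "Fold Enrichment", "Bonferroni", "Benjamini", "FDR")

# Write a tiny made-up chart file so the example is self-contained.
tmp <- tempfile(fileext = ".txt")
writeLines(c(paste(expected_fields, collapse = "\t"),
             paste(c("CAT", "TERM_A", "2", "10", "0.01", "GENE1, GENE2",
                     "20", "50", "1000", "2.0", "0.1", "0.1", "0.1"),
                   collapse = "\t")),
           tmp)

# check.names = FALSE keeps headers such as "%" and "List Total" unchanged
# instead of converting them to syntactic R names.
header <- names(read.delim(tmp, sep = "\t", check.names = FALSE, nrows = 1))

# Any expected field that is absent from the header is reported here.
missing_fields <- setdiff(expected_fields, header)
print(missing_fields)
```

An empty result means every expected field is present in the header.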
To create a data.frame suitable for use with gsnAddPathwaysData(), the default options are required, particularly output = "flat" and redundant = FALSE.
gsnImportDAVID()