read_david_data_file: read_david_data_file

Description

Parses a text file exported from the DAVID web application (https://david.ncifcrf.gov/). See Details for the supported formats.

Usage

read_david_data_file(file, output = "flat", redundant = FALSE, sep = "\t")

Value

Depending on the value of the output argument, the function returns a data.frame containing DAVID data, a list of data.frames, or a list of gene sets (see the documentation for the output argument).

Arguments

file

A file path pointing to a DAVID "Functional Annotation Cluster" or "Functional Annotation Chart" text file.

output

(optional) Specifies the type of output (default: "flat"). This parameter can take one of three values:

"flat":

If "flat" is specified, a single data.frame containing the standard DAVID output fields is returned. For "Functional Annotation Cluster" data, an additional column named `Cluster (ES)` is included, containing, for each gene set, its comma-separated DAVID `Annotation Cluster` assignments with the DAVID Enrichment Scores in parentheses.

"hierarchic":

For "hierarchic" output, a list containing a set of data.frames for each `Annotation Cluster` is returned. This only works with "Functional Annotation Cluster" output.

"GSC":

DAVID data sets contain nested gene sets in their `Genes` column. The gene sets can be extracted as a list of gene set vectors by specifying this option.

redundant

(optional) The "Functional Annotation Cluster" output of DAVID contains fuzzy DAVID clusters, in which a given gene set may be assigned to multiple clusters. As a result, some gene sets can have multiple lines in a "Functional Annotation Cluster" output file, resulting in redundant data.frame rows. If this value is FALSE, the returned "flat" data.frame will have gene set duplicates removed, and the DAVID `Annotation Cluster` identities of each gene set will be listed as comma-separated values in the `Cluster (ES)` column. If TRUE, the redundancies are tolerated and replicate gene set rows are not collapsed. (default: FALSE)

sep

(optional) Specifies the field separator used in the DAVID output file. DAVID text exports are tab-separated, so this rarely needs to be changed. (default: "\t")
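
To make the effect of redundant = FALSE and output = "GSC" more concrete, the following base R sketch mimics both behaviours on a small, made-up table. It does not call read_david_data_file() and is not its implementation. The data, the helper column `label`, the exact string format built for `Cluster (ES)`, the gene separator, and the use of `Term` as the gene set name are all assumptions made for illustration.

```r
# Hypothetical "Functional Annotation Cluster" rows: the inflammatory response
# gene set appears in two clusters, so it occupies two rows.
clustered <- data.frame(
  Term = c("GO:0006954~inflammatory response",
           "GO:0006955~immune response",
           "GO:0006954~inflammatory response"),
  `Annotation Cluster` = c(1, 1, 2),
  `Enrichment Score` = c(3.2, 3.2, 1.7),
  Genes = c("IL6, TNF, CXCL8", "IL6, CD4", "IL6, TNF, CXCL8"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Build one "cluster (score)" label per row (assumed format).
clustered$label <- sprintf("%s (%s)",
                           clustered$`Annotation Cluster`,
                           clustered$`Enrichment Score`)

# Collapse the labels of each gene set into one comma-separated string.
cluster_es <- tapply(clustered$label, clustered$Term, paste, collapse = ", ")

# Keep the first row of each gene set and attach the collapsed labels,
# analogous to what redundant = FALSE is described as doing.
flat <- clustered[!duplicated(clustered$Term), c("Term", "Genes")]
flat$`Cluster (ES)` <- as.vector(cluster_es[flat$Term])

# Split the nested Genes strings into gene set vectors, analogous to "GSC".
# Assumes genes are separated by a comma and optional whitespace.
gene_sets <- strsplit(flat$Genes, ",\\s*")
names(gene_sets) <- flat$Term
```

In this sketch `flat` has one row per gene set, and `gene_sets` is a named list of character vectors. The real function may format or name its output differently.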

Details

This function parses tab-separated text files from the DAVID web application (https://david.ncifcrf.gov/). Two variants of DAVID output are supported, specifically the data format generated by selecting "Functional Annotation Chart" or "Functional Annotation Cluster" and downloading the resulting data as a text file.

The parser expects the following fields in the data: "Category", "Term", "Count", "%", "PValue", "Genes", "List Total", "Pop Hits", "Pop Total", "Fold Enrichment", "Bonferroni", "Benjamini", and "FDR".
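
Before parsing, it can be useful to confirm that a file carries these fields. The sketch below uses only base R and a made-up one-row chart file written to a temporary path. The file contents and the variable names (`mock_file`, `mock_row`, `missing_fields`) are hypothetical, and the check is independent of how read_david_data_file() validates its input.

```r
# Field names as listed in this documentation.
expected_fields <- c("Category", "Term", "Count", "%", "PValue", "Genes",
                     "List Total", "Pop Hits", "Pop Total", "Fold Enrichment",
                     "Bonferroni", "Benjamini", "FDR")

# Hypothetical one-row "Functional Annotation Chart" file for illustration.
mock_file <- tempfile(fileext = ".txt")
mock_row <- c("GOTERM_BP_DIRECT", "GO:0006954~inflammatory response", "3",
              "30.0", "0.001", "IL6, TNF, CXCL8", "10", "400", "18000",
              "13.5", "0.05", "0.04", "0.03")
writeLines(c(paste(expected_fields, collapse = "\t"),
             paste(mock_row, collapse = "\t")),
           mock_file)

# check.names = FALSE keeps headers such as "%" and "List Total" unchanged.
chart <- read.delim(mock_file, sep = "\t", check.names = FALSE,
                    stringsAsFactors = FALSE)

# Any expected field that is absent from the header ends up here.
missing_fields <- setdiff(expected_fields, colnames(chart))
if (length(missing_fields) > 0) {
  warning("Missing DAVID fields: ", paste(missing_fields, collapse = ", "))
}
```

Reading with check.names = FALSE matters here because R would otherwise rewrite headers that are not syntactically valid names, such as "%" and "List Total", and the comparison would then report them as missing.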

To create a data.frame suitable for use with gsnAddPathwaysData(), the default options are required, particularly output = "flat" and redundant = FALSE.
