GOexpress

Visualise microarray and RNAseq data with gene ontology annotations.

OVERVIEW

This package was designed for the analysis of bioinformatics data based on gene expression measurements. It requires two inputs:

  1. An ExpressionSet containing assayData and phenoData. The assayData slot should be a gene-by-sample matrix providing the expression level of genes (rows) in each sample (columns). Row names are expected to be either Ensembl gene identifiers or microarray probeset identifiers present in the Ensembl BioMart dataset queried. The phenoData slot should be an AnnotatedDataFrame from the Biobase package providing phenotypic information about the samples. Row names are samples, and at least one of the columns must be a grouping factor with two or more levels (a factor in the R sense of the term).

  2. The name of the grouping factor to investigate, which must be a valid column name in the phenoData.
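As a rough illustration of the two inputs described above, the sketch below builds a toy gene-by-sample matrix and a matching phenotype table, then checks them against the stated requirements. The gene identifiers, sample names, and the `check_inputs()` helper are made up for this example and are not part of GOexpress; the final step assumes the Biobase package is installed and is skipped otherwise.

```r
# Toy gene-by-sample matrix: 4 genes (rows) x 6 samples (columns).
# The Ensembl-style identifiers below are invented for illustration.
set.seed(1)
expr <- matrix(
  rnorm(24, mean = 8), nrow = 4,
  dimnames = list(
    paste0("ENSBTAG0000000000", 1:4),
    paste0("S", 1:6)
  )
)

# Phenotypic data: one row per sample, row names matching the matrix columns.
pheno <- data.frame(
  Treatment = factor(rep(c("Control", "Infected"), each = 3)),
  Time = factor(rep(c("2H", "6H", "24H"), times = 2)),
  row.names = colnames(expr)
)

# Hypothetical helper (not a GOexpress function): checks the requirements
# listed above and stops with an error if one of them is not met.
check_inputs <- function(expr, pheno, f) {
  stopifnot(
    is.matrix(expr), is.numeric(expr),
    !is.null(rownames(expr)),
    identical(colnames(expr), rownames(pheno)),
    f %in% colnames(pheno),
    is.factor(pheno[[f]]),
    nlevels(pheno[[f]]) >= 2
  )
  invisible(TRUE)
}

inputs_ok <- check_inputs(expr, pheno, "Treatment")

# With Biobase available, the two tables can be wrapped into an ExpressionSet.
if (requireNamespace("Biobase", quietly = TRUE)) {
  eset <- Biobase::ExpressionSet(
    assayData = expr,
    phenoData = Biobase::AnnotatedDataFrame(pheno)
  )
}
```

Here "Treatment" would be the name of the grouping factor passed as the second input.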

The analysis scores all Gene Ontology (GO) terms represented in the gene annotations provided, or semi-automatically retrieved from the current Ensembl annotation release, using the biomaRt package. In the default approach, the random forest framework is used to evaluate the ability of each gene feature in the ExpressionSet to cluster groups of samples according to a known experimental factor. Notably, genes associated with the GO term in the annotations but absent from the dataset are assigned a score of 0 and a rank equal to the number of gene features in the ExpressionSet plus one. GO terms are scored and ranked on the average rank (alternatively, score) of all associated genes (including those absent from the ExpressionSet).
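The rank-averaging rule described above can be sketched in base R. This is a simplified illustration rather than the package's implementation: the gene scores stand in for whatever importance measure the chosen statistical framework produces, and the gene names, GO identifiers, and the `score_term()` helper are hypothetical.

```r
# Hypothetical per-gene scores for the features present in the dataset.
gene_scores <- c(geneA = 0.9, geneB = 0.1, geneC = 0.5, geneD = 0.3)

# Rank 1 corresponds to the highest score.
gene_ranks <- rank(-gene_scores, ties.method = "min")

# Hypothetical annotations; geneX and geneY are annotated to GO terms
# but absent from the dataset.
go_genes <- list(
  "GO:0000001" = c("geneA", "geneC"),
  "GO:0000002" = c("geneB", "geneX"),
  "GO:0000003" = c("geneD", "geneX", "geneY")
)

# Absent genes get score 0 and a rank equal to the number of features plus one.
absent_rank <- length(gene_scores) + 1

# Hypothetical helper: average rank and score over all genes of one GO term.
score_term <- function(genes) {
  present <- genes %in% names(gene_scores)
  ranks <- ifelse(present, gene_ranks[genes], absent_rank)
  scores <- ifelse(present, gene_scores[genes], 0)
  c(
    ave_rank = mean(ranks),
    ave_score = mean(scores),
    total_count = length(genes),
    data_count = sum(present)
  )
}

go_table <- as.data.frame(t(sapply(go_genes, score_term)))

# Order GO terms by increasing average rank (best first).
go_table <- go_table[order(go_table$ave_rank), ]
print(go_table)
```

Because absent genes count towards the average, a term whose annotated genes are mostly missing from the dataset is pushed down the ranking even when its few measured genes score well.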

Functions are provided to investigate and visualise the results of the above analysis. The score table can be filtered for GO terms passing given thresholds. The distribution of scores can be visualised. The quantiles of scores can be obtained. The genes associated with a given GO term can be listed, with or without descriptive information. Hierarchical clustering of the samples can be performed based on the expression levels of genes associated with a given GO term. Heatmaps accompanied by hierarchical clustering of samples and genes can be drawn. The expression profile of genes can be plotted against any factor while grouping samples on another factor. The univariate effect of all factors can be visualised on the expression level of genes associated with a GO term. The counts of overlapping genes between multiple GO terms can be visualised in a Venn diagram. The result variable of the analysis can be re-ordered according to gene rank or score.
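Three of the operations listed above (threshold filtering, score quantiles, and hierarchical clustering of samples) can be sketched with base R alone. The score table, the expression matrix, and the choice of Euclidean distance with average linkage are assumptions made for this illustration and do not reproduce the package's functions.

```r
# Hypothetical GO term score table (values invented for illustration).
go_scores <- data.frame(
  go_id = c("GO:0000001", "GO:0000002", "GO:0000003", "GO:0000004"),
  ave_rank = c(12.5, 340.0, 58.2, 910.7),
  total_count = c(8, 3, 15, 6),
  stringsAsFactors = FALSE
)

# Keep GO terms annotated with at least 5 genes.
filtered <- go_scores[go_scores$total_count >= 5, ]

# Quantiles of the average rank across all terms.
rank_quantiles <- quantile(go_scores$ave_rank, probs = c(0.25, 0.5, 0.75))

# Toy expression matrix restricted to the genes of a single GO term:
# 5 genes (rows) x 6 samples (columns).
set.seed(42)
expr_term <- matrix(
  rnorm(5 * 6), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("S", 1:6))
)

# Cluster samples, which are columns, so transpose before computing distances.
sample_tree <- hclust(dist(t(expr_term)), method = "average")
plot(sample_tree, main = "Samples clustered on one GO term")
```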

FEATURES

  • Supports expression data based on Ensembl gene identifiers and microarray probeset identifiers.

  • Supports custom annotations for gene identifiers not automatically supported.

  • GO_analyse() scores all Gene Ontology (GO) terms represented in the dataset based on the estimated average ability of their associated genes to cluster samples according to a predefined grouping factor. It also returns the table used to map genes to GO terms, the table summarising the statistics for each gene, and the essential parameters of the analysis performed, for reproducibility. Additional information specific to each statistical framework is included in the output object.

  • pValue_GO() computes permutation-based P-values assessing the significance of GO term ranking, which may subsequently be used for filtering.

  • subset_scores() filters the output of GO_analyse() for GO terms passing desired filters and returns a list formatted identically to the output of GO_analyse() with the filtered information.

  • hist_scores() plots the distribution of GO term scores in the output of GO_analyse() or subset_scores().

  • quantiles_scores() returns the quantile values corresponding to defined percentiles.

  • list_genes() returns the list of feature identifiers associated with a given GO term.

  • table_genes() returns a table of information about the feature identifiers associated with a given GO term.

  • cluster_GO() plots a hierarchical clustering of the samples based on the expression levels of genes associated with a given GO term.

  • heatmap_GO() plots a heatmap with hierarchical clustering of the samples and genes based on the expression levels of genes associated with a given GO term.

  • expression_plot() plots the expression profile corresponding to a feature identifier, given a valid variable name for the X-axis and a grouping factor for the Y-axis.

  • expression_plot_symbol() plots the expression profile corresponding to the feature identifier(s) annotated to a gene symbol, given a valid variable name for the X-axis and a grouping factor for the Y-axis.

  • expression_profiles() plots the individual expression profile of given sample series while colour-coding each series according to its group; a more detailed alternative to expression_plot().

  • expression_profiles_symbol() plots the individual expression profile of given sample series while colour-coding each series according to its group; a more detailed alternative to expression_plot_symbol().

  • plot_design() plots the univariate effect of each level of each factor available in the phenoData on the expression levels of genes associated with a GO term.

  • overlap_GO() calls VennDiagram to plot the counts of overlapping genes between 2-5 GO terms. The plot can either be displayed on screen or printed directly to a file.

  • rerank() reorders the ranked tables of GO terms and genes either by increasing (average) rank or decreasing (average) score.

  • subEset() subsets an ExpressionSet to only the samples with a particular set of values in given columns of their phenotypic data (e.g. only samples from "2H" and "6H" in their "Time" information).

  • An example of the main input and output objects is included with the package.

  • A User's Guide is included, and presents a typical workflow using the included example input objects.
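A short session using the bundled example objects might look like the sketch below. It assumes GOexpress is installed (the block is skipped otherwise), that filters are passed to subset_scores() as name = threshold pairs matching columns of the GO score table, and that the score table is stored in the `GO` element of the result with a `go_id` column. The thresholds are arbitrary; the package's help pages and User's Guide give the authoritative arguments.

```r
# Runs only when GOexpress is installed; otherwise the block is skipped.
has_goexpress <- requireNamespace("GOexpress", quietly = TRUE)

if (has_goexpress) {
  library(GOexpress)

  # Sample output of pValue_GO() shipped with the package.
  data(AlvMac_results.pVal)

  # Keep GO terms with at least 5 annotated genes and a P-value of at most 0.05
  # (assumed filter names; thresholds chosen arbitrarily for this example).
  filtered <- subset_scores(
    result = AlvMac_results.pVal,
    total = 5,
    p.val = 0.05
  )

  # Top-ranked GO terms after filtering.
  print(head(filtered$GO))

  # Feature identifiers associated with the first GO term in the filtered table.
  top_go <- as.character(filtered$GO$go_id[1])
  print(list_genes(go_id = top_go, result = filtered))
}
```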

Version: 1.6.1

License: GPL (>= 3)

Maintainer: Kevin Rue-Albrecht

Last Published: February 15th, 2017

Functions in GOexpress (1.6.1)

  • AlvMac: Sample data from an RNAseq experiment.
  • cluster_GO: Generates a hierarchical clustering of the samples
  • expression_plot_symbol: Plots the expression profile of a gene by levels of a factor
  • prefix2dataset: Table mapping Ensembl gene identifier prefixes to BioMart datasets.
  • plot_design: Plot Univariate Effects for genes associated with a Gene Ontology
  • AlvMac_results.pVal: Sample output from the pValue_GO() function on an RNAseq experiment.
  • hist_scores: Plots the distribution of scores following a GOexpress analysis.
  • subset_scores: Returns a filtered list from GO_analyse results.
  • list_genes: Returns the genes associated with a Gene Ontology
  • table_genes: Returns a table listing the genes associated with a given Gene Ontology
  • expression_plot: Plots the expression profile of a gene by levels of a factor
  • pValue_GO: Compute p-values for ontologies by randomising gene labels.
  • quantiles_scores: Returns the quantiles of scores following a GOexpress analysis.
  • GO_analyse: Identifies gene ontologies clustering samples according to predefined factor.
  • AlvMac_allgenes: Example of custom gene feature annotations.
  • AlvMac_allGO: Example of custom gene ontology annotations.
  • GOexpress-package: Visualise microarray and RNAseq data with gene ontology annotations.
  • heatmap_GO: Generates a heatmap and hierarchical clustering of the samples and the genes
  • AlvMac_GOgenes: Example of custom mapping between gene ontology identifiers and gene features.
  • AlvMac_results: Sample output from the GO_analyse() function on an RNAseq experiment.
  • microarray2dataset: Table mapping probeset identifier prefixes to datasets in the Ensembl BioMart.
  • overlap_GO: Shared genes between a list of GO terms.
  • rerank: Reorder the result variable by alternative metrics.
  • subEset: Subset an ExpressionSet to values of phenotypic data columns.