goat (version 1.1.2)

download_genesets_goatrepo: Download and parse geneset collections from the GOAT GitHub repository

Description

While the Bioconductor repository is extensive, contains data for many species, and is part of a larger infrastructure, it might contain outdated GO data when the user is not using the latest R version. If users are on an R version that is a few years old, the GO data from Bioconductor will be equally old.

As an alternative, we store gene2go data from NCBI (for Human genes only!) at the GOAT GitHub repository. This function allows for a convenient way to download this data and then parse the genesets.

Alternatively, you can browse the files in the data branch of the GOAT GitHub repository and download them manually, then load them via the GOAT R function load_genesets_go_fromfile(). To view these data, open https://github.com/ftwkoopmans/goat/tree/data in a browser. You can also use this R package to list all available data via available_genesets_goatrepo().

By default (empty version parameter), this function will first check the online GOAT GitHub repository to find the most recent version/date, then download the respective data.
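Conceptually, finding the "most recent version/date" amounts to picking the latest date among the available versions. The following is a minimal base-R sketch of that idea, assuming versions are YYYY-MM-DD strings as described for the version parameter. The available_versions vector is made-up illustration data, not the actual repository listing, and this is not necessarily how the package implements the lookup.

```r
# hypothetical version strings, for illustration only (not the real repository listing)
available_versions = c("2023-06-01", "2024-01-01", "2023-11-15")

# parse as dates so the comparison is chronological
version_dates = as.Date(available_versions, format = "%Y-%m-%d")

# pick the most recent date and convert it back to the YYYY-MM-DD string form
latest_version = format(max(version_dates), "%Y-%m-%d")
print(latest_version)
```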

Usage

download_genesets_goatrepo(
  output_dir,
  type = "GO",
  version = "",
  ignore_cache = FALSE
)

Value

The result from the respective geneset parser function. For example, if the parameter type was set to "GO" (default), this function returns the result of load_genesets_go_fromfile(). The returned data is typically used as input for filter_genesets(); cf. the full example in the documentation for test_genesets().

Arguments

output_dir

full path to the directory where the downloaded files should be stored. The directory is created if it does not exist. For example, output_dir="~/data" on Unix systems, output_dir="C:/data" on Windows, or output_dir=getwd() to write output to the current working directory

type

the type of genesets to download. Currently, only "GO" is supported (default)

version

the dataset version. This must be a date in the format YYYY-MM-DD (for example, "2024-01-01") or be left empty (NA or empty string, the default) to automatically download the latest version.

ignore_cache

boolean, set to TRUE to force re-download and ignore cached data, if any. Default: FALSE
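If you want to check arguments before calling the function, the following base-R sketch may help. Both helpers, prepare_output_dir() and is_valid_version(), are hypothetical names introduced here for illustration and are not part of the goat package. The function creates output_dir by itself, so the first helper only shows the equivalent manual steps.

```r
# hypothetical helper: expand "~" and create the directory (including parents) if missing
prepare_output_dir = function(path) {
  path = path.expand(path)
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  normalizePath(path, winslash = "/")
}

# hypothetical helper: TRUE if x is NA or "" (meaning "use the latest version")
# or a single string holding a real calendar date in YYYY-MM-DD format
is_valid_version = function(x) {
  if (length(x) != 1) return(FALSE)
  if (is.na(x) || identical(x, "")) return(TRUE)
  if (!is.character(x) || !grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)) return(FALSE)
  # as.Date() yields NA for impossible dates such as February 30th
  !is.na(as.Date(x, format = "%Y-%m-%d"))
}

# a subdirectory of the temporary directory is used here; pick a permanent location in practice
output_dir = prepare_output_dir(file.path(tempdir(), "goat_genesets"))

is_valid_version("")            # empty string, the default
is_valid_version("2024-01-01")  # well-formed date
is_valid_version("2024-1-1")    # missing zero padding
```

Checking the format up front gives a clearer error location than a failed download would, at the cost of duplicating validation the package may already perform.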

Examples

# \donttest{
# note: this example will download 2 files of approx 10MB in total

# store the downloaded files in the following directory. Here, the temporary file
# directory is used. Alternatively, consider storing this data in a more permanent location.
# e.g. output_dir="~/data/go" on unix systems or output_dir="C:/data/go" on Windows
output_dir = tempdir()

# download data files with GO annotations (note that the release/date is printed to console)
# these are then parsed with the load_genesets_go_fromfile() function
# if the files are already available at output_dir, these are used and download is skipped
genesets_asis = download_genesets_goatrepo(output_dir)

### for a basic example on how to use the data obtained here,
### refer to the example included in the function documentation of: test_genesets()
# }
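The version and ignore_cache parameters can be combined to pin a specific release and force a fresh download. The sketch below uses only the parameters listed under Usage. The run_download switch is a hypothetical guard added for this sketch so that nothing is downloaded unless you enable it, and the date "2024-01-01" is the format example from the version parameter description, so it may not correspond to a release that actually exists in the repository.

```r
# opt-in switch (hypothetical, for this sketch only): set to TRUE to actually download
run_download = FALSE

genesets_pinned = NULL
if (run_download && requireNamespace("goat", quietly = TRUE)) {
  genesets_pinned = goat::download_genesets_goatrepo(
    output_dir = tempdir(),
    type = "GO",
    version = "2024-01-01", # placeholder date in the required format; may not exist in the repository
    ignore_cache = TRUE     # force a re-download even if cached files are present
  )
}
```

Pinning the version keeps an analysis reproducible across reruns, whereas the default (empty version) follows whatever release is newest at the time of the call.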