Loads specified data sets, or lists the available data sets.
data(..., list = character(), package = NULL, lib.loc = NULL, verbose = getOption("verbose"), envir = .GlobalEnv)
- ...: literal character strings or names.
- list: a character vector.
- package: a character vector giving the package(s) to look in for data sets, or NULL. By default, all packages in the search path are used, then the data subdirectory (if present) of the current working directory.
- lib.loc: a character vector of directory names of R libraries, or NULL. The default value of NULL corresponds to all libraries currently known.
- verbose: a logical. If TRUE, additional diagnostics are printed.
- envir: the environment where the data should be loaded.
Currently, four formats of data files are supported:
- files ending .R or .r are source()d in, with the R working directory changed temporarily to the directory containing the respective file. (data ensures that the utils package is attached, in case it had been run via utils::data.)
- files ending .RData or .rda are load()ed.
- files ending .tab, .txt or .TXT are read using read.table(..., header = TRUE, as.is = FALSE), and hence result in a data frame.
- files ending .csv or .CSV are read using read.table(..., header = TRUE, sep = ";", as.is = FALSE), and also result in a data frame.
If more than one matching file name is found, the first on this list is used. (Files with extensions .txt, .tab or .csv can be compressed, with or without further extension .gz, .bz2 or .xz.)
The data sets to be loaded can be specified as a set of character strings or names, or as the character vector
list, or as both.
For each given data set, the first two types (.R or .r, and .RData or .rda files) can create several variables in the load environment, which might all be named differently from the data set. The third and fourth types will always result in the creation of a single variable with the same name (without extension) as the data set.
If no data sets are specified, data lists the available data sets. It looks for a new-style data index in the Meta directory or, if this is not found, an old-style 00Index file in the data directory of each specified package, and uses these files to prepare a listing. If there is a data area but no index, available data files for loading are computed and included in the listing, and a warning is given: such packages are incomplete. The information about available data sets is returned in an object of class "packageIQR". The structure of this class is experimental. Where the datasets have a different name from the argument that should be used to retrieve them, the index will have an entry like beaver1 (beavers), which tells us that dataset beaver1 can be retrieved by the call data(beavers).
If lib.loc and package are both NULL (the default), the data sets are searched for in all the currently loaded packages, then in the data directory (if any) of the current working directory.
If lib.loc = NULL but package is specified as a character vector, the specified package(s) are searched for first amongst loaded packages and then in the default library/ies (see .libPaths).
If lib.loc is specified (and not NULL), packages are searched for in the specified library/ies, even if they are already loaded from another library.
To just look in the data directory of the current working directory, set package = character(0) (and lib.loc = NULL, the default).
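The search rules and the .csv convention described above can be tried out without installing a package. The following is a minimal sketch, assuming a throwaway directory under tempdir(); the directory name data_demo and the data set name toy are made up for illustration.

```r
# Build a scratch "project" with a 'data' subdirectory (hypothetical names).
proj <- file.path(tempdir(), "data_demo")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)

# data() reads .csv files with sep = ";", so write semicolon-separated values.
writeLines(c("id;group", "1;a", "2;b", "3;a"),
           file.path(proj, "data", "toy.csv"))

# package = character(0) restricts the search to ./data of the working directory.
old_wd <- setwd(proj)
e <- new.env()
loaded <- data("toy", package = character(0), envir = e)
setwd(old_wd)

loaded      # names of the data sets that were requested
ls(e)       # objects created in the target environment
str(e$toy)  # a data frame; strings become factors because as.is = FALSE
```

Loading into a fresh environment rather than .GlobalEnv keeps the experiment from touching the workspace.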
A character vector of all data sets specified, or information about
all available data sets in an object of class "packageIQR" if
none were specified.
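As a rough illustration of the two kinds of return value, the sketch below lists the data sets of the datasets package (which ships with R) and then loads two of them by name. The results component used here is what current R versions provide; since the class structure is described as experimental, treat that access as an assumption rather than a stable interface.

```r
# No data set named: the result is a listing object of class "packageIQR".
idx <- data(package = "datasets")
class(idx)

# Assumption: the listing keeps its table in a 'results' matrix with
# columns such as "Item" and "Title" (experimental structure).
head(idx$results[, c("Item", "Title")])

# Entries of the form 'name (argument)' show what to pass to data().
grep("beaver", idx$results[, "Item"], value = TRUE)

# Data sets named: the result is the character vector of requested names.
e <- new.env()
nm <- data("USArrests", "VADeaths", package = "datasets", envir = e)
nm
```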
One can take advantage of the search order and the fact that a
.R file will change directory. If raw data are stored in
mydata.txt then one can set up
mydata.R to read
mydata.txt and pre-process it, e.g., using transform().
For instance one can convert numeric vectors to factors with the
appropriate labels. Thus, the
.R file can effectively contain
a metadata specification for the plaintext formats.
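The mydata.R idea can be sketched as follows, assuming a scratch directory under tempdir() and made-up column names (id, sex). The file names mydata.txt and mydata.R follow the text; everything else is hypothetical.

```r
proj <- file.path(tempdir(), "note_demo")   # hypothetical scratch location
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)

# Raw data: 'sex' is stored as a numeric code.
writeLines(c("id sex", "1 1", "2 2", "3 1"),
           file.path(proj, "data", "mydata.txt"))

# mydata.R takes precedence over mydata.txt and is sourced with the working
# directory set to 'data', so it can refer to the raw file by its bare name.
writeLines(c(
  'mydata <- read.table("mydata.txt", header = TRUE)',
  'mydata <- transform(mydata,',
  '                    sex = factor(sex, levels = 1:2,',
  '                                 labels = c("female", "male")))'
), file.path(proj, "data", "mydata.R"))

old_wd <- setwd(proj)
e <- new.env()
data("mydata", package = character(0), envir = e)
setwd(old_wd)

str(e$mydata)  # 'sex' is now a labelled factor
```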
data() was originally intended to allow users to load datasets
from packages for use in their examples, and as such it loaded the
datasets into the workspace
.GlobalEnv. This avoided
having large datasets in memory when not in use. That need has been
almost entirely superseded by lazy-loading of datasets. The ability to specify a dataset by name (without quotes) is a
convenience: in programming the datasets should be specified by
character strings (with quotes). Use of
data within a function without an envir argument
has the almost always undesirable side-effect of putting an object in
the user's workspace (and indeed, of replacing any object of that name
already there). It would almost always be better to put the object in
the current evaluation environment by
data(..., envir = environment()). However, two alternatives are usually preferable,
both described in the ‘Writing R Extensions’ manual.
- For sets of data, set up a package to use lazy-loading of data.
- For objects which are system data, for example lookup tables
used in calculations within the function, use a file
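R/sysdata.rda in the package sources or create the objects by R code at package installation time.
A sometimes important distinction is that the second approach places objects in the namespace but the first does not. So if it is important that the function sees mytable as an object from the package, it is system data and the second approach should be used. In the unusual case that a package uses a lazy-loaded dataset as a default argument to a function, that needs to be specified by ::, e.g., survival::survexp.us.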
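To make the side-effect concrete, here is a small sketch contrasting the envir = environment() form with direct :: access. The function names are invented for the example, and USArrests from the datasets package stands in for whatever data a real function would need.

```r
# Loads the data set into the function's own evaluation environment,
# so nothing is created or replaced in the user's workspace.
mean_murder_local <- function() {
  data("USArrests", package = "datasets", envir = environment())
  mean(USArrests$Murder)
}

# Often simpler: refer to the lazy-loaded data set through its package.
mean_murder_pkg <- function(x = datasets::USArrests) {
  mean(x$Murder)
}

mean_murder_local()
mean_murder_pkg()

# Check whether a copy of USArrests sits in the workspace itself.
exists("USArrests", envir = globalenv(), inherits = FALSE)
```

Both functions compute the same value; the second avoids calling data() at all, which is usually the easier form to reason about.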
help for obtaining documentation on data sets,
save for creating the second (.rda) kind
of data, typically the most efficient one.
The ‘Writing R Extensions’ manual for considerations in preparing the
data directory of a package.
require(utils)
data()                         # list all available data sets
try(data(package = "rpart"))   # list the data sets in the rpart package
data(USArrests, "VADeaths")    # load the data sets 'USArrests' and 'VADeaths'
## Not run: ------------------------------------
# ## Alternatively
# ds <- c("USArrests", "VADeaths"); data(list = ds)
## ---------------------------------------------
help(USArrests)                # give information on data set 'USArrests'