dataverse (version 0.3.0)

get_file: Download dataverse file as a raw binary

Description

Download Dataverse file(s). The get_file_* functions return a raw binary file, which cannot be readily analyzed in R. To load the files as data frames, see the get_dataframe_* functions (for example, get_dataframe_by_name) instead.

Usage

get_file(
  file,
  dataset = NULL,
  format = c("original", "bundle"),
  vars = NULL,
  key = Sys.getenv("DATAVERSE_KEY"),
  server = Sys.getenv("DATAVERSE_SERVER"),
  original = TRUE,
  ...
)

get_file_by_name(
  filename,
  dataset,
  format = c("original", "bundle"),
  vars = NULL,
  key = Sys.getenv("DATAVERSE_KEY"),
  server = Sys.getenv("DATAVERSE_SERVER"),
  original = TRUE,
  ...
)

get_file_by_id(
  fileid,
  dataset = NULL,
  format = c("original", "bundle"),
  vars = NULL,
  original = TRUE,
  key = Sys.getenv("DATAVERSE_KEY"),
  server = Sys.getenv("DATAVERSE_SERVER"),
  ...
)

get_file_by_doi(
  filedoi,
  dataset = NULL,
  format = c("original", "bundle"),
  vars = NULL,
  original = TRUE,
  key = Sys.getenv("DATAVERSE_KEY"),
  server = Sys.getenv("DATAVERSE_SERVER"),
  ...
)

Arguments

file

An integer specifying a file identifier; or a vector of integers specifying file identifiers; or, if prefixed with "doi:", a character string with the file-specific DOI; or, without the prefix, a filename accompanied by a dataset DOI in the dataset argument; or an object of class “dataverse_file” as returned by dataset_files.

dataset

A character string specifying a persistent identifier for a dataset, for example "doi:10.70122/FK2/HXJVJU". Alternatively, an object of class “dataverse_dataset” obtained by dataverse_contents().

format

A character string specifying a file format for download. By default, this is “original” (the original file format). If NULL, no query is added, so ingested files are returned in their ingested TSV form. For tabular datasets, the option “bundle” downloads the bundle of the original and archival versions, as well as the documentation. See https://guides.dataverse.org/en/latest/api/dataaccess.html for details.

vars

A character vector specifying one or more variable names, used to extract a subset of the data.

key

A character string specifying a Dataverse server API key. If one is not specified, functions calling authenticated API endpoints will fail. Keys can be specified per call through this argument or globally using Sys.setenv("DATAVERSE_KEY" = "examplekey").

server

A character string specifying a Dataverse server. There are multiple Dataverse installations, but the default is to use the Harvard Dataverse (server = "dataverse.harvard.edu"). This can be modified per call through this argument or globally using Sys.setenv("DATAVERSE_SERVER" = "dataverse.example.com").
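
As a minimal sketch of the global option described for key and server, both environment variables can be set once per session so that later calls do not need to repeat them. The values below are placeholders only ("examplekey" is not a real key, and "demo.dataverse.org" is the demo server used in the Examples section).

```r
# Set the server and API key once for the current R session.
# Both values are placeholders; substitute your own.
Sys.setenv("DATAVERSE_SERVER" = "demo.dataverse.org")
Sys.setenv("DATAVERSE_KEY" = "examplekey")

# Arguments that default to Sys.getenv(...) will now pick these values up.
server_now <- Sys.getenv("DATAVERSE_SERVER")
key_now <- Sys.getenv("DATAVERSE_KEY")
print(server_now)
```

Passing server = or key = directly in a call still takes precedence over the environment variables, since the environment is only consulted for the argument defaults.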

original

A logical, defaulting to TRUE. If an ingested (.tab) version is available, should the original version be downloaded instead of the ingested one? If there is no ingested version, this is set to NA. Note that in get_dataframe_*, original is set to FALSE by default. Either can be changed.

...

Additional arguments passed to an HTTP request function, such as GET, POST, or DELETE.

filename

Filename of the data file, with the file extension as shown in Dataverse (for example, if nlsw88.dta was the original but is displayed as the ingested nlsw88.tab, use the ingested version).

fileid

A numeric ID internally used for get_file_by_id

filedoi

A DOI for a single file (not the entire dataset), of the form "10.70122/FK2/PPIAXE/MHDB0O" or "doi:10.70122/FK2/PPIAXE/MHDB0O"

Value

get_file returns a raw vector (or a list of raw vectors, if length(file) > 1), which can be saved locally with the writeBin function. To load datasets into R as data frames, see get_dataframe_by_name.
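
The following sketch illustrates what saving a raw vector with writeBin looks like. To keep it runnable offline, it fabricates a small raw vector with charToRaw as a stand-in for the value that get_file would return; the CSV content and the variable names are assumptions made for illustration only.

```r
# Stand-in for the raw vector returned by get_file() (assumption: a few
# fabricated CSV bytes instead of an actual download).
f <- charToRaw("id,value\n1,10\n2,20\n")

# Write the bytes to disk; the caller chooses the file extension.
path <- tempfile(fileext = ".csv")
writeBin(f, path)

# Read the bytes back to confirm the file on disk matches the raw vector.
f_back <- readBin(path, what = "raw", n = file.size(path))
same_bytes <- identical(f, f_back)

# Once on disk, the file can be parsed with a reader suited to its format.
df <- read.csv(path)
```

The raw vector carries no format information, so the extension and the reader must match the format of the file that was actually downloaded.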

Details

This function provides access to data files from a Dataverse entry. get_file is a general wrapper and can take Dataverse objects, file IDs, or a filename together with a dataset. Internally, all functions download each file through get_file_by_id. get_file_by_name is a shorthand for running get_file by specifying a file name (filename) and dataset (dataset). get_file_by_doi obtains a file by its file DOI, bypassing the dataset argument.

See Also

To load the objects as data frames, see get_dataframe_by_name.

Examples

# NOT RUN {
library(dataverse)

# 1. Using a filename and dataset DOI
f1 <- get_file_by_name(
  filename = "nlsw88.tab",
  dataset  = "10.70122/FK2/PPIAXE",
  server   = "demo.dataverse.org"
)

# 2. Using file DOI
f2 <- get_file_by_doi(
  filedoi  = "10.70122/FK2/PPIAXE/MHDB0O",
  server   = "demo.dataverse.org"
)

# 3. Two steps: find the file ID with get_dataset, then download by ID
d3 <- get_dataset("doi:10.70122/FK2/PPIAXE", server = "demo.dataverse.org")
f3 <- get_file(d3$files$id[1], server = "demo.dataverse.org")

# 4. Retrieve multiple files as a list of raw vectors
f4_vec <- get_dataset(
  "doi:10.70122/FK2/PPIAXE",
  server = "demo.dataverse.org"
)$files$id

f4 <- get_file(f4_vec, server = "demo.dataverse.org")
length(f4)

# Write binary files
# (see `get_dataframe_by_name` to load in environment)
# The appropriate file extension needs to be assigned by the user.
writeBin(f1, "nlsw88.dta")
writeBin(f2, "nlsw88.dta")

writeBin(f4[[1]], "nlsw88.rds") # originally a rds file
writeBin(f4[[2]], "nlsw88.dta") # originally a dta file
# }