dataverse (version 0.3.0)

get_dataframe_by_name: Download dataverse file as a dataframe

Description

Use get_dataframe_by_name if you know the name of the datafile and the DOI of the dataset. Use get_dataframe_by_doi if you know the DOI of the datafile itself. Use get_dataframe_by_id if you know the numeric ID of the datafile.

Usage

get_dataframe_by_name(
  filename,
  dataset = NULL,
  .f = NULL,
  original = FALSE,
  ...
)

get_dataframe_by_id(fileid, .f = NULL, original = FALSE, ...)

get_dataframe_by_doi(filedoi, .f = NULL, original = FALSE, ...)

Arguments

filename

The name of the file of interest, with file extension, for example "roster-bulls-1996.tab".

dataset

A character string specifying a persistent identifier for a dataset, for example "doi:10.70122/FK2/HXJVJU". Alternatively, an object of class “dataverse_dataset” obtained by dataverse_contents().

.f

The function to use for reading in the raw dataset. The user must choose the appropriate function: for example, if the target is a .rds file, then .f should be readRDS or readr::read_rds.

original

A logical, defaulting to FALSE. Whether to read the original version of the datafile rather than the ingested, archival version, if one exists. The archival versions are tab-delimited .tab files, so if original = FALSE, .f is set to readr::read_tsv. If a function to read the original version is available, then original = TRUE with a specified .f is better.

...

Arguments passed on to get_file

file

An integer specifying a file identifier; or a vector of integers specifying file identifiers; or, if used with the prefix "doi:", a character with the file-specific DOI; or, if used without the prefix, a filename accompanied by a dataset DOI in the dataset argument, or an object of class “dataverse_file” as returned by dataset_files.

format

A character string specifying a file format for download. By default, this is “original” (the original file format). If NULL, no query is added, so ingested files are returned in their ingested TSV form. For tabular datasets, the option “bundle” downloads the bundle of the original and archival versions, as well as the documentation. See https://guides.dataverse.org/en/latest/api/dataaccess.html for details.

vars

A character vector specifying one or more variable names, used to extract a subset of the data.

key

A character string specifying a Dataverse server API key. If one is not specified, functions calling authenticated API endpoints will fail. Keys can be specified atomically or globally using Sys.setenv("DATAVERSE_KEY" = "examplekey").

server

A character string specifying a Dataverse server. There are multiple Dataverse installations, but the default is to use the Harvard Dataverse (server = "dataverse.harvard.edu"). This can be modified atomically or globally using Sys.setenv("DATAVERSE_SERVER" = "dataverse.example.com").

fileid

A numeric file ID, used internally by get_file_by_id.

filedoi

A DOI for a single file (not the entire dataset), of the form "10.70122/FK2/PPIAXE/MHDB0O" or "doi:10.70122/FK2/PPIAXE/MHDB0O"
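
The server description above says the default can be changed globally with Sys.setenv. A minimal sketch of that pattern, using only base R and the demo server from the examples, might look like the following. The variable names old_server and current_server are illustrative, and restoring the previous value afterwards is a suggestion rather than something the package requires.

```r
# Remember the current value so it can be restored afterwards.
old_server <- Sys.getenv("DATAVERSE_SERVER", unset = NA)

# Set the default server for the rest of this R session.
Sys.setenv(DATAVERSE_SERVER = "demo.dataverse.org")

# Code that reads the environment variable now sees the demo server.
current_server <- Sys.getenv("DATAVERSE_SERVER")

# Restore the previous state: unset if it was not set before.
if (is.na(old_server)) {
  Sys.unsetenv("DATAVERSE_SERVER")
} else {
  Sys.setenv(DATAVERSE_SERVER = old_server)
}
```

According to the key description, DATAVERSE_KEY can be set the same way. Passing server or key directly in a single call remains the more explicit option.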

Examples

# NOT RUN {
# Retrieve data.frame from dataverse DOI and file name
df_tab <-
  get_dataframe_by_name(
    filename = "roster-bulls-1996.tab",
    dataset  = "doi:10.70122/FK2/HXJVJU",
    server   = "demo.dataverse.org"
  )

# Retrieve the same file from file DOI
df_tab <-
  get_dataframe_by_doi(
    filedoi      = "10.70122/FK2/HXJVJU/SA3Z2V",
    server       = "demo.dataverse.org"
  )

# Do not run when submitting to CRAN, because the whole
# example sometimes takes longer than 10 sec.
# }
# NOT RUN {
# Retrieve ingested file originally a Stata dta
df_from_stata_ingested <-
  get_dataframe_by_name(
    filename   = "nlsw88.tab",
    dataset    = "doi:10.70122/FK2/PPIAXE",
    server     = "demo.dataverse.org"
  )

# To use the original file version, or for non-ingested data,
# please specify `original = TRUE` and specify a function in .f.

if (requireNamespace("readr", quietly = TRUE)) {
  df_from_rds_original <-
    get_dataframe_by_name(
      filename   = "nlsw88_rds-export.rds",
      dataset    = "doi:10.70122/FK2/PPIAXE",
      server     = "demo.dataverse.org",
      original   = TRUE,
      .f         = readr::read_rds
    )
}

# Get Stata file as original
if (requireNamespace("haven", quietly = TRUE)) {
  df_stata_original <-
    get_dataframe_by_name(
      filename   = "nlsw88.tab",
      dataset    = "doi:10.70122/FK2/PPIAXE",
      server     = "demo.dataverse.org",
      original   = TRUE,
      .f         = haven::read_dta
    )
}

# Stata file as ingested file (less information than original)
df_stata_ingested <-
  get_dataframe_by_name(
    filename   = "nlsw88.tab",
    dataset    = "doi:10.70122/FK2/PPIAXE",
    server     = "demo.dataverse.org"
  )

# }
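
Because the user has to choose .f to match the original file, one possible approach is to select a reader from the file extension. The sketch below is only an illustration: pick_reader is a hypothetical helper, not part of the dataverse package, and it uses base R readers only. Note that an ingested file is listed with a .tab name even when the original is, for example, a Stata file, so the extension of the original file is the one that matters when original = TRUE.

```r
# Hypothetical helper (not part of dataverse): map a file extension
# to a base R reader function that could be passed as `.f`.
pick_reader <- function(filename) {
  ext <- tolower(tools::file_ext(filename))
  switch(
    ext,
    rds = readRDS,
    csv = utils::read.csv,
    tab = ,                       # falls through to the tsv reader
    tsv = utils::read.delim,
    stop("No reader registered for extension: ", ext)
  )
}

# Offline check: write a small .rds file and read it back with the
# function the helper selects.
tmp <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1:3), tmp)
reader <- pick_reader(tmp)
roundtrip <- reader(tmp)
unlink(tmp)
```

The selected function could then be supplied as .f = pick_reader("nlsw88_rds-export.rds") together with original = TRUE. Whether that reader suits a particular file is an assumption the user should verify.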