dataone: R interface to the DataONE network of data repositories

Provides read and write access to data and metadata from the DataONE network of data repositories, including the KNB Data Repository, Dryad, and the NSF Arctic Data Center. Each DataONE repository implements a consistent repository application programming interface. Users call methods in R to access these remote repository functions, such as methods to query the metadata catalog, get access to metadata for particular data packages, and read the data objects from the data repository using the global identifier for each data object. Users can also insert and update data objects on repositories that support these methods. For more details, see the vignettes.

Installation Notes

Version 2.0 of the dataone R package removes the dependency on rJava and significantly changes the base API to correspond to the published DataONE API. Previous methods for accessing DataONE will be maintained, but new methods have been added.

The dataone R package requires the R package redland. If you are installing on Ubuntu then the Redland C libraries must be installed first. If you are installing on Mac OS X or Windows then installing these libraries is not required.
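
As a rough pre-installation check, the sketch below reports whether the two packages named above are already present in the current R library. It uses only base R; the variable names are illustrative and not part of dataone.

```r
# Packages named in the installation notes above.
pkgs <- c("redland", "dataone")

# requireNamespace() returns TRUE when a package is installed and its
# namespace can be loaded, and FALSE (without raising an error) otherwise.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# List anything that still needs install.packages().
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
}
```

If redland is reported as missing on Ubuntu, install the Redland C libraries first, as described in the Ubuntu section.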

Installing on Mac OS X

On Mac OS X dataone can be installed with the following commands:

install.packages("dataone")
library(dataone)

The dataone R package should be available for use at this point.

Installing on Ubuntu

On Ubuntu, install the required Redland C libraries by entering the following commands in a terminal window:

sudo apt-get update
sudo apt-get install librdf0 librdf0-dev

Then install the R packages from the R console:

install.packages("dataone")
library(dataone)

The dataone R package should be available for use at this point.

Installing on Windows

On Windows, the required redland R package is distributed as a binary release, so it is not necessary to install any additional system libraries.

Install the dataone R package from the R console:

install.packages("dataone")
library(dataone)

The dataone R package should be available for use at this point.

Quick Start

See the full manual (help(dataone)) for documentation.

To search the Knowledge Network for Biocomplexity (KNB), a DataONE Federation Member Node, for a dataset:

library(dataone)
cn <- CNode("PROD")
mn <- getMNode(cn, "urn:node:KNB")
mySearchTerms <- list(q="abstract:salmon+AND+keywords:spawn+AND+keywords:chinook",
                      fl="id,title,dateUploaded,abstract,size",
                      fq="dateUploaded:[2017-06-01T00:00:00.000Z TO 2017-07-01T00:00:00.000Z]",
                      sort="dateUploaded+desc")
result <- query(mn, solrQuery=mySearchTerms, as="data.frame")
result[1,c("id", "title")]
id <- result[1,'id']
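
The fq entry in the search terms above is a Solr range filter on dateUploaded with timestamps in UTC. As a minimal sketch, the filter string can be built from plain dates with base R; solr_date_range is a hypothetical helper written for this example, not a dataone function.

```r
# Hypothetical helper: build a Solr range filter on dateUploaded from two
# dates, formatted as UTC timestamps like those in the query above.
solr_date_range <- function(from, to) {
  fmt <- "%Y-%m-%dT%H:%M:%S.000Z"
  sprintf("dateUploaded:[%s TO %s]",
          format(as.POSIXct(from, tz = "UTC"), fmt, tz = "UTC"),
          format(as.POSIXct(to, tz = "UTC"), fmt, tz = "UTC"))
}

fq <- solr_date_range("2017-06-01", "2017-07-01")
print(fq)
```

The resulting string can be supplied as the fq element of the search term list in place of the hand-written filter.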

The metadata file that describes the located research can be downloaded and viewed in R with the commands below, or written to disk and opened in an XML viewer or text editor:

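library(XML)
# Fetch the metadata object as raw bytes and convert it to a character string
metadata <- rawToChar(getObject(mn, id))
# Parse the XML text and keep the root node
doc <- xmlRoot(xmlTreeParse(metadata, asText=TRUE, trim = TRUE, ignoreBlanks = TRUE))
# Write the document to a temporary file and display it
tf <- tempfile()
saveXML(doc, tf)
file.show(tf)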

This metadata file describes a data file (CSV) in this data collection (package). The data file can be obtained with the identifier listed in the metadata, using the commands:

dataRaw <- getObject(mn, "urn:uuid:49d7a4bc-e4c9-4609-b9a7-9033faf575e0")
dataChar <- rawToChar(dataRaw)
theData <- textConnection(dataChar)
df <- read.csv(theData, stringsAsFactors=FALSE)
df[1,]
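
The conversion from raw bytes to a data frame can be tried offline. The sketch below substitutes a small, made-up raw vector for the result of getObject and passes the decoded text to read.csv through its text argument, which is equivalent to wrapping it in a textConnection. The sample values are invented for illustration only.

```r
# Stand-in for the raw vector returned by getObject(); invented sample data.
dataRaw <- charToRaw("site,count\nA,3\nB,5\n")

# Decode the bytes to a single string and parse it as CSV.
dataChar <- rawToChar(dataRaw)
df <- read.csv(text = dataChar, stringsAsFactors = FALSE)
print(df)
```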

Uploading a CSV file to a DataONE Member Node requires user authentication. DataONE user authentication is described in the vignette dataone-federation.
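
Before attempting an upload, it can help to confirm that a credential is visible to the R session. The sketch below assumes that authentication tokens are supplied through the R options dataone_token (production) and dataone_test_token (staging and other test environments); these option names are an assumption here and should be confirmed in the vignette. has_d1_token is a hypothetical helper, not a dataone function.

```r
# Hypothetical helper: TRUE if a non-empty token option is set for the
# given environment. The option names are assumptions; see the vignette.
has_d1_token <- function(env = "PROD") {
  opt <- if (identical(env, "PROD")) "dataone_token" else "dataone_test_token"
  token <- getOption(opt)
  is.character(token) && length(token) == 1 && nzchar(token)
}

if (!has_d1_token("STAGING")) {
  message("No test token found; set one with options(dataone_test_token = \"...\")")
}
```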

Once the authentication steps have been followed, uploading is done with:

library(datapack)
library(uuid)
d1c <- D1Client("STAGING", "urn:node:mnStageUCSB2")
id <- paste("urn:uuid:", UUIDgenerate(), sep="")
testdf <- data.frame(x=1:10,y=11:20)
csvfile <- paste(tempfile(), ".csv", sep="")
write.csv(testdf, csvfile, row.names=FALSE)
# Build a DataObject containing the csv, and upload it to the Member Node
d1Object <- new("DataObject", id, format="text/csv", filename=csvfile)
uploadDataObject(d1c, d1Object, public=TRUE)
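
As an optional sanity check before uploading, the CSV file can be read back and compared with the original data frame. This sketch uses only base R and the same test data as above; tempfile(fileext = ".csv") is used as a shorter equivalent of pasting the extension onto tempfile().

```r
# Write the test data to a temporary CSV file.
testdf <- data.frame(x = 1:10, y = 11:20)
csvfile <- tempfile(fileext = ".csv")
write.csv(testdf, csvfile, row.names = FALSE)

# Read the file back and stop with an error if it does not match.
roundtrip <- read.csv(csvfile, stringsAsFactors = FALSE)
stopifnot(file.exists(csvfile), isTRUE(all.equal(roundtrip, testdf)))
```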

In addition, a collection of science metadata and data can be downloaded with one command, for example:

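# Connect to the production environment, using KNB as the Member Node
d1c <- D1Client("PROD", "urn:node:KNB")
# Download the science metadata and data objects that make up the package
pkg <- getDataPackage(d1c, id="urn:uuid:04cd34fd-25d4-447f-ab6e-73a572c5d383", quiet=FALSE)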

See the R vignette dataone R Package for more information.

Acknowledgments

Work on this package was supported by:

  • NSF-ABI grant #1262458 to C. Gries, M. B. Jones, and S. Collins.
  • NSF-DATANET grants #0830944 and #1430508 to W. Michener, M. B. Jones, D. Vieglais, S. Allard, and P. Cruse.
  • NSF DIBBS grant #1443062 to T. Habermann and M. B. Jones.
  • NSF-PLR grant #1546024 to M. B. Jones, S. Baker-Yeboah, J. Dozier, M. Schildhauer, and A. Budden.

Additional support was provided for working group collaboration by the National Center for Ecological Analysis and Synthesis, a Center funded by the University of California, Santa Barbara, and the State of California.

Package details

  • Version: 2.2.1
  • License: Apache License 2.0
  • Maintainer: Matthew B Jones
  • Last published: December 6th, 2020
  • Monthly downloads: 418

Functions in dataone (2.2.1)

  • AuthenticationManager-class: Manage DataONE authentication.
  • CNode: Create a CNode object.
  • initialize,D1Client-method: Initialize a D1Client object
  • D1Client-class: The D1Client class contains methods that perform high level DataONE tasks
  • AuthenticationManager: Create an AuthenticationManager object
  • CertificateManager: Create a CertificateManager object
  • AbstractTableDescriber-class: Base Class for Specific Metadata Parsers
  • D1Client: The DataONE client class used to download, update and search for data in the DataONE network.
  • CertificateManager-class: CertificateManager provides mechanisms to obtain, load, verify, and display X509 certificates.
  • CNode-class: Provides R API to DataONE Coordinating Node services.
  • initialize,D1Object-method: Initialize a D1Object
  • asDataFrame: Return the D1Object data as a data.frame.
  • EMLParser: Construct an EML parser object.
  • EMLParser-class: Handler for Parsing Table Format Details from Metadata
  • addData,DataPackage,D1Object-method: Add a D1Object containing a data object to a DataPackage
  • canRead,D1Object-method: Test whether the provided subject can read an object.
  • D1Node-class: A base class for CNode and MNode.
  • initialize,D1Node-method: Initialize a D1Node
  • auth_put_post_delete: POST, PUT, or DELETE a resource with authenticated credentials.
  • MNode-class: Provides R API to DataONE Member Node services.
  • MNode: Create an MNode object representing a DataONE Member Node repository.
  • convert.csv: Convert a DataFrame to Standard CSV.
  • createD1Object: Create the Object in the DataONE System
  • data.tableAttributeStorageTypes: Returns the attributes' data storage types
  • data.tableAttributeOrientation: The Attribute (Header) Orientation
  • data.tableAttributeTypes: Returns the attributes' data types
  • D1Object: Create a D1Object instance.
  • archive: Archive an object on a Member Node or Coordinating Node, which hides it from casual searches.
  • data.formatFamily: Data Format
  • echoCredentials: Echo the credentials used to make the call.
  • data.tableFieldDelimiter: Field Delimiter
  • auth_delete: DELETE a resource with authenticated credentials.
  • encodeSolr: Encode the input for Solr Queries
  • getCertExpires: Show the date and time when an X.509 certificate expires.
  • getCertLocation: Get the file path on disk of the client certificate file.
  • getCertInfo: Get X.509 Certificate information
  • getChecksum: Get the checksum for the data object associated with the specified pid.
  • isCertExpired: Determine if an X.509 certificate has expired.
  • auth_post: POST a resource with authenticated credentials.
  • listFormats: List all object formats registered in DataONE.
  • query: Search DataONE for data and metadata objects
  • ping: Test if a node is online and accepting DataONE requests
  • auth_get: GET a resource with authenticated credentials if available.
  • createDataPackage: Create a DataPackage on a DataONE Member Node
  • D1Node: Create a D1Node object.
  • D1Object-class: D1Object (Defunct) is a representation of a DataObject.
  • auth_head: Send an HTTP HEAD request for a resource with authenticated credentials if available.
  • createObject: Create an object on a Member Node.
  • showClientSubject: Get DataONE Identity as Stored in the CILogon Certificate.
  • d1_errors: This function parses a DataONE service response message for errors, and extracts and prints error information.
  • evaluateAuth: Evaluate DataONE authentication.
  • dataone-deprecated: Deprecated
  • data.tableAttributeNames: Returns the attribute names
  • getAuthExpires: Get the expiration date of the current authentication method.
  • describeObject: Efficiently get system metadata for an object.
  • getAuthMethod: Get the current valid authentication mechanism.
  • dataone: Search, download and upload data to the DataONE network.
  • documented.d1Identifiers: Get DataONE identifiers
  • generateIdentifier: Get a unique identifier that is generated by the Member Node repository and guaranteed to be unique.
  • getDataObject: Download a file (and its associated system metadata) from the DataONE Federation as a DataObject.
  • getDataPackage: Download data from the DataONE Federation as a DataPackage.
  • auth_put: PUT a resource with authenticated credentials.
  • updateObject: Update an object on a Member Node, by creating a new object that replaces an original.
  • d1IdentifierSearch: Query the DataONE Solr endpoint of the Coordinating Node.
  • data.tableMissingValueCodes: Returns missing value codes
  • d1SolrQuery: A method to query the DataONE Solr endpoint of the Coordinating Node.
  • documented.entityNames: Get the entity names associated with each table
  • data.tableQuoteCharacter: Quote Character
  • encodeUrlPath: Encode the Input for a URL Path Segment.
  • data.characterEncoding: Character Encoding
  • uploadDataPackage: Upload a DataPackage to a DataONE member node.
  • dataone-defunct: Defunct
  • data.tableSkipLinesHeader: Number of lines to skip before reading data
  • downloadCert: Open the CILogon Certificate download page in the default browser.
  • downloadObject: Download an object from the DataONE Federation to Disk.
  • isAuthExpired: Check if the currently valid authentication method has reached the expiration time.
  • getQueryEngineDescription: Query a node for the list of query engines available on the node
  • parseSolrResult: Parse Solr output into an R list
  • getPackage: Download a data package from a member node.
  • hasReservation: Checks to determine if the supplied subject is the owner of the reservation of id.
  • parseCapabilities: Construct a Node, using a passed in capabilities XML
  • getFormat: Get information for a single DataONE object format
  • setMNodeId: Set the member node identifier to be associated with the D1Client object.
  • setObsoletedBy: Set a pid as being obsoleted by another pid
  • getD1Object: Download a data object from the DataONE Federation.
  • encodeUrlQuery: Encode the Input for a URL Query Segment.
  • getFormatId,D1Object-method: Get the FormatId of the D1Object
  • listMemberNodes: List DataONE Member Nodes.
  • getMetadataMember: Get the DataObject containing package metadata
  • getObject: Get the bytes associated with an object on this Node.
  • getData,D1Object-method: Get the data content of a D1Object.
  • documented.sizes: Get the sizes of the described data tables.
  • getCert: Get the DataONE X.509 Certificate location.
  • getCapabilities: Get the node capabilities description, and store the information in the MNode.
  • getAuthSubject: Get the authentication subject.
  • listNodes: Get the list of nodes associated with a CN
  • getCN: Get the coordinating node associated with this D1Client object.
  • getIdentifier,D1Object-method: Get the Identifier of the D1Object
  • listObjects: Retrieve the list of objects that match the search parameters
  • reserveIdentifier: Reserve an identifier that is unique in the DataONE network.
  • resolve: Get a list of coordinating nodes holding a given pid.
  • getMN: Get a member node client based on its node identifier.
  • getErrorDescription: Extract an error message from an HTTP response.
  • getEndpoint: Return the URL endpoint for the DataONE Coordinating Node.
  • getMNode: Get a reference to a node based on its identifier
  • getMNodeId: Get the member node identifier associated with this D1Client object.
  • getSystemMetadata: Get the metadata describing system properties associated with an object on this Node.
  • listQueryEngines: Query a node for the list of query engines available on the node
  • getTokenInfo: Get authentication token information
  • getToken: Get the value of the DataONE Authentication Token, if one exists.
  • isAuthValid: Verify authentication for a member node.
  • isAuthorized: Check if an action is authorized for the specified identifier
  • get_user_agent: User agent string
  • obscureAuth: Temporarily disable DataONE authentication.
  • showAuth: Display all authentication information
  • setPublicAccess,D1Object-method: Make the object publicly readable.
  • updateSystemMetadata: Update the system metadata associated with an object.
  • uploadDataObject: Upload a DataObject to a DataONE member node.
  • obscureCert: Obscure the CILogon Client Certificate
  • restoreAuth: Restore authentication (after being disabled with obscureAuth).
  • restoreCert: Restore the CILogon client certificate by renaming it to its original location