OAIHarvester (version 0.3-1)

verb: OAI-PMH Verb Functions

Description

Perform Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) requests for harvesting repositories.

Usage

oaih_get_record(baseurl, identifier, prefix = "oai_dc",
                transform = TRUE)
oaih_identify(baseurl, transform = TRUE)
oaih_list_identifiers(baseurl, prefix = "oai_dc", from = NULL,
                      until = NULL, set = NULL, transform = TRUE)
oaih_list_metadata_formats(baseurl, identifier = NULL,
                           transform = TRUE)
oaih_list_records(baseurl, prefix = "oai_dc", from = NULL,
                  until = NULL, set = NULL, transform = TRUE)
oaih_list_sets(baseurl, transform = TRUE)

Arguments

baseurl

a character string giving the base URL of the repository.

identifier

a character string giving the unique identifier for an item in a repository.

prefix

a character string to specify the metadata format in OAI-PMH requests issued to the repository. The default ("oai_dc") corresponds to the mandatory OAI unqualified Dublin Core metadata schema.

from, until

character strings giving datestamps to be used as lower or upper bounds, respectively, for datestamp-based selective harvesting (i.e., only harvest records with datestamps in the given range). Dates and times must be encoded using ISO 8601 in either %F or %FT%TZ format (see strptime). The trailing Z must be used when including time. OAI-PMH implies UTC for date/time specifications.

set

a character string giving a set to be used for selective harvesting (i.e., only harvest records in the given set).

transform

a logical indicating whether the OAI-PMH XML results should be transformed to “useful” R data structures via oaih_transform. Default: TRUE.
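
As an illustration of the datestamp encoding required for from and until, the following is a minimal sketch using base R only. The helper names as_oai_date and as_oai_datetime are hypothetical and are not part of OAIHarvester. The explicit %Y-%m-%d and %H:%M:%S conversions are used in place of the %F and %T shorthands.

```r
## Hypothetical helpers (not part of OAIHarvester) that encode
## dates and date/times as OAI-PMH datestamps in UTC.
as_oai_date <- function(x) {
    ## Day granularity: YYYY-MM-DD.
    format(as.Date(x), "%Y-%m-%d")
}
as_oai_datetime <- function(x) {
    ## Seconds granularity: YYYY-MM-DDThh:mm:ssZ, always in UTC.
    format(as.POSIXct(x, tz = "UTC"), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

## Bounds with day granularity.
from <- as_oai_date("2020-01-01")
until <- as_oai_date("2020-12-31")

## A bound with seconds granularity.
stamp <- as_oai_datetime(as.POSIXct("2020-12-31 23:59:59", tz = "UTC"))

## Not run here, as it needs network access and a repository base URL:
## x <- oaih_list_records(baseurl, from = from, until = until)
```

The OAI-PMH protocol expects from and until to use the same granularity in a single request, and seconds granularity is only usable with repositories that support it, so day-level bounds are the safer default.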

Value

If the OAI-PMH request was successful, the result of the request, either as XML or (by default) transformed to “useful” R data structures.

Examples

# NOT RUN {
## Harvest ePubWU metadata.
baseurl <- "http://epub.wu.ac.at/cgi/oai2"
## Identify.
oaih_identify(baseurl)
## List metadata formats.
oaih_list_metadata_formats(baseurl)
## List sets.
sets <- oaih_list_sets(baseurl)
sets
## List records in the 'theses' set.
spec <- unlist(sets[sets[, "setName"] == "Type = Thesis", "setSpec"])
x <- oaih_list_records(baseurl, set = spec)
## Drop deleted records and extract the metadata.
m <- x[, "metadata"]
m <- oaih_transform(m[lengths(m) > 0L])
## Find the most frequent keywords.
sep <- "[[:space:]]*/[[:space:]]*"
keywords <- unlist(strsplit(unlist(m[, "subject"]), sep))
head(sort(table(keywords), decreasing = TRUE))
# }
