The dataset R Package

Overview

The dataset package helps you create semantically rich, machine-readable, and interoperable datasets in R. It introduces S3 classes that extend data frames, vectors, and bibliographic entries with formal metadata structures inspired by:

  • SDMX (Statistical Data and Metadata eXchange), widely used in official statistics
  • Dublin Core and DataCite, for FAIR-compliant depositing and reuse in scientific and open data repositories
  • Open Science publishing practices, to support transparent and reproducible research

The goal is to preserve metadata when reusing statistical and repository datasets, improve interoperability, and make it easy to turn tidy data frames into web-ready, publishable datasets that comply with ISO and W3C standards.

Installation

You can install the latest released version of dataset from CRAN with:

install.packages("dataset")

Install the development version from GitHub with either pak or remotes:

# install.packages("pak")
pak::pak("dataobservatory-eu/dataset")

# install.packages("remotes")
remotes::install_github("dataobservatory-eu/dataset")

Minimal Example

library(dataset)
df <- dataset_df(
  country = defined(
    c("AD", "LI"),
    label = "Country",
    namespace = "https://www.geonames.org/countries/$1/"
  ),
  gdp = defined(c(3897, 7365),
    label = "GDP",
    unit = "million euros"
  ),
  dataset_bibentry = dublincore(
    title = "GDP Dataset",
    creator = person("Jane", "Doe", role = "aut"),
    publisher = "Small Repository"
  )
)
print(df)
#> Doe (2025): GDP Dataset [dataset]
#>   rowid country   gdp 
#>   <chr> <chr>   <dbl>
#> 1 obs1  AD       3897
#> 2 obs2  LI       7365
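The variable-level metadata attached by defined() can be read back with the accessor functions listed in the package index (var_label, var_unit, as_numeric). The following is a minimal sketch, assuming the dataset package is installed and that these accessors behave as their one-line descriptions suggest; the object name gdp_vec is only illustrative.

```r
library(dataset)

# Build a standalone defined vector with the same metadata as the gdp column above
gdp_vec <- defined(
  c(3897, 7365),
  label = "GDP",
  unit = "million euros"
)

# Read the variable-level metadata back
var_label(gdp_vec)
var_unit(gdp_vec)

# Drop the semantic class when plain numbers are needed for computation
plain <- as_numeric(gdp_vec)
sum(plain)
```

Keeping the label and unit on the vector itself, rather than in a separate codebook, is what lets the metadata travel with the data when columns are subset or recombined.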

Export as RDF triples:

dataset_to_triples(df, format = "nt")
#> [1] "<http://example.com/dataset#obsobs1> <http://example.com/prop/country> <https://www.geonames.org/countries/AD/> ."
#> [2] "<http://example.com/dataset#obsobs2> <http://example.com/prop/country> <https://www.geonames.org/countries/LI/> ."
#> [3] "<http://example.com/dataset#obsobs1> <http://example.com/prop/gdp> \"3897\"^^<xsd:decimal> ."                     
#> [4] "<http://example.com/dataset#obsobs2> <http://example.com/prop/gdp> \"7365\"^^<xsd:decimal> ."
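Each line of the output above is one N-Triples statement: a subject IRI, a predicate IRI, and an object that is either an IRI or a typed literal, terminated by a period. To make that layout concrete, here is a base R sketch that rebuilds the third statement by hand. The helper make_ntriple is hypothetical and written only for illustration; it is not the package's own implementation.

```r
# Hypothetical helper: assemble one N-Triples statement from its parts.
# If `datatype` is NULL the object is treated as an IRI, otherwise as a typed literal.
make_ntriple <- function(subject, predicate, object, datatype = NULL) {
  obj <- if (is.null(datatype)) {
    paste0("<", object, ">")
  } else {
    paste0("\"", object, "\"^^<", datatype, ">")
  }
  paste0("<", subject, "> <", predicate, "> ", obj, " .")
}

# Typed literal: the GDP value of the first observation
gdp_triple <- make_ntriple(
  subject   = "http://example.com/dataset#obsobs1",
  predicate = "http://example.com/prop/gdp",
  object    = "3897",
  datatype  = "xsd:decimal"
)

# IRI object: the country of the first observation
country_triple <- make_ntriple(
  subject   = "http://example.com/dataset#obsobs1",
  predicate = "http://example.com/prop/country",
  object    = "https://www.geonames.org/countries/AD/"
)

cat(gdp_triple, country_triple, sep = "\n")
```

The country value becomes an IRI because the namespace template given to defined() replaces $1 with each code, while the GDP value has no namespace and is written as a typed literal.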

Retrieve the automatically recorded provenance:

provenance(df)
#> [1] "<http://example.com/dataset_prov.nt> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Bundle> ."                  
#> [2] "<http://example.com/dataset#> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Entity> ."                         
#> [3] "<http://example.com/dataset#> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/linked-data/cube#DataSet> ."                 
#> [4] "_:doejane <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Agent> ."                                              
#> [5] "<https://doi.org/10.32614/CRAN.package.dataset> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#SoftwareAgent> ."
#> [6] "<http://example.com/creation> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Activity> ."                       
#> [7] "<http://example.com/creation> <http://www.w3.org/ns/prov#generatedAtTime> \"2025-11-16T08:47:24Z\"^^<xsd:dateTime> ."
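Because the provenance is returned as a character vector of N-Triples statements, it can be inspected with ordinary string tools. The sketch below uses base R only and copies two statements from the output above into a vector named prov (an illustrative name) so that it runs without the package; it extracts the creation timestamp and parses it as a date-time.

```r
# Two statements copied from the provenance output above
prov <- c(
  "<http://example.com/creation> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Activity> .",
  "<http://example.com/creation> <http://www.w3.org/ns/prov#generatedAtTime> \"2025-11-16T08:47:24Z\"^^<xsd:dateTime> ."
)

# Select the statement that records when the dataset was generated
stamp_line <- grep("prov#generatedAtTime", prov, fixed = TRUE, value = TRUE)

# Pull out the quoted literal and parse it as a UTC timestamp
stamp <- sub('.*"([^"]+)".*', "\\1", stamp_line)
created <- as.POSIXct(stamp, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

format(created, "%Y-%m-%d")
```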

Contributing

We welcome contributions and discussion!

Code of Conduct

This project follows the rOpenSci Code of Conduct. By participating, you are expected to uphold these guidelines.

Package details

  • Version: 0.4.1
  • License: GPL (>= 3)
  • Maintainer: Daniel Antal
  • Last published: November 16th, 2025
  • Monthly downloads: 1,586
Functions in dataset (0.4.1)

  • as.POSIXct.haven_labelled_defined: Coerce a defined POSIXct vector to a base R POSIXct
  • as_character: Coerce a defined vector to character
  • as_tibble.dataset_df: Coerce a dataset_df to a tibble
  • contributor: Get or set contributors
  • dataset_to_triples: Dataset to triples (three columns or N-Triples)
  • dataset_df: Create a new dataset_df object
  • creator: Get/set the Creator of the object
  • default_provenance: Build default provenance bundle
  • as_datacite: Create a Bibentry Object with DataCite Metadata Fields
  • dataset_format: Get or set the technical format of a dataset
  • dataset_title: Get or Set the Title of a Dataset
  • gdp: A Small GDP Dataset
  • geolocation: Get or Set the Geolocation of a Dataset Object
  • get_variable_concepts: Get concepts for all variables in a dataset_df
  • defined: Create a semantically enriched vector with variable-level metadata
  • describe: Describe a dataset in N-Triples format
  • clean_person_name: Remove role suffixes from formatted person names
  • expand_triples: Internal: Expand multi-valued DC fields to RDF triples
  • c.haven_labelled_defined: Combine defined vectors with metadata checks
  • fix_contributor: Format contributors into a citation string
  • get_bibentry: Get or set the bibentry
  • haven_labelled_defined: Semantic labelled vector class
  • description: Get or set the dataset Description
  • print.haven_labelled_defined: Print a defined (haven_labelled_defined) vector
  • provenance: Get or update provenance information
  • as_dublincore: Add or Retrieve Dublin Core Metadata
  • id_to_column: Add Identifier to First Column of a Dataset
  • language: Set the Primary Language of a Dataset
  • identifier: Get or Set the Identifier of a Dataset or Metadata Record
  • n_triples: Create N-Triples
  • orange_df: Growth of Orange Trees
  • publisher: Get or Set the Publisher of a Dataset Object
  • publication_year: Get or Set the Publication Year of a Dataset Object
  • map_role_to_schema: Map R person roles to schema.org-style roles
  • n_triple: Create an N-Triple
  • var_labels: Get or set all variable labels on a dataset
  • var_unit: Get or Set a Unit of Measure
  • triples_to_ntriples: Internal: Convert triple data.frame to N-Triples format
  • vec_cast_named: From haven
  • triples_column_generate: Internal: Generate RDF triples for a single column
  • var_namespace: Get or Set the Namespace of a Variable
  • relation: Add or retrieve related items (DataCite/Dublin Core)
  • rights: Get or Set the Rights of a Dataset Object
  • strip_defined: Strip the class from a defined vector
  • subject: Create, add, or retrieve a subject
  • var_concept: Get / set a concept definition for a vector or a dataset
  • var_label: Get or Set a Variable Label
  • xsd_convert: Convert to XML Schema Definition (XSD) Types
  • as_logical: Coerce a defined vector to logical
  • as_numeric: Coerce a defined vector to numeric
  • as.Date.haven_labelled_defined: Coerce a defined Date vector to a base R Date
  • bibrecord: Create a Modern Metadata Object Compatible with bibentry
  • as_factor: Coerce a defined vector to a factor
  • bind_defined_rows: Bind strictly defined rows
  • as.data.frame.dataset_df: Convert a dataset_df to a base data.frame