CSV on the Web R Package (csvwr)

Read and write csv tables annotated with metadata according to the "CSV on the Web" standard (CSVW).

The csvw model for tabular data describes how to annotate a group of csv tables to ensure they are interpreted correctly.

This package uses the csvw metadata schema to find tables, identify column names and cast values to the correct types.

The aim is to reduce the amount of manual work needed to parse and prepare data before it can be used in analysis.
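
To give a sense of the manual work involved, the sketch below casts columns by hand in base R, driven by a small lookup of datatypes. It only illustrates the idea and is not how csvwr is implemented; the column names and the `schema` and `casters` objects are invented for the example.

```r
# Read everything as character, which is how csv values start out
raw <- read.csv(text = "id,when,score\n1,2021-01-31,0.5\n2,2021-02-28,1.5",
                colClasses = "character")

# Hypothetical schema: column name -> csvw-style datatype
schema <- c(id = "integer", when = "date", score = "number")

# Map each datatype to a base R conversion function
casters <- list(
  integer = as.integer,
  date    = as.Date,
  number  = as.numeric,
  string  = as.character
)

# Cast each column according to the schema
typed <- raw
for (col in names(schema)) {
  typed[[col]] <- casters[[schema[[col]]]](raw[[col]])
}

# Inspect the resulting column types
str(typed)
```

With annotated metadata, a lookup of this kind does not need to be written by hand for each dataset.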

Usage

Reading CSVW

You can use csvwr to read a csv table with json annotations into a data frame:

library(csvwr)

# Parse a csv table using json metadata:
csvw <- read_csvw("data.csv", "metadata.json")

# Extract the parsed table (with syntactic variable names and typed columns):
csvw$tables[[1]]$dataframe

Alternatively, you can jump straight to the parsed table in one call:

read_csvw_dataframe("data.csv", "metadata.json")
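
For reference, the json annotations for a single table might look something like the sketch below, which builds a small metadata document as an R list and writes it with jsonlite. The file contents and the two columns are made up for illustration. The property names (`@context`, `url`, `tableSchema`, `columns`, `name`, `titles`, `datatype`) come from the CSVW standard.

```r
library(jsonlite)

# Hypothetical table: one string column and one integer column
write.csv(data.frame(Name = c("a", "b"), Count = 1:2),
          "data.csv", row.names = FALSE)

# Minimal metadata describing that table
metadata <- list(
  "@context" = "http://www.w3.org/ns/csvw",
  url = "data.csv",
  tableSchema = list(
    columns = list(
      list(name = "name",  titles = "Name",  datatype = "string"),
      list(name = "count", titles = "Count", datatype = "integer")
    )
  )
)

# auto_unbox = TRUE writes length-one vectors as scalars rather than arrays
write_json(metadata, "metadata.json", auto_unbox = TRUE, pretty = TRUE)
```

Here `name` gives the variable name to use in R, `titles` matches the header found in the csv file, and `datatype` states how the values should be cast.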

Writing CSVW

You can also prepare annotations for a data frame:

# Given a data frame (saved as a csv)
d <- data.frame(x=c("a","b","c"), y=1:3)
write.csv(d, "table.csv", row.names=FALSE)

# Derive a schema
s <- derive_table_schema(d)

# Create metadata (as a list)
m <- create_metadata(tables=list(list(url="table.csv", tableSchema=s)))

# Serialise the metadata to JSON
j <- jsonlite::toJSON(m)

# Write the json to a file
cat(j, file="metadata.json")

For a complete introduction to the library, please see vignette("read-write-csvw").

Installation

You can install the latest release from CRAN:

install.packages("csvwr")

Or for the development version you can use devtools to install csvwr from GitHub:

install.packages("devtools")
devtools::install_github("Robsteranium/csvwr")

Contributing

Roadmap

Broadly speaking, the objectives are as follows:

parsing csvw, creating data frames with the specified names and types (mostly implemented)
connecting associated csv tables and json files according to the conventions set out in the csvw standard (partly implemented)
validating a table according to a metadata document (minimally implemented)
supporting multiple tables (mostly implemented)
providing tools for writing csvw metadata, given an R data frame (partly implemented)
writing vignettes and documentation (mostly implemented)
providing scripts for running the most useful tools from the command line (not yet implemented)

Performing csv2rdf or csv2json translation is not an urgent objective for the library, although some support for csv2json is provided because it is used to test that parsing is done correctly.

In terms of the csvw test cases provided by the standard, the following areas need to be addressed (in rough priority order):

datatypes (most of the simple datatypes and some complex ones are supported, but there are more types and constraints to cover)
validations (there are a lot of these

Functions in csvwr (0.1.7)