CSV on the Web R Package (csvwr)

Read and write csv tables annotated with metadata according to the "CSV on the Web" standard (CSVW).

The csvw model for tabular data describes how to annotate a group of csv tables to ensure they are interpreted correctly.

This package uses the csvw metadata schema to find tables, identify column names and cast values to the correct types.

The aim is to reduce the amount of manual work needed to parse and prepare data before it can be used in analysis.

Usage

Reading CSVW

You can use csvwr to read a csv table with json annotations into a data frame:

library(csvwr)

# Parse a csv table using json metadata:
csvw <- read_csvw("data.csv", "metadata.json")

# To extract the parsed table (with syntactic variable names and typed columns):
csvw$tables[[1]]$dataframe

Alternatively, you can jump straight to the parsed table in one call:

read_csvw_dataframe("data.csv", "metadata.json")
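
To make the inputs concrete, here is a minimal sketch of what a matching csv and metadata pair might look like. The table contents, column names and temporary directory are hypothetical and are not taken from the package; the metadata layout follows the general shape of a CSVW table group rather than anything specific to csvwr, and the final call is assumed to accept local file paths in the same way as the example above.

```r
# Hypothetical two-column table written to a scratch directory.
example_dir <- file.path(tempdir(), "csvwr-sketch")
dir.create(example_dir, showWarnings = FALSE)

csv_path <- file.path(example_dir, "data.csv")
metadata_path <- file.path(example_dir, "metadata.json")

# The raw csv: human-readable titles in the header, values stored as text.
writeLines(c("Name,Count", "apple,3", "banana,12"), csv_path)

# Annotations: one table whose columns are given names and datatypes.
metadata <- list(
  "@context" = "http://www.w3.org/ns/csvw",
  tables = list(list(
    url = "data.csv",
    tableSchema = list(columns = list(
      list(name = "name", titles = "Name", datatype = "string"),
      list(name = "count", titles = "Count", datatype = "integer")
    ))
  ))
)

# auto_unbox = TRUE writes length-one vectors as JSON scalars, not arrays.
cat(jsonlite::toJSON(metadata, auto_unbox = TRUE, pretty = TRUE),
    file = metadata_path)

# Read the annotated table (skipped when csvwr is not installed).
if (requireNamespace("csvwr", quietly = TRUE)) {
  parsed <- csvwr::read_csvw_dataframe(csv_path, metadata_path)
  str(parsed)
}
```

If the read succeeds, str() lets you check whether the column declared as an integer in the metadata was cast accordingly.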

Writing CSVW

You can also prepare annotations for a data frame:

# Given a data frame (saved as a csv)
d <- data.frame(x=c("a","b","c"), y=1:3)
write.csv(d, "table.csv", row.names=FALSE)

# Derive a schema
s <- derive_table_schema(d)

# Create metadata (as a list)
m <- create_metadata(tables=list(list(url="table.csv", tableSchema=s)))

# Serialise the metadata to JSON
j <- jsonlite::toJSON(m)

# Write the json to a file
cat(j, file="metadata.json")

For a complete introduction to the library, see vignette("read-write-csvw").

Installation

You can install the latest release from CRAN:

install.packages("csvwr")

Or, for the development version, you can use devtools to install csvwr from GitHub:

install.packages("devtools")
devtools::install_github("Robsteranium/csvwr")

Contributing

Roadmap

Broadly speaking, the objectives are as follows:

  • parsing csvw, creating data frames with the specified names and types (mostly implemented)
  • connecting associated csv tables and json files according to the conventions set out in the csvw standard (partly implemented)
  • validating a table against a metadata document (minimally implemented)
  • supporting multiple tables (mostly implemented)
  • tools for writing csvw metadata, given an R data frame (partly implemented)
  • vignettes and documentation (mostly implemented)
  • scripts for running the most useful tools from the command line (not yet implemented)

It's not an urgent objective for the library to perform csv2rdf or csv2json translation, although some support for csv2json is provided because it is used to test that parsing is done correctly.

In terms of the csvw test cases provided by the standard, the following areas need to be addressed (in rough priority order):

  • datatypes (most of the simple datatypes and some complex ones are supported, but there are more types and constraints too)
  • validations (there are a lot of these

Version: 0.1.7
License: GPL-3
Maintainer: Robin Gower
Last Published: November 21st, 2022

Functions in csvwr (0.1.7)

  • default_schema: Create a default table schema given a csv file and dialect
  • parse_columns: Parse columns schema
  • parse_metadata: Parse metadata
  • locate_table: Locate csv data table
  • location_configuration: Identify metadata location configurations for a tabular file
  • derive_table_schema: Derive csvw table schema from a data frame
  • read_csvw_dataframe: Read a data frame from the first table in a csvw
  • read_csvw: Read CSV on the Web
  • rlmap: Recursive lmap
  • rmap: Recursive map
  • list_of_lists_to_df: Parse list of lists specification into a data frame
  • read_metadata: Read and parse CSVW Metadata
  • transform_datetime_format: Transform date/time format string from Unicode TR35 to POSIX 1003.1
  • try_add_dataframe: Try to add a dataframe to the table
  • normalise_url: Normalise a URL
  • locate_metadata: Locate metadata for a table
  • render_cell: Serialise cell values for JSON representation
  • override_defaults: Override defaults
  • normalise_metadata: Normalise metadata
  • table_to_list: Convert a table to a list
  • render_uri_templates: Render URI templates
  • set_uri_base: Set the base of a URI template
  • normalise_property: Normalise an annotation property
  • type_to_datatype: Map R types to csvw datatype
  • unlist1: Unlist unless the list-elements are themselves lists
  • vec_depth: Calculate depth of vector safely
  • resolve_url: Resolve one URL against another
  • validate_referential_integrity: Validate the referential integrity of a csvw table group
  • validate_csvw: Validate CSVW specification
  • create_metadata: Create tabular metadata from a list of tables
  • add_dataframe: Add data frame to csvw table annotation
  • csvw_to_list: Convert a csvw metadata to a list (csv2json)
  • csvwr_example: Get path to csvwr example
  • datatype_to_type: Map csvw datatypes to R types
  • derive_metadata: Derive csvw metadata from a csv file
  • base_uri: Retrieve the base URI from configuration
  • coalesce_truth: Coalesce value to truthiness
  • csvwr: Read and write CSV on the Web (CSVW)
  • compact_json_ld: Compact objects to values
  • base_url: Determine the base URL for CSVW metadata
  • extract_table: Extract a referenced table from CSVW metadata
  • json_ld_to_json: Convert json-ld annotation to json
  • is_non_core_annotation: Determine if an annotation is non-core
  • find_metadata: Find metadata for a tabular file
  • default_dialect: CSVW default dialect
  • is_absolute_url: Does the string provide an absolute URL
  • find_existing_file: Find the first existing file from a set of candidates