googlePublicData (version 0.15.7.28)

dspl: Builds DSPL metadata file

Description

dspl parses the csv, tab or xls(x) files in a given directory and generates a complete DSPL file. If an output string is specified, the function generates the complete ZIP (DSPL file plus csv files), ready to be uploaded to Google Public Data Explorer.

Usage

dspl(path, output = NA, replace = F, targetNamespace = "", timeFormat = "yyyy", lang = c("es", "en"), name = NA, description = NA, url = NA, providerName = NA, providerURL = NA, sep = ";", dec=".", encoding = getOption("encoding"), moreinfo = NA)

Arguments

path
String. Path to the folder containing the tables (csv|tab|xls).
output
String, optional. Path to the output ZIP file.
replace
Logical. If the output ZIP file already exists, dspl replaces it.
targetNamespace
String. As the DSPL documentation states: "Provides a URI that identifies your dataset. This URI is not required to point to an actual resource, but it's a good idea to have the URI resolve to a document describing your content or dataset".
timeFormat
String. The time format of the collection. It should be specified according to the Joda-Time format. See the Details section for more information.
lang
A list of strings with the languages supported by the dataset. A single language is also allowed.
name
List of strings. The name of the dataset, defined according to the lang list.
description
List of strings. Description of the dataset. Like name, it supports multiple descriptions.
url
The corresponding URL for the dataset.
providerName
List of strings. The data provider name.
providerURL
List of strings. The data provider's website URL.
sep
The separator character of the tables in the 'path' folder. Currently supported values: "," or ";" (for .csv files), "\t" (for .tab files) and "xls" or "xlsx" (for Microsoft Excel files); Excel files are read using the XLConnect library.
dec
String. Decimal point.
encoding
The character encoding of the input tables. Currently ignored for Microsoft Excel files.
moreinfo
A special tab file generated by the function genMoreInfo that contains a dataframe of the dataset concepts with more specifications such as description, topic, url, etc.

Value

If no output is defined, dspl returns a list of class "dspl". An object of class "dspl" is a list containing:
dspl
A character string containing the DSPL XML document as defined by the saveXML function.
concepts.by.table
A data frame object of concepts stored by table.
dimtabs
A data frame containing dimensional tables.
slices
A data frame of slices.
concepts
A data frame of concepts (all of them).
dimentions
A data frame of dimensional concepts.
statistics
A matrix of statistics.
If an output is defined, the function instead builds a ZIP file at the path given in output, containing the CSV and DSPL (XML) files.

Details

If no output is defined, the function returns a list of class dspl whose contents include an XML object (the DSPL file). If an output is defined, the result consists of two things: a ZIP file containing everything needed for upload to publicdata.google.com (a collection of csv files and the written DSPL XML file), and a message (character object).

Internally, the parsing process consists of the following steps: (1) loading the data, (2) generating an id for each column, (3) identifying the data types, (4) building concepts, (5) identifying dimensional concepts and distinguishing between categorical, geographical and time dimensions, and finally (6) executing internal checks.

So that the ZIP file (DSPL file plus CSV data files) can be loaded properly, the function executes a series of internal checks on the data structure. The detailed list:
  • Slices with the same dimensions: DSPL requires that each slice represents one dimensional cut; that is, there shouldn't be more than one data table with the same dimensions.
  • Duplicated concepts: A single concept (statistic) may appear with multiple data types, e.g. as integer in one table and float in another, which would be ambiguous. During parsing, where possible, dspl collapses such duplicates into a single concept and assigns it the common data type (float).
  • Correct time format definition: dspl uses checkTimeFormat to ensure that the specified time format is compatible with DSPL.
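
The first two checks can be illustrated in base R. This is only a sketch of the idea, not the package's internal code: the data frames, table names, and the helper collapse_type are hypothetical, and treating dimension order as irrelevant when comparing slices is an assumption.

```r
# Hypothetical per-table concept metadata: one row per (table, concept).
concepts <- data.frame(
  table   = c("sales_by_year", "sales_by_region", "sales_by_region"),
  concept = c("revenue", "revenue", "units"),
  type    = c("integer", "float", "integer"),
  stringsAsFactors = FALSE
)

# Duplicated concepts: if a concept shows up with more than one data type,
# promote it to "float"; otherwise keep its single type.
collapse_type <- function(types) {
  if (length(unique(types)) > 1) "float" else types[1]
}
collapsed <- vapply(
  split(concepts$type, concepts$concept),
  collapse_type,
  character(1)
)

# Hypothetical dimensions found in each table (slice).
slice_dims <- list(
  sales_by_year   = c("year"),
  sales_by_region = c("region", "year"),
  costs_by_region = c("year", "region")
)

# Slices with the same dimensions: build an order-independent key per slice
# (assumption: order does not matter) and flag every slice whose key repeats.
dim_keys <- vapply(
  slice_dims,
  function(d) paste(sort(d), collapse = "|"),
  character(1)
)
clashing <- names(dim_keys)[
  duplicated(dim_keys) | duplicated(dim_keys, fromLast = TRUE)
]

print(collapsed)
print(clashing)
```

Here revenue ends up as float because it was seen as both integer and float, and the two region/year tables are flagged because they describe the same dimensional cut.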

Examples

## Not run:
# demo(dspl)
## End(Not run)
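
Beyond the demo, a typical call might look like the following sketch. It assumes the googlePublicData package is installed and that a folder of semicolon-separated csv tables exists; the folder name my_tables and the file name my_dataset.zip are placeholders, not values shipped with the package. Only arguments listed in Usage are used, and the block does nothing when the package or the folder is missing.

```r
# Hypothetical input folder and output file; adjust to your own data.
tables_dir <- "my_tables"
zip_file   <- "my_dataset.zip"

if (requireNamespace("googlePublicData", quietly = TRUE) && dir.exists(tables_dir)) {
  # Without 'output', dspl() returns a list of class "dspl".
  ds <- googlePublicData::dspl(
    path       = tables_dir,
    sep        = ";",
    timeFormat = "yyyy"
  )
  print(names(ds))   # elements as listed in the Value section

  # With 'output', dspl() writes the ZIP (DSPL XML plus csv files) instead.
  googlePublicData::dspl(
    path       = tables_dir,
    output     = zip_file,
    replace    = TRUE,
    sep        = ";",
    timeFormat = "yyyy"
  )
}
```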
