Edwin Jonge

11 packages on CRAN

1 package on GitHub

whisker

cran
96th

Percentile

Logicless templating: reuse templates in many programming languages, including R.

ffbase

cran
76th

Percentile

Extends the out-of-memory vectors of 'ff' with statistical functions and other utilities to ease their usage.

docopt

cran

Define a command-line interface simply by writing its description in a specific format.

editrules

cran

Facilitates reading and manipulating (multivariate) data restrictions (edit rules) on numerical and categorical data. Rules can be defined with common R syntax and parsed to an internal (matrix-like) format. Rules can be manipulated with variable elimination and value substitution methods, allowing for feasibility checks and more. Data can be tested against the rules, and erroneous fields can be found based on Fellegi and Holt's generalized principle. Rule dependencies can be visualized using the igraph package.

daff

cran

Diff, patch and merge for data frames. Document changes in data sets and use them to apply patches. Changes to data can be made visible by using render_diff. The V8 package is used to wrap the 'daff.js' JavaScript library which is included in the package.

chunked

cran

Text data can be processed chunkwise using 'dplyr' commands. These are recorded and executed per data chunk, so large files can be processed with limited memory using the 'LaF' package.
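
The chunkwise idea described above can be illustrated with a minimal Python sketch. This is not the 'chunked' or 'LaF' API; the function names `process_chunkwise` and `keep_large`, the chunk size, and the sample data are all hypothetical. It only shows how a recorded step can be applied to one chunk at a time so that no more than one chunk is held in memory.

```python
import io
from itertools import islice


def process_chunkwise(lines, chunk_size, transform):
    """Apply `transform` to successive chunks of at most `chunk_size` lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))  # only this chunk is in memory
        if not chunk:
            break
        yield from transform(chunk)


def keep_large(chunk):
    # Hypothetical per-chunk step: keep rows whose second field exceeds 10.
    rows = (line.rstrip("\n").split(",") for line in chunk)
    return [row for row in rows if int(row[1]) > 10]


# An in-memory stand-in for a large text file (assumed sample data).
source = io.StringIO("a,5\nb,20\nc,15\nd,1\ne,30\n")
result = list(process_chunkwise(source, 2, keep_large))
```

Because the filter is applied per chunk, the same code would work on a file far larger than memory, as long as each individual chunk fits.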

cbsodataR

cran

The data and metadata from Statistics Netherlands (www.cbs.nl) can be browsed and downloaded. The client uses the open data API of Statistics Netherlands.

Errors in data can be located and removed using validation rules from package 'validate'.

tabplotd3

cran

A tableplot is a visualisation of a (large) dataset with a dozen variables, both numeric and categorical. This package contains an interactive version of tableplot that runs in your browser.

ffbase2

github

Provides 'dplyr' functionality for out-of-memory data frames of package 'ff'.

tabplot

cran

A tableplot is a visualisation of a (large) dataset with a dozen variables, both numeric and categorical. Each column represents a variable and each row bin is an aggregate of a certain number of records. Numeric variables are visualized as bar charts, and categorical variables as stacked bar charts. Missing values are taken into account. Also supports large 'ffdf' datasets from the 'ff' package.
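
The row-bin aggregation behind a tableplot can be sketched in a few lines of Python. This is not the 'tabplot' API: the function `tableplot_bins`, its parameters, and the sample records are hypothetical, and sorting the records on one column before binning is an assumption made for this sketch. Numeric columns are summarised by their mean (the value a bar would show), categorical columns by category counts (the segments of a stacked bar), and missing values are kept as their own category or skipped in the mean.

```python
from collections import Counter


def tableplot_bins(records, sort_key, n_bins, numeric_cols):
    """Aggregate records into row bins, one summary dict per bin."""
    if not records:
        return []
    # Assumption: records are ordered on one column before binning.
    ordered = sorted(records, key=lambda r: r[sort_key], reverse=True)
    size = -(-len(ordered) // n_bins)  # ceiling division: records per bin
    bins = []
    for start in range(0, len(ordered), size):
        rows = ordered[start:start + size]
        summary = {}
        for name in rows[0]:
            if name in numeric_cols:
                # Mean of the non-missing values; None if all are missing.
                present = [r[name] for r in rows if r[name] is not None]
                summary[name] = sum(present) / len(present) if present else None
            else:
                # Category counts; None counts as a "missing" category.
                summary[name] = dict(Counter(r[name] for r in rows))
        bins.append(summary)
    return bins


# Assumed sample data with one missing value per non-sorted column.
records = [
    {"age": 30, "income": 10.0, "sex": "f"},
    {"age": 50, "income": None, "sex": "m"},
    {"age": 20, "income": 30.0, "sex": "f"},
    {"age": 40, "income": 20.0, "sex": None},
]
bins = tableplot_bins(records, "age", 2, {"age", "income"})
```

Each entry of `bins` corresponds to one horizontal band of the plot, so the number of drawn elements depends on the number of bins rather than on the number of records.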

A collection of methods for automated data cleaning where all actions are logged.