Mark der Loo

Mark der Loo

16 packages on CRAN

dcmodify

cran
99.99th

Percentile

Data cleaning scripts typically contain a lot of 'if this change that' type of statements. Such statements are typically condensed expert knowledge. With this package, such 'data modifying rules' are taken out of the code and become in stead parameters to the work flow. This allows one to maintain, document, and reason about data modification rules as separate entities.

99.99th

Percentile

A collection of methods for automated data cleaning where all actions are logged.

deductive

cran
99.99th

Percentile

Attempt to repair inconsistencies and missing values in data records by using information from valid values and validation rules restricting the data.

99.99th

Percentile

Detect outliers in one-dimensional data.

gower

cran
99.99th

Percentile

Compute Gower's distance (or similarity) coefficient between records. Compute the top-n matches between records. Core algorithms are executed in parallel on systems supporting OpenMP.

hashr

cran
99.99th

Percentile

Apply the SuperFastHash algorithm to any R object. Hash whole R objects or, for vectors or lists, hash R objects to obtain a set of hash values that is stored in a structure equivalent to the input.

lintools

cran
99.99th

Percentile

Variable elimination (Gaussian elimination, Fourier-Motzkin elimination), Moore-Penrose pseudoinverse, reduction to reduced row echelon form, value substitution, projecting a vector on the convex polytope described by a system of (in)equations, simplify systems by removing spurious columns and rows and collapse implied equalities, test if a matrix is totally unimodular, compute variable ranges implied by linear (in)equalities.

rspa

cran
99.99th

Percentile

Minimally adjust the values of numerical records in a data.frame, such that each record satisfies a predefined set of equality and/or inequality constraints. The constraints can be defined using the 'validate' package. The core algorithms have recently been moved to the 'lintools' package, refer to 'lintools' for a more basic interface and access to a version of the algorithm that works with sparse matrices.

settings

cran
99.99th

Percentile

Provides option settings management that goes beyond R's default 'options' function. With this package, users can define their own option settings manager holding option names, default values and (if so desired) ranges or sets of allowed option values that will be automatically checked. Settings can then be retrieved, altered and reset to defaults with ease. For R programmers and package developers it offers cloning and merging functionality which allows for conveniently defining global and local options, possibly in a multilevel options hierarchy. See the package vignette for some examples concerning functions, S4 classes, and reference classes. There are convenience functions to reset par() and options() to their 'factory defaults'.

99.99th

Percentile

Easy to use interfaces to a number of imputation methods that fit in the not-a-pipe operator of the 'magrittr' package.

stringdist

cran
99.99th

Percentile

Implements an approximate string matching version of R's native 'match' function. Can calculate various string distances based on edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal sting alignment), qgrams (q- gram, cosine, jaccard distance) or heuristic metrics (Jaro, Jaro-Winkler). An implementation of soundex is provided as well. Distances can be computed between character vectors while taking proper care of encoding or between integer vectors representing generic sequences. This package is built for speed and runs in parallel by using 'openMP'. An API for C or C++ is exposed as well.

validate

cran
99.99th

Percentile

Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results. The package supports rules that are per-field, in-record, cross-record or cross-dataset. Rules can be automatically analyzed for rule type and connectivity.

dendextend

cran
99.99th

Percentile

Offers a set of functions for extending 'dendrogram' objects in R, letting you visualize and compare trees of 'hierarchical clusterings'. You can (1) Adjust a tree's graphical parameters - the color, size, type, etc of its branches, nodes and labels. (2) Visually and statistically compare different 'dendrograms' to one another.

editrules

cran
99.99th

Percentile

Facilitates reading and manipulating (multivariate) data restrictions (edit rules) on numerical and categorical data. Rules can be defined with common R syntax and parsed to an internal (matrix-like format). Rules can be manipulated with variable elimination and value substitution methods, allowing for feasibility checks and more. Data can be tested against the rules and erroneous fields can be found based on Fellegi and Holt's generalized principle. Rules dependencies can be visualized with using the 'igraph' package.

99.99th

Percentile

Errors in data can be located and removed using validation rules from package 'validate'.

99.99th

Percentile

Rule sets with validation rules may contain redundancies or contradictions. Functions for finding redundancies and problematic rules are provided, given a set a rules formulated with 'validate'.