John Mount

10 packages on CRAN

cdata

CRAN, 99.99th percentile

Supplies higher-order fluid data transform operators that include pivot and anti-pivot as special cases. The methodology is described in Zumel, 2018, "Fluid data reshaping with 'cdata'", <http://winvector.github.io/FluidData/FluidDataReshapingWithCdata.html>, doi:10.5281/zenodo.1173299. Works on in-memory data or on remote data using 'rquery' and the 'DBI' database interface.

99.99th percentile

Dynamic programming implemented in 'Rcpp'. Includes example partition and out-of-sample fitting applications. Also supplies additional custom coders for the 'vtreat' package.

replyr

CRAN, 99.99th percentile

Patches to use 'dplyr' on remote data sources ('SQL' databases, 'Spark' 2.0.0 and above) in a reliable "generic" fashion (generic meaning user code works similarly on all such sources, without needing per-source adaptation). Due to the fluctuating nature of the 'dplyr'/'dbplyr'/'rlang' 'APIs', this package is going into maintenance mode. Most of the 'replyr' functions are now done better by one of the non-monolithic replacement packages: 'wrapr', 'seplyr', 'rquery', or 'cdata'.

99.99th percentile

Implements the 'rquery' piped Codd-style query algebra using 'data.table'. This allows for a high-speed in-memory implementation of Codd-style data manipulation tools.

rquery

CRAN, 99.99th percentile

A piped query generator based on Edgar F. Codd's relational algebra, and on production experience using 'SQL' and 'dplyr' at big data scale. The design represents an attempt to make 'SQL' more teachable by denoting composition with a sequential pipeline notation instead of nested queries or functions. The implementation delivers reliable high-performance data processing on large data systems such as 'Spark', databases, and 'data.table'. Package features include: data processing trees or pipelines as observable objects (able to report both columns produced and columns used), optimized 'SQL' generation as an explicit user-visible table modeling step, plus explicit query reasoning and checking.

seplyr

CRAN, 99.99th percentile

The 'seplyr' (standard evaluation plying) package supplies improved standard evaluation adapter methods for important common 'dplyr' data manipulation tasks. In addition, the 'seplyr' package supplies several new "key operations bound together" methods. These include 'group_summarize()' (which combines grouping, arranging and calculation in an atomic unit), 'add_group_summaries()' (which joins grouped summaries into a 'data.frame' in a well-documented manner), 'add_group_indices()' (which adds per-group identifiers to a 'data.frame' without depending on row-order), 'partition_mutate_qt()' (which optimizes mutate sequences), and 'if_else_device()' (which simulates per-row if-else blocks in expression sequences).

sigr

CRAN, 99.99th percentile

Succinctly and correctly format statistical summaries of various models and tests (F-test, Chi-Sq-test, Fisher-test, T-test, and rank-significance). This package also includes empirical tests, such as Monte Carlo and bootstrap distribution estimates.

vtreat

CRAN, 99.99th percentile

A 'data.frame' processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. 'vtreat' prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems 'vtreat' defends against: 'Inf', 'NA', too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). Reference: "'vtreat': a data.frame Processor for Predictive Modeling", Zumel, Mount, 2016, <DOI:10.5281/zenodo.1173313>.

wrapr

CRAN, 99.99th percentile

Tools for writing and debugging R code. Provides: '%.>%' dot-pipe (an 'S3' configurable pipe), unpack/to (R style multiple assignment/return), 'build_frame()'/'draw_frame()' ('data.frame' example tools), 'qc()' (quoting concatenate), ':=' (named map builder), 'let()' (converts non-standard evaluation interfaces to parametric standard evaluation interfaces, inspired by 'gtools::strmacro()' and 'base::bquote()'), and more.

WVPlots

CRAN, 99.99th percentile

Select data analysis plots, under a standardized calling interface implemented on top of 'ggplot2' and 'plotly'. Plots of interest include: 'ROC', gain curve, scatter plot with marginal distributions, conditioned scatter plot with marginal densities, box and stem with matching theoretical distribution, and density with matching theoretical distribution.