
ProduceR (version 1.0)

dup: Analysis of the cardinality of a key/identifier in a table

Description

Analyses how many rows (or distinct values) each value of the key has, and creates several result tables. The term "n-plicate" generalizes the notion of duplicate: an n-plicate can be a duplicate (n = 2), a triplicate (n = 3), etc.
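As a rough illustration of the n-plicate idea, here is a base R sketch (not the implementation of dup()). It counts the rows of each key value, then tabulates how many key values occur once, twice, and so on. Treating NA as a key value of its own is an assumption of this sketch.

```r
# Toy data, identical to the `basic` table from the Examples section
basic <- data.frame(key   = c("a", "b", "c", "d", NA, "a", "e", "f"),
                    value = c(112, 117, 317,  NA,  0,  17, 117, 112))

# Rows per key value; useNA = "ifany" keeps NA as its own key value
# (an assumption: dup() may treat missing keys differently)
rows_per_key <- table(basic$key, useNA = "ifany")

# Number of key values for each multiplicity n
# (n = 1: unique, n = 2: duplicate, n = 3: triplicate, ...)
nplicate_counts <- table(n = as.vector(rows_per_key))
print(nplicate_counts)
```

Here six key values (including NA) are unique and one ("a") is a duplicate.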

Usage

dup(tab, keyby, count_what = "rows", partition = NULL, view = TRUE)

Value

A set of data frames created in the global environment:
* nup_r_tab: table of n-plicate counts
* nup_xpl_r: table of n-plicate examples
* nup_exZ_r: table of examples of n-plicates with value 0 (n = 0)
* nup_r_tab_part: table of n-plicate counts broken down by the values of the `partition` columns
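The idea behind an examples table such as nup_xpl_r can be sketched in base R. The sketch attaches to every row the multiplicity n of its key and keeps the rows with n > 1. The column name `n` and the filter are assumptions for illustration; the actual columns of nup_xpl_r may differ.

```r
basic <- data.frame(key   = c("a", "b", "c", "d", NA, "a", "e", "f"),
                    value = c(112, 117, 317,  NA,  0,  17, 117, 112))

# Rows per key value (NA kept as its own value, an assumption of this sketch)
counts <- table(basic$key, useNA = "ifany")

# Multiplicity of each row's key; match() also matches NA to NA
basic$n <- as.integer(counts[match(basic$key, names(counts))])

# Rows whose key is an n-plicate with n > 1
examples <- basic[basic$n > 1, ]
print(examples)
```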

Arguments

tab

Either an R data frame or a reference to a remote table

keyby

(character vector) names of the column(s) that form the key

count_what

(character vector) what to count for each key value (each combination of the *keyby* columns): "rows" (the default) counts distinct rows; otherwise, give the name(s) of the column(s) whose distinct values are counted
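To make the difference between the two settings concrete, here is a base R sketch (not dup() itself) of the per-key count for count_what = "rows" and for count_what = "value". Ignoring missing values when counting distinct values, and dropping the NA key, are assumptions of this sketch.

```r
basic <- data.frame(key   = c("a", "b", "c", "d", NA, "a", "e", "f"),
                    value = c(112, 117, 317,  NA,  0,  17, 117, 112))

# count_what = "rows": distinct rows per key value
# (unique() drops exact duplicate rows; tapply() drops the NA key)
distinct_rows <- unique(basic)
rows_per_key <- tapply(distinct_rows$value, distinct_rows$key, length)

# count_what = "value": distinct non-missing values of `value` per key value
values_per_key <- tapply(basic$value, basic$key,
                         function(v) length(unique(v[!is.na(v)])))

print(rbind(rows = rows_per_key, value = values_per_key))
```

Key "a" counts 2 in both cases, while key "d" has one row but no non-missing value, so its count is 0 under the second setting.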

partition

(character vector) names of the columns by which to break down the analysis
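One possible reading of the breakdown, sketched in base R, assumes that multiplicities are computed separately within each value of the partition column. The sketch counts rows per (partition, key) pair, then tabulates the multiplicities per partition value. The table `df` and its column `grp` are hypothetical.

```r
# Hypothetical table: key "b" is duplicated only within grp "x"
df <- data.frame(key = c("a", "a", "b", "b", "c"),
                 grp = c("x", "y", "x", "x", "y"))

# Rows per (partition value, key value) pair, keeping only pairs that occur
pairs <- as.data.frame(table(grp = df$grp, key = df$key), responseName = "n")
pairs <- pairs[pairs$n > 0, ]

# Number of key values for each multiplicity n, per partition value
nplicates_by_part <- table(grp = pairs$grp, n = pairs$n)
print(nplicates_by_part)
```

Key "a" is a duplicate in the whole table but unique within each partition value, which is the kind of pattern a partitioned view can reveal.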

view

(logical) whether to open the generated tables automatically

Examples

# Check whether "name" is a unique key of the starwars table (it is)
dup(dplyr::starwars, keyby = "name", view = FALSE)

# Check whether "key" is a unique key of the basic table (it is not: "a" appears twice)
basic <- data.frame("key"   = c("a", "b", "c", "d", NA, "a", "e", "f"), 
                    "value" = c(112, 117, 317,  NA,  0,  17, 117, 112))
dup(basic, keyby = "key", view = FALSE)
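For a quick yes/no answer on an R data frame, without generating the result tables, a base R check along these lines can be used. is_unique_key() is a hypothetical helper, not part of ProduceR; it treats NA as an ordinary key value and does not handle remote tables.

```r
# Hypothetical helper: TRUE when no combination of the `keyby` columns repeats
is_unique_key <- function(tab, keyby) {
  # anyDuplicated() on a data frame compares whole rows (here: key columns only)
  anyDuplicated(tab[keyby]) == 0L
}

basic <- data.frame(key   = c("a", "b", "c", "d", NA, "a", "e", "f"),
                    value = c(112, 117, 317,  NA,  0,  17, 117, 112))

is_unique_key(basic, "key")               # FALSE: "a" appears twice
is_unique_key(basic, c("key", "value"))   # TRUE: every (key, value) pair is distinct
```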
