dup: Analysis of the cardinality of a key/identifier in a table

Description

Analyses the cardinality of a key in a table and creates several result tables. The term "n-plicate" generalizes the notion of duplicate: an n-plicate can be a duplicate, a triplicate, and so on.
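
To make the notion concrete, here is a minimal base-R sketch of an n-plicate count. It is not the package's implementation. The helper name `nplicate_counts` is hypothetical. It assumes that counting "rows" means counting distinct rows per key, and that a missing key (NA) is treated as an ordinary key value.

```r
# Hypothetical helper, not part of the package: how many keys occur once,
# twice, three times, ... in `tab`.
nplicate_counts <- function(tab, keyby) {
  # Assumption: "rows" counts distinct rows, so exact duplicate rows are collapsed
  tab <- unique(tab)
  # One string per row identifying its key; NA becomes the string "NA"
  key <- do.call(paste, c(unname(as.list(tab[keyby])), sep = "\x1f"))
  # Number of rows sharing each key value
  rows_per_key <- as.vector(table(key))
  # Number of keys that are 1-plicates, 2-plicates, ...
  counts <- table(rows_per_key)
  data.frame(n_plicate = as.integer(names(counts)),
             n_keys    = as.vector(counts))
}

basic <- data.frame("key"   = c("a", "b", "c", "d", NA, "a", "e", "f"),
                    "value" = c(112, 117, 317,  NA,  0,  17, 117, 112))
nplicate_counts(basic, "key")
# expected: 6 keys occur once and 1 key ("a") occurs twice
```

A key is unique exactly when every key is a 1-plicate, that is, when the result has a single row with `n_plicate` equal to 1.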

Usage

dup(tab, keyby, count_what = "rows", partition = NULL, view = TRUE)

Value

A set of data frames created in the global environment:

* nup_r_tab: table of n-plicate counts
* nup_xpl_r: table of n-plicate examples
* nup_exZ_r: table of examples of 0-plicates (keys whose count is 0)
* nup_r_tab_part: table of n-plicate counts broken down by the values of the `partition` columns
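
A table of n-plicate examples can be pictured along the following lines. This is a minimal base-R sketch under the same assumptions as above (distinct rows, NA treated as a key value). The helper name `nplicate_examples` and its added `n_plicate` column are hypothetical, and the actual columns of nup_xpl_r may differ.

```r
# Hypothetical helper, not part of the package: the rows whose key occurs
# at least twice, tagged with the number of rows sharing that key.
nplicate_examples <- function(tab, keyby) {
  # Assumption: exact duplicate rows are collapsed first
  tab <- unique(tab)
  # One string per row identifying its key; NA becomes the string "NA"
  key <- do.call(paste, c(unname(as.list(tab[keyby])), sep = "\x1f"))
  # For each row, the number of rows sharing its key
  n <- as.vector(table(key)[key])
  keep <- n >= 2
  out <- tab[keep, , drop = FALSE]
  out$n_plicate <- n[keep]
  # Most repeated keys first; rows of the same key stay together
  out[order(-out$n_plicate, key[keep]), , drop = FALSE]
}

basic <- data.frame("key"   = c("a", "b", "c", "d", NA, "a", "e", "f"),
                    "value" = c(112, 117, 317,  NA,  0,  17, 117, 112))
nplicate_examples(basic, "key")
# expected: the two rows with key "a" (values 112 and 17)
```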

Arguments

tab: Either an R data frame or a reference to a remote table
keyby: (character vector) names of the column(s) considered as keys
count_what: (character vector) what to count for each key (each combination of the *keyby* columns): "rows" (the default) counts distinct rows; otherwise, the names of the columns whose distinct values are to be counted
partition: (character vector) names of the columns by which to break down the analysis
view: (logical) whether to open the generated tables automatically (default TRUE)
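
When count_what names a column rather than "rows", the count for each key is the number of distinct values of that column. The base-R sketch below illustrates this. The helper name `distinct_value_counts` is hypothetical, and it assumes that missing values are not counted, which is one way a key can end up with a count of 0.

```r
# Hypothetical helper, not part of the package: for each key, the number of
# distinct non-missing values of column `what`, then how many keys have
# 0, 1, 2, ... such values.
distinct_value_counts <- function(tab, keyby, what) {
  # One string per row identifying its key; NA becomes the string "NA"
  key <- do.call(paste, c(unname(as.list(tab[keyby])), sep = "\x1f"))
  # Assumption: NA values of `what` are ignored, so a key may count 0
  per_key <- vapply(split(tab[[what]], key),
                    function(v) length(unique(v[!is.na(v)])),
                    integer(1))
  counts <- table(per_key)
  data.frame(n_plicate = as.integer(names(counts)),
             n_keys    = as.vector(counts))
}

basic <- data.frame("key"   = c("a", "b", "c", "d", NA, "a", "e", "f"),
                    "value" = c(112, 117, 317,  NA,  0,  17, 117, 112))
distinct_value_counts(basic, "key", "value")
# expected: "d" has 0 values, "a" has 2, and the 5 other keys have 1
```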

Examples

# Check whether "name" is a unique key of the starwars table (it is)
dup(dplyr::starwars, keyby = "name", view = FALSE)

# Check whether "key" is a unique key of the basic table (it is not: "a" occurs twice)
basic <- data.frame("key"   = c("a", "b", "c", "d", NA, "a", "e", "f"), 
                    "value" = c(112, 117, 317,  NA,  0,  17, 117, 112))
dup(basic, keyby = "key", view = FALSE)