pickmax (version 0.1.0)

Split and Coalesce Duplicated Records

Description

Deduplicates datasets by retaining the most complete and informative records. Identifies duplicated entries based on a specified key column, calculates completeness scores for each row, and compares values within groups. When differences between duplicates exceed a user-defined threshold, records are split into unique IDs; otherwise, they are coalesced into a single, most complete entry. Returns a list containing the original duplicates, the split entries, and the final coalesced dataset. Useful for cleaning survey or administrative data where duplicated IDs may reflect minor data entry inconsistencies.
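The workflow described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation. The sample data, the helper names (conflict_share, coalesce_group, split_group), the difference measure (share of columns with conflicting non-missing values), the 0.5 threshold, and the names of the returned list elements are all assumptions made for this example.

```r
# Illustrative sketch in base R; it does not call pickmax and its helpers are hypothetical.
records <- data.frame(
  id     = c("A1", "A1", "A2", "A2", "A3"),
  age    = c(34, 34, 51, 27, 40),
  income = c(NA, 52000, 61000, 18000, 45000),
  region = c("North", NA, "East", "West", "South"),
  stringsAsFactors = FALSE
)
value_cols <- setdiff(names(records), "id")

# Completeness score: share of non-missing value columns in each row.
records$completeness <- rowMeans(!is.na(records[value_cols]))

# Rows whose key occurs more than once.
dup_ids    <- unique(records$id[duplicated(records$id)])
duplicates <- records[records$id %in% dup_ids, ]

# Assumed difference measure: share of value columns holding
# conflicting non-missing values within one group of duplicates.
conflict_share <- function(group) {
  conflicts <- vapply(group[value_cols], function(col) {
    length(unique(col[!is.na(col)])) > 1
  }, logical(1))
  mean(conflicts)
}

# Coalesce: start from the most complete row and fill its gaps
# with the first non-missing value found in the other rows.
coalesce_group <- function(group) {
  group  <- group[order(-group$completeness), ]
  merged <- group[1, ]
  for (col in value_cols) {
    present <- group[[col]][!is.na(group[[col]])]
    if (is.na(merged[[col]]) && length(present) > 0) {
      merged[[col]] <- present[1]
    }
  }
  merged$completeness <- rowMeans(!is.na(merged[value_cols]))
  merged
}

# Split: give each row in the group its own ID by appending a suffix.
split_group <- function(group) {
  group$id <- paste0(group$id, "_", seq_len(nrow(group)))
  group
}

threshold <- 0.5  # assumed cut-off for this sketch
groups    <- split(duplicates, duplicates$id)
shares    <- vapply(groups, conflict_share, numeric(1))

split_entries  <- do.call(rbind, lapply(groups[shares > threshold], split_group))
coalesced_dups <- do.call(rbind, lapply(groups[shares <= threshold], coalesce_group))

final <- rbind(records[!records$id %in% dup_ids, ], split_entries, coalesced_dups)
rownames(final) <- NULL

result <- list(duplicates = duplicates, split = split_entries, coalesced = final)
```

In this sample, the two A1 rows agree wherever both have values, so they are merged into one complete row. The two A2 rows conflict in every column, so they are kept as separate records with new IDs.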

Install

install.packages('pickmax')

Monthly Downloads

135

Version

0.1.0

License

GPL-3

Maintainer

Sbonelo Chamane

Last Published

July 15th, 2025

Functions in pickmax (0.1.0)

pickmax

Split and Coalesce Duplicated Records