anonymizer

anonymizer anonymizes data containing Personally Identifiable Information (PII) using a combination of salting and hashing. You can find quality examples of data anonymization in R here, here, and here.

Installation

You can install:

  • the latest released version from CRAN with

    install.packages("anonymizer")
  • the latest development version from GitHub with

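    # Install devtools if it is missing or older than 1.6
    if (!requireNamespace("devtools", quietly = TRUE) ||
        packageVersion("devtools") < "1.6") {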
      install.packages("devtools")
    }
    devtools::install_github("paulhendricks/anonymizer")

If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub.

API

anonymizer provides four convenience functions: salt, unsalt, hash, and anonymize.

library(dplyr, warn.conflicts = FALSE)
library(anonymizer)
letters %>% head
#> [1] "a" "b" "c" "d" "e" "f"
letters %>% head %>% salt(.seed = 1)
#> [1] "gjoxfagjoxf" "gjoxfbgjoxf" "gjoxfcgjoxf" "gjoxfdgjoxf" "gjoxfegjoxf"
#> [6] "gjoxffgjoxf"
letters %>% head %>% salt(.seed = 1) %>% unsalt(.seed = 1)
#> [1] "a" "b" "c" "d" "e" "f"
letters %>% head %>% hash(.algo = "crc32")
#> [1] "c0749952" "597dc8e8" "2e7af87e" "b01e6ddd" "c7195d4b" "5e100cf1"
letters %>% head %>% salt(.seed = 1) %>% hash(.algo = "crc32")
#> [1] "b0891ad8" "361d6876" "fd41bbd3" "e0448b6b" "2b1858ce" "ad8c2a60"
letters %>% head %>% anonymize(.algo = "crc32", .seed = 1)
#> [1] "b0891ad8" "361d6876" "fd41bbd3" "e0448b6b" "2b1858ce" "ad8c2a60"
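
The salted output above suggests that a short random string, derived from the seed, is wrapped around both ends of each value. The following is a minimal base R sketch of that idea, not the package's actual implementation. The helpers `sketch_make_salt`, `sketch_salt`, and `sketch_unsalt` are hypothetical names, and the choice of five characters sampled with replacement from `letters` is an assumption based only on the printed output. The salt it produces for a given seed may differ from the one shown above.

```r
# Hypothetical helpers, NOT the package's implementation: they only mimic
# the behaviour visible in the output above (salt wrapped around each value).
sketch_make_salt <- function(seed, chars = letters, n_chars = 5L) {
  set.seed(seed)  # the same seed always yields the same salt
  paste0(sample(chars, n_chars, replace = TRUE), collapse = "")
}

sketch_salt <- function(x, seed) {
  s <- sketch_make_salt(seed)
  paste0(s, x, s)  # prepend and append the salt to every element
}

sketch_unsalt <- function(x, seed) {
  s <- sketch_make_salt(seed)
  n <- nchar(s)
  # Drop n characters from each end of every element
  substr(x, n + 1L, nchar(x) - n)
}

x <- head(letters)
salted <- sketch_salt(x, seed = 1)
restored <- sketch_unsalt(salted, seed = 1)
identical(restored, x)
```

Because the salt depends only on the seed, anyone who knows the seed can regenerate the salt and strip it again. In this sketch the seed therefore has to be kept as secret as the data itself.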

Generate data containing fake PII

library(generator)
n <- 6
set.seed(1)
ashley_madison <- 
  data.frame(name = r_full_names(n), 
             snn = r_national_identification_numbers(n), 
             dob = r_date_of_births(n), 
             email = r_email_addresses(n), 
             ip = r_ipv4_addresses(n), 
             phone = r_phone_numbers(n), 
             credit_card = r_credit_card_numbers(n), 
             lat = r_latitudes(n), 
             lon = r_longitudes(n), 
             stringsAsFactors = FALSE)
knitr::kable(ashley_madison, format = "markdown")

|name                 |snn         |dob        |email                  |ip              |phone      |credit_card         |         lat|         lon|
|:--------------------|:-----------|:----------|:----------------------|:---------------|:----------|:-------------------|-----------:|-----------:|
|Eldridge Pfannerstill|442-34-5338 |1991-11-13 |ntakqojv@lgbcyk.rkv    |45.84.71.225    |6794976958 |4125-7204-9193-5140 |  -2.7018575|    8.634988|
|Augustine Homenick   |799-44-6396 |1912-06-27 |iqg@mtcuh.viy          |191.116.55.106  |3275827694 |2182-5994-2283-9486 | -70.4148630|  -65.827918|
|Jennie Runte         |941-11-5441 |1983-09-15 |wjszy@sjhreocvt.gbp    |27.128.73.17    |7419351735 |4370-4866-4735-7857 | -45.4091701|  -79.932229|
|Araceli Kunde        |290-44-2675 |1947-07-28 |uljsnvhfr@qfdkumtn.jkd |221.47.229.86   |3243246285 |6682-5074-2898-9396 |  -0.2673845|  103.514583|
|Josue Rau            |686-88-8446 |1994-12-12 |c@lqxzkdpi.nfy         |157.136.114.185 |9169736873 |4510-3757-4858-5236 | -22.8839925|   72.886505|
|Elnora Zemlak        |212-40-7016 |1974-11-01 |capvnl@nympzf.gsk      |143.20.199.87   |3295843196 |7206-6205-2194-6432 |  78.2444466| -120.590050|

Detect data containing PII

library(detector)
ashley_madison %>% 
  detect %>% 
  knitr::kable(format = "markdown")
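
|column_name |has_email_addresses |has_phone_numbers |has_national_identification_numbers |
|:-----------|:-------------------|:-----------------|:-----------------------------------|
|name        |FALSE               |FALSE             |FALSE                               |
|snn         |FALSE               |FALSE             |TRUE                                |
|dob         |FALSE               |FALSE             |FALSE                               |
|email       |TRUE                |FALSE             |FALSE                               |
|ip          |FALSE               |FALSE             |FALSE                               |
|phone       |FALSE               |TRUE              |FALSE                               |
|credit_card |FALSE               |FALSE             |FALSE                               |
|lat         |FALSE               |TRUE              |FALSE                               |
|lon         |FALSE               |TRUE              |FALSE                               |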
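
The detection results can also be used programmatically to list which columns were flagged. The sketch below rebuilds the results shown above by hand as a plain data frame, so that it runs without the detector package. The names `detected`, `flagged`, and `unflagged` are illustrative only.

```r
# Detection results transcribed by hand from the table above
detected <- data.frame(
  column_name = c("name", "snn", "dob", "email", "ip",
                  "phone", "credit_card", "lat", "lon"),
  has_email_addresses = c(FALSE, FALSE, FALSE, TRUE, FALSE,
                          FALSE, FALSE, FALSE, FALSE),
  has_phone_numbers = c(FALSE, FALSE, FALSE, FALSE, FALSE,
                        TRUE, FALSE, TRUE, TRUE),
  has_national_identification_numbers = c(FALSE, TRUE, FALSE, FALSE, FALSE,
                                          FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# A column counts as flagged if any of the has_* checks is TRUE
flag_cols <- setdiff(names(detected), "column_name")
flagged <- detected$column_name[rowSums(detected[flag_cols]) > 0]
unflagged <- setdiff(detected$column_name, flagged)

flagged
#> [1] "snn"   "email" "phone" "lat"   "lon"
unflagged
#> [1] "name"        "dob"         "ip"          "credit_card"
```

Note that lat and lon are flagged as phone numbers although they hold coordinates, while name, dob, ip, and credit_card are not flagged at all. The detection output is probably best treated as a starting point for manual review rather than a complete inventory of PII.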

Anonymize data containing PII

ashley_madison[] <- lapply(ashley_madison, anonymize, .algo = "crc32")
ashley_madison %>% 
  knitr::kable(format = "markdown")
namesnndobemailipphonecredit_cardlatlon
c83b4030393d73d7abf6427aa5deade4b6e2c6d3af086bcb7b5ba80064d9e7dc18006
98a6974d70ac65b06f83bc6a75947f05e0e7cef5c5620367cd11025fdf9526d5828b961
77dcbc4d391740d7b95109066cefaee2fbaaa8f19a66f57d299a42fe734886e39ea0e9a5
a48e2b0b6704117d65595953e1598468b7a422ba1f0a0373f420590f53155b4181018fc
4fecaeb29d6bf73260cdfc574b412ff9d1f2740cac553e93e3716031f3d9a005ef3bdb8d
abc3b85c908661898345b538f26e84b152596e0eb14fa5df9189fc4f85c69f65f0db3bb0
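
The call above hashes every column, including lat and lon. If only some columns should be anonymized, one option is to apply the function to a chosen subset. This is a minimal sketch under stated assumptions: `anonymize_columns` and `toy_mask` are hypothetical helpers that are not part of anonymizer, and `toy_mask` is only a stand-in so that the example runs without extra packages.

```r
# Hypothetical helper: apply `fn` to the named columns only
anonymize_columns <- function(df, cols, fn, ...) {
  stopifnot(all(cols %in% names(df)))
  df[cols] <- lapply(df[cols], fn, ...)
  df
}

# Stand-in for a real anonymizing function; it replaces every character
# with "*". In practice you would pass anonymizer::anonymize instead.
toy_mask <- function(x, char = "*") {
  strrep(char, nchar(as.character(x)))
}

toy <- data.frame(name = c("Ann", "Bob"),
                  lat = c(1.5, -2.25),
                  stringsAsFactors = FALSE)

masked <- anonymize_columns(toy, "name", toy_mask)
masked$name
#> [1] "***" "***"
masked$lat
#> [1]  1.50 -2.25
```

With the package loaded, the equivalent call on the example data would look like `anonymize_columns(ashley_madison, c("name", "snn", "email"), anonymize, .algo = "crc32")`, with the column list adjusted to whatever you consider PII.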

Version

0.2.0

License

MIT + file LICENSE

Last Published

September 8th, 2015