dittodb (version 0.1.3)

redact_columns: Redact columns from a dataframe with the default redactors

Description

This function redacts the columns named in columns from the dataframe given in data, using dittodb's standard redactors.

Usage

redact_columns(data, columns, ignore.case = TRUE, ...)

Arguments

data

a dataframe to redact

columns

character, the columns to redact

ignore.case

should case be ignored? (default: TRUE)

...

additional options to pass on to grep() when matching the column names

Value

data, with the columns specified in columns duly redacted

Details

The column names given in the columns argument are treated as regular expressions; however, ^ and $ are always added to the beginning and end of each string. So to match any column that starts with the string sensitive (e.g. sensitive_name, sensitive_date), you could use "sensitive.*", which would catch all of those columns (though it would not catch a column called most_sensitive_name).
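The anchoring behaviour described above can be illustrated with base R alone. This is a minimal sketch, not dittodb's internal code: the helper match_anchored() and the column names are hypothetical, and it simply assumes the pattern is wrapped in ^ and $ before being handed to grep().

```r
# Hypothetical column names, for illustration only
col_names <- c("sensitive_name", "sensitive_date",
               "most_sensitive_name", "Sensitive_ID")

# Assumed helper (not part of dittodb): wrap the pattern in ^ and $,
# then return the names that match. Extra arguments go on to grep().
match_anchored <- function(pattern, names, ignore.case = TRUE, ...) {
  anchored <- paste0("^", pattern, "$")
  grep(anchored, names, ignore.case = ignore.case, value = TRUE, ...)
}

# Matches names that start with "sensitive" (case ignored by default);
# most_sensitive_name is excluded because of the leading ^
match_anchored("sensitive.*", col_names)

# A bare "sensitive" only matches a column named exactly that, so nothing here
match_anchored("sensitive", col_names)

# With ignore.case = FALSE, Sensitive_ID no longer matches
match_anchored("sensitive.*", col_names, ignore.case = FALSE)
```

Under the same assumption, a pattern such as ".*sensitive.*" would be needed to also pick up most_sensitive_name.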

The standard redactors replace all values in the column with the following values based on the column's type:

  • integer -- 9L

  • numeric -- 9

  • character -- "[redacted]"

  • POSIXct (date times) -- as.POSIXct("1988-10-11T17:00:00", tz = tzone)
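The type-based replacement listed above could be sketched in base R as follows. This is an illustration under stated assumptions rather than dittodb's implementation: redact_value() is a hypothetical helper, tzone is assumed to come from the column's "tzone" attribute, and an explicit format is given so that the T separator in the timestamp is parsed unambiguously.

```r
# Assumed helper (not part of dittodb): pick a replacement by column type
redact_value <- function(x) {
  if (inherits(x, "POSIXct")) {
    # Assumption: reuse the column's own time zone, if it has one
    tzone <- attr(x, "tzone")
    if (is.null(tzone)) tzone <- ""
    fixed <- as.POSIXct("1988-10-11T17:00:00", tz = tzone,
                        format = "%Y-%m-%dT%H:%M:%S")
    rep(fixed, length(x))
  } else if (is.integer(x)) {
    # Checked before is.numeric(), which is also TRUE for integers
    rep(9L, length(x))
  } else if (is.numeric(x)) {
    rep(9, length(x))
  } else if (is.character(x)) {
    rep("[redacted]", length(x))
  } else {
    x  # other types are left untouched in this sketch
  }
}

# Hypothetical data with one column of each covered type
df <- data.frame(
  id     = 1:3,
  delay  = c(1.5, 2, 3.25),
  origin = c("JFK", "LGA", "EWR"),
  when   = as.POSIXct(c("2013-01-01 05:00:00", "2013-01-01 06:00:00",
                        "2013-01-01 07:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Redact every column; each keeps its type but loses its values
redacted <- df
redacted[] <- lapply(df, redact_value)
str(redacted)
```

Because each replacement has the same type and length as the original column, the redacted dataframe keeps its shape and column classes.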

Examples

# NOT RUN {
if (check_for_pkg("nycflights13", message)) {
  small_flights <- head(nycflights13::flights)

  # with no columns specified, redacting does nothing
  redact_columns(small_flights, columns = NULL)

  # integer
  redact_columns(small_flights, columns = c("arr_time"))

  # numeric
  redact_columns(small_flights, columns = c("arr_delay"))

  # characters
  redact_columns(small_flights, columns = c("origin", "dest"))

  # datetimes
  redact_columns(small_flights, columns = c("time_hour"))
}
# }
