set_na: Replace specific values in vector with NA

Description

This function replaces specific values of a variable, data frame or list of variables with missings (NA).

Usage

set_na(x, value, drop.levels = TRUE, as.tag = TRUE)
set_na(x) <- value

Arguments

x

Variable (vector), data.frame or list of variables where new missing values should be defined. If x is a data.frame, each column is treated as a separate variable in which missings are defined.

value

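Numeric vector with the values that should be replaced with a tagged NA (see tagged_na). In each variable in x, every occurrence of these values is replaced by a tagged NA. As the 'Examples' show, value may also be a named vector or, for factors, a character value.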

drop.levels

Logical, if TRUE, factor levels of values that have been replaced with NA are dropped. See 'Examples'.

as.tag

Logical, if TRUE, values in x will be replaced by tagged_na, else by usual NA values.
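What drop.levels controls can be illustrated with base R alone. The following is a minimal sketch of the underlying idea, not set_na's implementation; the names f and f_dropped are hypothetical, and it assumes that set_na treats factor levels in a comparable way.

```r
# hypothetical example factor
f <- factor(c("a", "b", "c"))

# replace the value "b" with NA; the level "b" itself is still defined
f[f == "b"] <- NA
levels(f)
# [1] "a" "b" "c"

# dropping unused levels removes "b" from the level set
f_dropped <- droplevels(f)
levels(f_dropped)
# [1] "a" "c"
```

Keeping the level (the first case) is useful when the category should still appear, with a count of zero, in later tables.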

Value

x, where each value listed in value is replaced by a tagged NA (or by a regular NA if as.tag = FALSE).

Details

set_na replaces all values defined in value with a related tagged NA (see tagged_na). Tagged NAs work exactly like regular R missing values, except that they store one additional byte of information: a tag, which is usually a letter ("a" to "z") or a digit character ("0" to "9"). For further information, see 'Details' in get_na.
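The behaviour of tagged NAs described above can be tried directly with the haven package, which the examples below also load. This is a minimal sketch using haven's own helpers tagged_na(), na_tag() and is_tagged_na(); it does not show how set_na works internally, and the vector x is purely illustrative.

```r
library(haven)

# illustrative numeric vector: two tagged missings and one regular NA
x <- c(1, 2, tagged_na("a"), tagged_na("z"), NA)

# tagged NAs are treated like ordinary missing values
is.na(x)
# [1] FALSE FALSE  TRUE  TRUE  TRUE

# the tag can be recovered; values without a tag yield NA
na_tag(x)

# distinguish tagged NAs from regular NAs, optionally by a specific tag
is_tagged_na(x)
is_tagged_na(x, "a")

# print the vector with the tags shown, e.g. NA(a)
print_tagged_na(x)
```

Because the tag is only stored for numeric (double) vectors, the sketch uses a numeric x rather than an integer or character vector.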

Examples

# create random variable
dummy <- sample(1:8, 100, replace = TRUE)
# show value distribution
table(dummy)
# set values 1 and 8 as missings
dummy <- set_na(dummy, c(1, 8))
# show value distribution, including missings
table(dummy, useNA = "always")

# add named vector as further missing value
set_na(dummy, c("Refused" = 5))
# see different missing types
library(haven)
print_tagged_na(set_na(dummy, c("Refused" = 5)))


# create sample data frame
dummy <- data.frame(var1 = sample(1:8, 100, replace = TRUE),
                    var2 = sample(1:10, 100, replace = TRUE),
                    var3 = sample(1:6, 100, replace = TRUE))
# set values 2 and 4 as missings
library(dplyr)
dummy %>% set_na(c(2, 4)) %>% head()
dummy %>% set_na(c(2, 4)) %>% get_na()
dummy %>% set_na(c(2, 4)) %>% get_values()

# create list of variables
data(efc)
dummy <- list(efc$c82cop1, efc$c83cop2, efc$c84cop3)
# check original distribution of categories
lapply(dummy, table, useNA = "always")
# set 3 to NA
lapply(set_na(dummy, 3), table, useNA = "always")

# drop factor levels of values that have been set to NA
x <- factor(c("a", "b", "c"))
x
set_na(x, "b")
set_na(x, "b", drop.levels = FALSE)
