set_na: Replace specific values in vector with NA

Description

This function replaces specific values of a variable, data frame or list of variables with missing values (NA).
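
For an unlabelled vector, the core operation can be pictured as replacing every element that matches one of the listed values with NA. The base-R sketch below shows only that idea, using %in% and replace(). It is an illustration under that simplifying assumption, not set_na()'s implementation, and it ignores value labels, tagged NAs and factor levels.

```r
# Base-R sketch: replace every element of x that occurs in `value` with NA.
# This only mimics the basic behaviour of set_na() for an unlabelled vector.
x <- c(1, 8, 3, 5, 8, 2, 1)
value <- c(1, 8)

x_na <- replace(x, x %in% value, NA)
print(x_na)  # -> NA NA 3 5 NA 2 NA
```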

Usage

set_na(x, value, drop.levels = TRUE, as.tag = FALSE)
set_na(x, drop.levels = TRUE, as.tag = FALSE) <- value

Arguments

x

Variable (vector), data frame or list of variables in which new missing values should be defined. If x is a data.frame, each column is treated as a separate variable in which missing values should be defined.
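
Because each column of a data frame (or each element of a list) is handled as its own variable, the replacement can be pictured as applying the vector operation column by column. This is a minimal base-R sketch under that assumption, using lapply(); the objects df, vars and value are illustrative only.

```r
# Base-R sketch: apply the same replacement to every column of a data frame
# and to every element of a list of vectors (each is treated as one variable).
value <- c(2, 4)

df <- data.frame(var1 = c(1, 2, 3, 4), var2 = c(4, 5, 2, 6))
# df[] <- ... keeps the data.frame structure while replacing column contents
df[] <- lapply(df, function(col) replace(col, col %in% value, NA))

vars <- list(c(2, 3, 4), c(1, 2, 5))
vars_na <- lapply(vars, function(v) replace(v, v %in% value, NA))

str(df)
str(vars_na)
```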

value

Numeric vector with the values that should be replaced with NA, or a character vector if values of factors or character vectors should be replaced. For labelled vectors, value may also contain the names of value labels. In this case, the values associated with those labels in each vector are replaced with NA (see 'Examples').
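
When value holds label names, each vector's own labels determine which codes are replaced, so the same label can map to different codes in different variables. The base-R sketch below assumes that labels are stored as a named vector in a "labels" attribute (the convention used by haven's labelled class); the helper na_by_label() is hypothetical and not part of the package.

```r
# Hypothetical helper: set to NA all codes whose value label is in `label_names`.
# Assumes labels are stored as a named vector in attr(x, "labels").
na_by_label <- function(x, label_names) {
  labs <- attr(x, "labels")
  codes <- labs[names(labs) %in% label_names]  # codes for the requested labels
  x[x %in% codes] <- NA                        # attributes of x are kept
  x
}

# "No answer" is coded 4 in x1 but 7 in x2
x1 <- structure(c(1, 3, 4, 2), labels = c("Refused" = 3, "No answer" = 4))
x2 <- structure(c(6, 7, 1, 5), labels = c("Refused" = 6, "No answer" = 7))

na_by_label(x1, "No answer")                # code 4 becomes NA
na_by_label(x2, c("Refused", "No answer"))  # codes 6 and 7 become NA
```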

drop.levels

Logical; if TRUE, factor levels whose values have been replaced with NA are dropped. See 'Examples'.
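
In base R, setting factor values to NA leaves their level in place, while droplevels() removes levels that no longer occur. The base-R sketch below illustrates the difference that drop.levels controls, assuming the argument behaves as described above; it does not call set_na().

```r
# Base-R illustration of what drop.levels controls for factors.
x <- factor(c("a", "b", "c"))
x[x == "b"] <- NA        # the value is gone, but level "b" remains
levels(x)                # "a" "b" "c"  (comparable to drop.levels = FALSE)
levels(droplevels(x))    # "a" "c"      (comparable to drop.levels = TRUE)
```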

as.tag

Logical; if TRUE, values in x are replaced by tagged NA values (see tagged_na), otherwise by regular NA values. Pass a named vector as value to assign the names as value labels to the tagged NA values (see 'Examples').

Value

x, with all values listed in value replaced by NA.

Details

set_na replaces all values defined in value with NA or tagged NA values (see tagged_na). Tagged NAs work exactly like regular R missing values, except that they store one additional byte of information: a tag, which is usually a letter ("a" to "z") or a digit ("0" to "9"). See also 'Details' in get_na.
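
To see what a tag adds, the sketch below builds tagged NAs directly with haven's tagged_na(), na_tag(), is_tagged_na() and print_tagged_na() instead of calling set_na(). It shows the underlying mechanism only; it is not a description of set_na()'s internals, and the codes 5 and 9 are made up for illustration. It requires the haven package.

```r
# Sketch using haven's tagged NA helpers (requires the haven package).
library(haven)

x <- c(1, 5, 2, 5, 9)
refused   <- x == 5          # compute masks before any NA is introduced
no_answer <- x == 9
x[refused]   <- tagged_na("a")
x[no_answer] <- tagged_na("b")

is.na(x)                     # tagged NAs count as ordinary NAs in base R
na_tag(x)                    # NA "a" NA "a" "b": the tag is preserved
is_tagged_na(x, "a")         # TRUE only where the tag is "a"
print_tagged_na(x)
```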

Examples

# create random variable
dummy <- sample(1:8, 100, replace = TRUE)
# show value distribution
table(dummy)
# set values 1 and 8 to NA
dummy <- set_na(dummy, c(1, 8))
# show value distribution, including missings
table(dummy, useNA = "always")

# additionally set value 5 to a tagged NA; the name "Refused" becomes its value label
set_na(dummy, c("Refused" = 5), as.tag = TRUE)
# show the different types of missing values
library(haven)
print_tagged_na(set_na(dummy, c("Refused" = 5), as.tag = TRUE))


# create sample data frame
dummy <- data.frame(var1 = sample(1:8, 100, replace = TRUE),
                    var2 = sample(1:10, 100, replace = TRUE),
                    var3 = sample(1:6, 100, replace = TRUE))
# set values 2 and 4 to NA
library(dplyr)
dummy %>% set_na(c(2, 4)) %>% head()
dummy %>% set_na(c(2, 4), as.tag = TRUE) %>% get_na()
dummy %>% set_na(c(2, 4), as.tag = TRUE) %>% get_values()

# create list of variables
data(efc)
dummy <- list(efc$c82cop1, efc$c83cop2, efc$c84cop3)
# check original distribution of categories
lapply(dummy, table, useNA = "always")
# set 3 to NA
lapply(set_na(dummy, 3), table, useNA = "always")

# drop unused factor levels when their values are set to NA
x <- factor(c("a", "b", "c"))
x
set_na(x, "b", as.tag = TRUE)
set_na(x, "b", drop.levels = FALSE, as.tag = TRUE)

# set_na() can also set values to NA by specifying the value label
# of the value that should be replaced with NA. This is particularly
# helpful if a certain category should be set to NA, but this category
# is coded with different values across variables
x1 <- sample(1:4, 20, replace = TRUE)
x2 <- sample(1:7, 20, replace = TRUE)
set_labels(x1) <- c("Refused" = 3, "No answer" = 4)
set_labels(x2) <- c("Refused" = 6, "No answer" = 7)

tmp <- data.frame(x1, x2)
get_labels(tmp)
get_labels(set_na(tmp, "No answer"))
get_labels(set_na(tmp, c("Refused", "No answer")))

# show values
tmp
set_na(tmp, c("Refused", "No answer"))

Run the code above in your browser using DataLab