kutils (version 0.93)

assignMissing: Scrub a variable's missings away

Description

The values given in missings must be written carefully, because the required format depends on the type of variable being processed.

Usage

assignMissing(x, missings = NULL)

Arguments

x
A variable
missings
A string with a vector of values or R expressions. The required format differs for integer, numeric, factor, and character variables.
  1. For integer variables, use a character string representing part of an R expression such as "> 8", ">= 8", "< 7", or "<= 7", or a character string enclosing a two-valued vector that defines a range, as in "c(8,9)". Any strings that do not begin with ">", "<", or "c" will be ignored. To reset particular values as missing one-by-one, use the variable key.

  2. For numerics, use an inequality such as "> 99". The only other alternative we have allowed is a character string that represents a range such as "c(99, 101)", to mean that values greater than OR equal to 99 and less than OR equal to 101 will be set as missing.

  3. For factors, include a vector of levels to be marked as missing and removed from the list of levels.

  4. For character variables, use a character vector of values to be marked as missing.
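
The integer and numeric rules above can be illustrated with a minimal base-R sketch. The helper apply_missing_rule is hypothetical and is not the kutils implementation; it only mimics the documented behavior, assuming that a "c(lo, hi)" string means an inclusive range and that strings not starting with ">", "<", or "c" are ignored.

```r
## Hypothetical helper, NOT part of kutils: mimics the documented rules
## for integer and numeric variables using base R only.
apply_missing_rule <- function(x, rule) {
    rule <- trimws(rule)
    if (startsWith(rule, "c")) {
        ## Range string such as "c(4, 12)": both endpoints are inclusive
        rng <- eval(parse(text = rule))
        x[x >= min(rng) & x <= max(rng)] <- NA
    } else if (grepl("^[<>]", rule)) {
        ## Inequality string such as "> 8": completed into "x > 8"
        hit <- eval(parse(text = paste("x", rule)))
        x[hit] <- NA
    }
    ## Any other string is ignored, so x is returned unchanged
    x
}

x <- c(2L, 4L, 8L, 12L, 22L)
apply_missing_rule(x, "c(4, 12)")  # values 4 through 12 become NA
apply_missing_rule(x, "> 8")       # values above 8 become NA
apply_missing_rule(x, "== 8")      # ignored, x is unchanged
```

Because the rule is a string that is parsed and evaluated, a sketch like this should only be given trusted input.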

One concern is that comparisons of real-valued numerics are not dependable. Exact comparisons with == are unreliable, so do not ask for them.
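
A short base-R illustration of why exact comparison of real values is avoided. This uses only base R and makes no claim about kutils internals; the narrow range at the end is one way to capture a value that == would miss.

```r
## Floating-point arithmetic: 0.1 + 0.2 is not stored as exactly 0.3
a <- 0.1 + 0.2
a == 0.3                     # FALSE
isTRUE(all.equal(a, 0.3))    # TRUE, comparison within a tolerance

## A narrow inclusive range catches the value that == misses
x <- c(0.05, a, 0.5)
x[x >= 0.29 & x <= 0.31] <- NA
x
```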

Value

A cleaned column in which R's NA symbol replaces values that should be missing

Examples

## 1.  Integers.
## Be very sure these are truly integers, or else this fails
x <- seq.int(2L, 22L, by = 2L)
## Specify range, 4 to 12 inclusive
missings <- "c(4, 12)"
assignMissing(x, missings)

missings <- " < 7"
assignMissing(x, missings)

missings <- " > 11"
assignMissing(x, missings)

## 2. strings
x <- c("low", "low", "med", "high")
missings <- "c(\"low\", \"high\")"
assignMissing(x, missings)
missings <- c("med", "doesnot exist")
assignMissing(x, missings)

## 3. factors (same as strings inside assignMissing)
x <- factor(c("low", "low", "med", "high"), levels = c("low", "med", "high"))
missings <- c("low", "high")
assignMissing(x, missings)
missings <- c("med", "doesnot exist")
assignMissing(x, missings)
## ordered factor:
x <- ordered(c("low", "low", "med", "high"), levels = c("low", "med", "high"))
missings <- c("low", "high")
assignMissing(x, missings)

## 4. Real-valued variable
set.seed(234234)
x <- rnorm(10)
missings <- "< 0"
assignMissing(x, missings)
missings <- "> -0.2"
assignMissing(x, missings)
missings <- "c(0.1, 0.7)"
assignMissing(x, missings)