qdapTools (version 1.3.7)

lookup: Hash Table/Dictionary Lookup

Description

lookup - data.table based hash table useful for large vector lookups.

%l% - A binary operator version of lookup for when key.match is a data.frame or named list.

%l+% - A binary operator version of lookup for when key.match is a data.frame or named list and missing is assumed to be NULL.

%lc% - A binary operator version of lookup for when key.match is a data.frame or named list and all arguments are converted to character.

%lc+% - A binary operator version of lookup for when key.match is a data.frame or named list, missing is assumed to be NULL, and all arguments are converted to character.
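
How a binary operator can wrap a lookup with a preset missing argument is sketched below in base R. The helper sk_lookup and the operators %sk% and %sk+% are hypothetical names and are not part of qdapTools. They only mirror the documented difference between %l% (missing = NA) and %l+% (missing = NULL) for a two column data.frame key, and they use match() rather than the package's data.table based hash.

```r
# Hypothetical helper (not part of qdapTools): two column data.frame lookup
# built on base R's match().
sk_lookup <- function(terms, key, missing = NA) {
    idx <- match(terms, key[[1]])   # position of each term in the key column
    out <- key[[2]][idx]            # reassigned values, NA where unmatched
    unmatched <- is.na(idx)
    if (is.null(missing)) {
        out[unmatched] <- terms[unmatched]   # keep the original term
    } else {
        out[unmatched] <- missing            # fill with the supplied value
    }
    out
}

# Hypothetical operators mirroring %l% and %l+%
`%sk%`  <- function(terms, key) sk_lookup(terms, key, missing = NA)
`%sk+%` <- function(terms, key) sk_lookup(terms, key, missing = NULL)

key <- data.frame(x = 1:4, y = 11:14)
1:5 %sk% key     # the unmatched 5 becomes NA
1:5 %sk+% key    # the unmatched 5 is kept
```

Fixing missing inside the operator is what lets the call stay a two-operand expression.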

Usage

lookup(terms, key.match, key.reassign = NULL, missing = NA)

# S3 method for list
lookup(terms, key.match, key.reassign = NULL, missing = NA)

# S3 method for data.frame
lookup(terms, key.match, key.reassign = NULL, missing = NA)

# S3 method for matrix
lookup(terms, key.match, key.reassign = NULL, missing = NA)

# S3 method for numeric
lookup(terms, key.match, key.reassign, missing = NA)

# S3 method for factor
lookup(terms, key.match, key.reassign, missing = NA)

# S3 method for character
lookup(terms, key.match, key.reassign, missing = NA)

terms %l% key.match

terms %l+% key.match

terms %lc% key.match

terms %lc+% key.match

Value

Outputs a new vector with reassigned values.

Arguments

terms

A vector of terms to undergo a lookup.

key.match

Takes one of the following: (1) a two column data.frame of a match key and reassignment column, (2) a named list of vectors, or (3) a single vector match key. If a data.frame or named list is supplied, no key.reassign is needed.

key.reassign

A single reassignment vector supplied if key.match is not a two column data.frame/named list.

missing

Value to assign to terms not matching the key.match. If set to NULL the original values in terms corresponding to the missing elements are retained.
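
The three accepted forms of key.match and the effect of missing can be illustrated with a minimal base R sketch. The helper name sketch_lookup is hypothetical, and match() stands in for the data.table based hash that the package uses, so this shows the documented semantics rather than the actual implementation.

```r
# Hypothetical helper, for illustration only (not the qdapTools implementation).
sketch_lookup <- function(terms, key.match, key.reassign = NULL, missing = NA) {
    if (is.data.frame(key.match)) {
        # (1) two column data.frame: column 1 is the match key,
        #     column 2 the reassignment
        key.reassign <- key.match[[2]]
        key.match <- key.match[[1]]
    } else if (is.list(key.match)) {
        # (2) named list of vectors: every value maps to the name of its element
        key.reassign <- rep(names(key.match), lengths(key.match))
        key.match <- unlist(key.match, use.names = FALSE)
    }
    # (3) otherwise key.match is a plain vector and key.reassign is required

    idx <- match(terms, key.match)   # NA where a term is not in the key
    out <- key.reassign[idx]
    unmatched <- is.na(idx)
    if (is.null(missing)) {
        out[unmatched] <- terms[unmatched]   # retain the original term
    } else {
        out[unmatched] <- missing            # assign the supplied value
    }
    out
}

sketch_lookup(1:5, data.frame(1:4, 11:14))                  # 5 is unmatched: NA
sketch_lookup(1:5, data.frame(1:4, 11:14), missing = NULL)  # 5 is retained
sketch_lookup(1:3, list(A = c(1, 3), B = 2))                # list names are the new values
sketch_lookup(c("a", "z"), c("a", "b"), c(1, 2), missing = 0)
```

The data.frame test comes before the list test because a data.frame is itself a list in R.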

See Also

setDT, hash

Examples

## Supply a dataframe to key.match

lookup(1:5, data.frame(1:4, 11:14))

## Retain original values for missing 
lookup(1:5, data.frame(1:4, 11:14), missing=NULL) 

lookup(LETTERS[1:5], data.frame(LETTERS[1:5], 100:104))
lookup(LETTERS[1:5], factor(LETTERS[1:5]), 100:104)

## Supply a named list of vectors to key.match

codes <- list(
    A = c(1, 2, 4), 
    B = c(3, 5),
    C = 7,
    D = c(6, 8:10)
)

lookup(1:10, codes)

## Supply a single vector to key.match and key.reassign

lookup(mtcars$carb, sort(unique(mtcars$carb)),        
    c("one", "two", "three", "four", "six", "eight")) 
    
lookup(mtcars$carb, sort(unique(mtcars$carb)),        
    seq(10, 60, by=10))
  
## %l%, a binary operator version of lookup
1:5 %l% data.frame(1:4, 11:14)
1:10 %l% codes

1:12 %l% codes
1:12 %l+% codes
  
(key <- data.frame(a=1:3, b=factor(paste0("l", 1:3))))
1:3 %l% key

## Larger Examples
key <- data.frame(x=1:2, y=c("A", "B"))
big.vec <- sample(1:2, 3000000, TRUE)
out <- lookup(big.vec, key)
out[1:20]

## A big vector to recode; more variation
## means a bigger dictionary
recode_me <- sample(1:(length(LETTERS)*10), 10000000, TRUE)

## Time it
tic <- Sys.time()  

output <- recode_me %l% split(1:(length(LETTERS)*10), LETTERS)
difftime(Sys.time(), tic)

## view it
sample(output, 100)
