vlookup: Look up values in dictionary.

Description

vlookup/vlookup_df function is inspired by VLOOKUP spreadsheet function. It looks for a lookup_value in the lookup_column of the dict, and then returns values in the same rows from result_column. add_columns inspired by MATCH FILES (Add variables...) from SPSS Statistics. It works similar to SQL left join but number of cases in the left part always remain the same. If there are duplicated keys in the dict then error will be raised by default. .add_columns is the same function for default dataset.

Usage

vlookup(lookup_value, dict, result_column = 2, lookup_column = 1)
vlookup_df(lookup_value, dict, result_column = NULL, lookup_column = 1)
add_columns(data, dict, by = NULL, ignore_duplicates = FALSE)
.add_columns(dict, by = NULL, ignore_duplicates = FALSE)

Arguments

lookup_value

Vector of looked up values

dict

data.frame/matrix. Dictionary. Can be vector for vlookup/vlookup_df.

result_column

numeric or character. Resulting columns of dict. There are special values: 'row.names', 'rownames', 'names'. If result_column equals to one of these special values and dict is matrix/data.frame then row names of dict will be returned. If dict is vector then names of vector will be returned. For vlookup_df default result_column is NULL and result will be entire rows. For vlookup defaut result_column is 2 - for frequent case of dictionary with keys in the first column and results in the second column.

lookup_column

Column of dict in which lookup value will be searched. By default it is the first column of the dict. There are special values: 'row.names', 'rownames', 'names'. If lookup_column equals to one of these special values and dict is matrix/data.frame then values will be searched in the row names of dict. If dict is vector then values will be searched in names of the dict.

data

data.frame to be joined with dict.

character vector or NULL(default) or 1. Names of common variables in the data and dict by which we will attach dict to data. If it is NULL then common names will be used. If it is equals to 1 then we will use the first column from both dataframes. To add columns by different variables on data and dict use a named vector. For example, by = c("a" = "b") will match data.a to dict.b.

ignore_duplicates

logical Should we ignore duplicates in the by variables in the dict? If it is TRUE than first occurence of duplicated key will be used.

Value

vlookup always return vector, vlookup_df always returns data.frame. row.names in result of vlookup_df are not preserved.

Examples

Run this code

# NOT RUN {
# with data.frame
dict = data.frame(num=1:26, small=letters, cap=LETTERS, stringsAsFactors = FALSE)
rownames(dict) = paste0('rows', 1:26)
identical(vlookup_df(1:3, dict), dict[1:3,]) # should be TRUE
vlookup(c(45,1:3,58), dict, result_column='cap')
vlookup_df(c('z','d','f'), dict, lookup_column = 'small')
vlookup_df(c('rows7', 'rows2', 'rows5'), dict, lookup_column = 'row.names')

# with vector
dict=1:26
names(dict) = letters

vlookup(c(2,4,6), dict, result_column='row.names')

# The same results
vlookup(c(2,4,6), dict, result_column='rownames')
vlookup(c(2,4,6), dict, result_column='names')

# example for 'add_columns' from base 'merge'
authors = sheet(
    surname = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4))
)

books = sheet(
    surname = c("Tukey", "Venables", "Tierney",
                "Ripley", "Ripley", "McNeil", "R Core"),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R")
)

add_columns(books, authors)

# Just for fun. Examples borrowed from Microsoft Excel.
# It is not the R way of doing things.

# Example 2

ex2 = utils::read.table(header = TRUE, text = "
    Item_ID Item Cost Markup 
    ST-340 Stroller 145.67  0.30  
    BI-567 Bib 3.56  0.40  
    DI-328 Diapers  21.45  0.35  
    WI-989 Wipes  5.12  0.40  
    AS-469 Aspirator 2.56  0.45 
", stringsAsFactors = FALSE)

# Calculates the retail price of diapers by adding the markup percentage to the cost. 
vlookup("DI-328", ex2, 3) * (1 + vlookup("DI-328", ex2, 4)) # 28.9575

# Calculates the sale price of wipes by subtracting a specified discount from
# the retail price.
(vlookup("WI-989", ex2, "Cost") * (1 + vlookup("WI-989", ex2, "Markup"))) * (1 - 0.2)  # 5.7344

A2 = ex2[1, "Item_ID"]
A3 = ex2[2, "Item_ID"]

# If the cost of an item is greater than or equal to $20.00, displays the string
# "Markup is nn%"; otherwise, displays the string "Cost is under $20.00".
ifelse(vlookup(A2, ex2, "Cost") >= 20, 
       paste0("Markup is " , 100 * vlookup(A2, ex2, "Markup"),"%"), 
       "Cost is under $20.00") # Markup is 30%


# If the cost of an item is greater than or equal to $20.00, displays the string
# Markup is nn%"; otherwise, displays the string "Cost is $n.nn".
ifelse(vlookup(A3, ex2, "Cost") >= 20, 
       paste0("Markup is: " , 100 * vlookup(A3, ex2, "Markup") , "%"), 
       paste0("Cost is $", vlookup(A3, ex2, "Cost"))) #Cost is $3.56


# Example 3

ex3 = utils::read.table(header = TRUE, text = "
    ID  Last_name  First_name  Title Birth_date  
    1 Davis Sara 'Sales Rep.'  12/8/1968 
    2 Fontana Olivier 'V.P. of Sales' 2/19/1952 
    3 Leal Karina 'Sales Rep.' 8/30/1963 
    4 Patten Michael 'Sales Rep.' 9/19/1958 
    5 Burke Brian 'Sales Mgr.' 3/4/1955 
    6 Sousa Luis 'Sales Rep.'  7/2/1963  
", stringsAsFactors = FALSE)

# If there is an employee with an ID of 5, displays the employee's last name;
# otherwise, displays the message "Employee not found".
if_na(vlookup(5, ex3, "Last_name"), "Employee not found") # Burke

# Many employees
if_na(vlookup(1:10, ex3, "Last_name"), "Employee not found") 

# For the employee with an ID of 4, concatenates the values of three cells into
# a complete sentence.
paste0(vlookup(4, ex3, "First_name"), " ",
       vlookup(4, ex3, "Last_name"), " is a ", 
       vlookup(4, ex3, "Title")) # Michael Patten is a Sales Rep.
# }

Run the code above in your browser using DataLab