
Last chance! 50% off unlimited learning
Sale ends in
vlookup
/vlookup_df
function is inspired by VLOOKUP spreadsheet
function. It looks for a lookup_value
in the lookup_column
of
the dict
, and then returns values in the same rows from
result_column
. add_columns
inspired by MATCH FILES (Add
variables...) from SPSS Statistics. It works similar to SQL left join but
number of cases in the left part always remain the same. If there are
duplicated keys in the dict
then error will be raised by default.
.add_columns
is the same function for default dataset.
vlookup(lookup_value, dict, result_column = 2, lookup_column = 1)vlookup_df(lookup_value, dict, result_column = NULL, lookup_column = 1)
add_columns(data, dict, by = NULL, ignore_duplicates = FALSE)
.add_columns(dict, by = NULL, ignore_duplicates = FALSE)
Vector of looked up values
data.frame/matrix. Dictionary. Can be vector for
vlookup
/vlookup_df
.
numeric or character. Resulting columns of dict
.
There are special values: 'row.names', 'rownames', 'names'. If
result_column
equals to one of these special values and dict
is matrix/data.frame then row names of dict
will be returned. If
dict
is vector then names of vector will be returned. For
vlookup_df
default result_column
is NULL and result will be
entire rows. For vlookup
defaut result_column
is 2 - for
frequent case of dictionary with keys in the first column and results in
the second column.
Column of dict
in which lookup value will be
searched. By default it is the first column of the dict
. There are
special values: 'row.names', 'rownames', 'names'. If lookup_column equals
to one of these special values and dict
is matrix/data.frame then
values will be searched in the row names of dict
. If dict
is
vector then values will be searched in names of the dict
.
data.frame to be joined with dict
.
character vector or NULL(default) or 1. Names of common variables
in the data
and dict
by which we will attach dict
to
data
. If it is NULL then common names will be used. If it is equals
to 1 then we will use the first column from both dataframes. To add columns
by different variables on data
and dict
use a named vector.
For example, by = c("a" = "b")
will match data.a to dict.b.
logical Should we ignore duplicates in the by
variables in the dict
? If it is TRUE than first occurence of duplicated
key will be used.
vlookup
always return vector, vlookup_df
always returns
data.frame. row.names
in result of vlookup_df
are not
preserved.
# NOT RUN {
# with data.frame
dict = data.frame(num=1:26, small=letters, cap=LETTERS, stringsAsFactors = FALSE)
rownames(dict) = paste0('rows', 1:26)
identical(vlookup_df(1:3, dict), dict[1:3,]) # should be TRUE
vlookup(c(45,1:3,58), dict, result_column='cap')
vlookup_df(c('z','d','f'), dict, lookup_column = 'small')
vlookup_df(c('rows7', 'rows2', 'rows5'), dict, lookup_column = 'row.names')
# with vector
dict=1:26
names(dict) = letters
vlookup(c(2,4,6), dict, result_column='row.names')
# The same results
vlookup(c(2,4,6), dict, result_column='rownames')
vlookup(c(2,4,6), dict, result_column='names')
# example for 'add_columns' from base 'merge'
authors = sheet(
surname = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"),
nationality = c("US", "Australia", "US", "UK", "Australia"),
deceased = c("yes", rep("no", 4))
)
books = sheet(
surname = c("Tukey", "Venables", "Tierney",
"Ripley", "Ripley", "McNeil", "R Core"),
title = c("Exploratory Data Analysis",
"Modern Applied Statistics ...",
"LISP-STAT",
"Spatial Statistics", "Stochastic Simulation",
"Interactive Data Analysis",
"An Introduction to R")
)
add_columns(books, authors)
# Just for fun. Examples borrowed from Microsoft Excel.
# It is not the R way of doing things.
# Example 2
ex2 = utils::read.table(header = TRUE, text = "
Item_ID Item Cost Markup
ST-340 Stroller 145.67 0.30
BI-567 Bib 3.56 0.40
DI-328 Diapers 21.45 0.35
WI-989 Wipes 5.12 0.40
AS-469 Aspirator 2.56 0.45
", stringsAsFactors = FALSE)
# Calculates the retail price of diapers by adding the markup percentage to the cost.
vlookup("DI-328", ex2, 3) * (1 + vlookup("DI-328", ex2, 4)) # 28.9575
# Calculates the sale price of wipes by subtracting a specified discount from
# the retail price.
(vlookup("WI-989", ex2, "Cost") * (1 + vlookup("WI-989", ex2, "Markup"))) * (1 - 0.2) # 5.7344
A2 = ex2[1, "Item_ID"]
A3 = ex2[2, "Item_ID"]
# If the cost of an item is greater than or equal to $20.00, displays the string
# "Markup is nn%"; otherwise, displays the string "Cost is under $20.00".
ifelse(vlookup(A2, ex2, "Cost") >= 20,
paste0("Markup is " , 100 * vlookup(A2, ex2, "Markup"),"%"),
"Cost is under $20.00") # Markup is 30%
# If the cost of an item is greater than or equal to $20.00, displays the string
# Markup is nn%"; otherwise, displays the string "Cost is $n.nn".
ifelse(vlookup(A3, ex2, "Cost") >= 20,
paste0("Markup is: " , 100 * vlookup(A3, ex2, "Markup") , "%"),
paste0("Cost is $", vlookup(A3, ex2, "Cost"))) #Cost is $3.56
# Example 3
ex3 = utils::read.table(header = TRUE, text = "
ID Last_name First_name Title Birth_date
1 Davis Sara 'Sales Rep.' 12/8/1968
2 Fontana Olivier 'V.P. of Sales' 2/19/1952
3 Leal Karina 'Sales Rep.' 8/30/1963
4 Patten Michael 'Sales Rep.' 9/19/1958
5 Burke Brian 'Sales Mgr.' 3/4/1955
6 Sousa Luis 'Sales Rep.' 7/2/1963
", stringsAsFactors = FALSE)
# If there is an employee with an ID of 5, displays the employee's last name;
# otherwise, displays the message "Employee not found".
if_na(vlookup(5, ex3, "Last_name"), "Employee not found") # Burke
# Many employees
if_na(vlookup(1:10, ex3, "Last_name"), "Employee not found")
# For the employee with an ID of 4, concatenates the values of three cells into
# a complete sentence.
paste0(vlookup(4, ex3, "First_name"), " ",
vlookup(4, ex3, "Last_name"), " is a ",
vlookup(4, ex3, "Title")) # Michael Patten is a Sales Rep.
# }
Run the code above in your browser using DataLab