Last chance! 50% off unlimited learning
Sale ends in
joinbyv(master, join, by, col.subset, row.subset, nomatch, allow.cartesian)
lapply(join, key)
. Required when not all of join data.tables has key.lapply(join, names)
.i
argument. Default: as.list(rep(TRUE,length(join)))
. To subset result master data.table, use row.subset together with corresponding nomatch argument equal to 0
(inner join). By default when providing row.subset list element the corresponding nomatch argument will be changed to 0
to perform inner join, otherwise it will be getOption("datatable.nomatch")
. If you really want to do outer join to already filtered join data.table you need to override corresponding nomatch argument for NA
. Cross table expressions and not supported inside joinbyv.NA
or 0
elements corresponding join
data.tables. Default: lapply(row.subset, function(x) if(is.language(x)) 0 else getOption("datatable.nomatch"))
. Indicates outer join for NA
and inner join for 0
. When data.table extends allowed argument in the getOption("datatable.nomatch")
then it should accept not only integer scalar but any value supported by data.table as nomatch argument.join
data.tables to define which of the joins are allowed to do cartesian product. Default: as.list(rep(getOption("datatable.allow.cartesian"),length(join)))
.col.subset
, no key.
NULL
inside the lists provided to by, col.subset, row.subset, nomatch, allow.cartesian will be replaced by the default value for particular NULL
element. Therefore it is possible to pass partially filled lists, length of each must match to length of join, example col.subset=list(NULL,c("join2_col1","join2_col2"),NULL,c("join4_col1"))
.
Function do not allow cross tables row.subset filtering expressions. User should apply such filter after joinbyv
, example: joinbyv(master, join)[join1_colA > join2_colA * 2]
.
Arguments nomatch, allow.cartesian by default will be setup according to data.table options. An exception case described in row.subset argument will by default override the nomatch argument to value 0
to perform inner join.
Possibly the performance might be improved after implementing data.table FR #691 and #692.
suppressPackageStartupMessages(library(dwtools))
X <- dw.populate() # scenario="star"
lapply(X, head)
DT <- joinbyv(master = X$SALES,
join = list(time = X$TIME, product = X$PRODUCT, currency = X$CURRENCY),
by = list("time_code","prod_code","curr_code"))
str(DT)
# joinbyv including timing
DT <- timing(
joinbyv(master = X$SALES,
join = list(time = X$TIME, product = X$PRODUCT, currency = X$CURRENCY),
by = list("time_code","prod_code","curr_code"))
)
get.timing()
Run the code above in your browser using DataLab