
ffbase (version 0.6-2)

merge.ffdf: Merge two ffdf by common columns, or do other versions of database join operations.

Description

Merge two ffdf objects by common columns, or perform other database-style join operations. This method is similar to merge in the base package, but it only supports inner and left outer joins. Note that joining is based on ffmatch or ffdfmatch, so for each row of x only the first matching row in y is added. ffdfmatch works by paste-ing the key columns together into a single key, so this method may not be suitable if your key contains columns of vmode double.
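The first-match behaviour described above can be illustrated with a minimal base-R sketch. It uses plain data frames and match() rather than ffdf objects and ffmatch, and the names x, y, idx, first_match_join and full_left_join are illustrative only. It is meant to show the semantics, not how merge.ffdf is implemented.

```r
# Base-R sketch of first-match join semantics (plain data frames, not ffdf).
x <- data.frame(key = c("a", "b", "c"), stringsAsFactors = FALSE)
y <- data.frame(key = c("a", "a", "b"), val = c(1, 2, 3),
                stringsAsFactors = FALSE)

# match() returns the position of the FIRST match in y for each row of x
# (NA where there is no match).
idx <- match(x$key, y$key)
first_match_join <- data.frame(x, val = y$val[idx])
first_match_join          # 3 rows; the second "a" row of y is never used

# base merge() instead returns one row per matching pair.
full_left_join <- merge(x, y, by = "key", all.x = TRUE)
nrow(full_left_join)      # 4 rows

# Why double keys are fragile: two doubles that differ numerically can be
# converted to the same character string when pasted into a key.
a <- 0.1 + 0.2
b <- 0.3
a == b                    # numeric comparison
paste(a) == paste(b)      # comparison of the pasted (character) keys
```

If y may contain duplicate keys and every matching pair is needed, the result of this kind of join will differ from what base merge returns on the same data.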

Usage

## S3 method for class 'ffdf':
merge(x, y,
    by = intersect(names(x), names(y)), by.x = by,
    by.y = by, all = FALSE, all.x = all, all.y = all,
    sort = FALSE, suffixes = c(".x", ".y"),
    incomparables = NULL, trace = FALSE, ...)

Arguments

x
an ffdf
y
an ffdf
by
specifications of the common columns. Columns can be specified by name, number or by a logical vector.
by.x
specifications of the common columns of the x ffdf, overruling the by parameter
by.y
specifications of the common columns of the y ffdf, overruling the by parameter
all
see merge in R base
all.x
if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have NAs in those columns that are usually filled with values from y. The default is FALSE, so that only rows with data from both x and y are included in the output.
all.y
analogous to all.x, but for rows in y
sort
logical, currently not used; defaults to FALSE.
suffixes
character(2) specifying the suffixes to be used for making non-by names() unique.
incomparables
values which cannot be matched. See match. Currently not used.
trace
logical indicating whether to show which chunk the function is currently processing
...
other options passed on to ffdfindexget
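
The different ways of specifying key columns can be sketched with base merge on small in-RAM data frames. This assumes that the ffdf method accepts the same forms, as the by, by.x and by.y descriptions state. The data frames x, y and y2 are made up for illustration.

```r
# Key-column specifications, shown with base merge() on plain data frames.
x <- data.frame(id = c(1, 2, 3), a = c("p", "q", "r"))
y <- data.frame(id = c(2, 3, 4), b = c("u", "v", "w"))

# Default `by`: the column names shared by both inputs.
intersect(names(x), names(y))

by_name    <- merge(x, y, by = "id")            # by name
by_number  <- merge(x, y, by = 1)               # by column position
by_logical <- merge(x, y, by = c(TRUE, FALSE))  # by logical vector

# When the key has a different name in each input, use by.x and by.y.
y2 <- data.frame(key = c(2, 3, 4), b = c("u", "v", "w"))
by_xy <- merge(x, y2, by.x = "id", by.y = "key")

identical(by_name, by_number)
identical(by_name, by_logical)
```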

Value

  • an ffdf

Details

If a left outer join is performed and a record in x has no matching record in y, columns with vmodes 'boolean', 'quad', 'nibble', 'ubyte', 'ushort' are coerced to vmodes 'logical', 'byte', 'byte', 'short', 'integer', respectively, to allow NA values.
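
The coercion rule above can be written as a small lookup table. The names na_capable_vmode and promote_vmode are hypothetical and not part of ffbase. Leaving all other vmodes unchanged is an assumption based on the list above.

```r
# Coercion table taken from the Details text: vmodes that cannot hold NA
# mapped to the vmode they are coerced to.
na_capable_vmode <- c(boolean = "logical", quad = "byte", nibble = "byte",
                      ubyte = "short", ushort = "integer")

# Hypothetical helper: return the vmode a column would have after coercion.
# Vmodes not in the table are returned unchanged (an assumption).
promote_vmode <- function(vm) {
  out <- unname(na_capable_vmode[vm])
  ifelse(is.na(out), vm, out)
}

promote_vmode(c("boolean", "ushort", "double"))
```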

See Also

merge

Examples

library(ffbase)  # provides merge.ffdf; builds on the ff package
authors <- data.frame(
    surname = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))
books <- data.frame(
    name = c("Tukey", "Venables", "Tierney",
             "Ripley", "Ripley", "McNeil", "R Core"),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))
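## Replicate the books table 2000 times, each copy with random prices,
## to get a larger data set
books <- lapply(1:2000, FUN = function(x, books) {
    books$price <- rnorm(nrow(books))
    books
}, books = books)
books <- do.call(rbind, books)
## Convert both data frames to ffdf
authors <- as.ffdf(authors)
books <- as.ffdf(books)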

dim(books)
dim(authors)
## Inner join
## Use a very small batch size so the merge is processed in several chunks
oldffbatchbytes <- getOption("ffbatchbytes")
options(ffbatchbytes = 100)
m1 <- merge(books, authors, by.x = "name", by.y = "surname", all.x=FALSE, all.y=FALSE, trace = TRUE)
dim(m1)
unique(paste(m1$name[], m1$nationality[]))
unique(paste(m1$name[], m1$deceased[]))
## Same join on in-RAM data frames (base merge), for comparison
m2 <- merge(books[,], authors[,], by.x = "name", by.y = "surname", all.x=FALSE, all.y=FALSE, sort = FALSE)
dim(m2)
unique(paste(m2$name[], m2$nationality[]))
unique(paste(m2$name[], m2$deceased[]))
## Left outer join
m1 <- merge(books, authors, by.x = "name", by.y = "surname", all.x=TRUE, all.y=FALSE, trace = TRUE)
class(m1)
dim(m1)
names(books)
names(m1)
unique(paste(m1$name[], m1$nationality[]))
unique(paste(m1$name[], m1$deceased[]))
## Show coercion to allow NA's
authors$test <- ff(TRUE, length=nrow(authors), vmode = "boolean")
vmode(authors$test)
m1 <- merge(books, authors, by.x = "name", by.y = "surname", all.x=TRUE, all.y=FALSE, trace = TRUE)
vmode(m1$test)
table(m1$test[], exclude=c())
## Restore the original batch size
options(ffbatchbytes = oldffbatchbytes)
