scidb (version 1.2-0)

merge-methods: Methods for Function merge in Package scidb

Description

SciDB merge, cross_join, and join operations.

Usage

"merge"(x, y, by=intersect(dimensions(x),dimensions(y)), by.x, by.y, merge, all)

Arguments

x
A scidb or scidbdf object.
y
A scidb or scidbdf object.
by
(Optional) Vector of common dimension or attribute names or dimension indices to join on. See details below.
by.x
(Optional) Vector of dimension or attribute names or dimension indices of array x to join on. See details.
by.y
(Optional) Vector of dimension or attribute names or dimension indices of array y to join on. See details.
merge
(Optional) If TRUE, perform a SciDB merge operation instead of join.
all
(Optional) If TRUE, perform outer join. Defaults to inner join.

Value

scidb or scidbdf reference object.

Details

Specify either by, or both by.x and by.y, but not both forms at once. If neither by.x nor by.y is specified and by=NULL, the result is the Cartesian cross product of x and y. The default value of by performs a cross_join or join along the dimensions common to both arrays. The by arguments may be given as names or as 1-based integer dimension indices.

If only by is specified, the dimension or attribute names in by are assumed to be common to x and y. Otherwise, the dimension or attribute names listed in by.x are matched against those listed in by.y.

If dimension names are specified and by contains all the dimensions in each array, then the SciDB join operator is used; otherwise SciDB's cross_join operator is used. In either case, the output is a cross product set of the two arrays along the specified dimensions.
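The operator choice described above can be sketched in plain R. This is only an illustration of the documented rules for dimension names: `choose_operator` is a hypothetical helper, not a function in the scidb package, and treating an empty `by` the same as `by=NULL` is an assumption.

```r
# Hypothetical helper, not part of the scidb package. It mirrors the
# documented rules for joining on dimension names only.
choose_operator <- function(dims_x, dims_y, by = intersect(dims_x, dims_y)) {
  # Assumption: an empty `by` is treated like by=NULL (Cartesian product).
  if (length(by) == 0) return("cross product")
  # join is used only when `by` covers every dimension of both arrays.
  covers_x <- all(dims_x %in% by)
  covers_y <- all(dims_y %in% by)
  if (covers_x && covers_y) "join" else "cross_join"
}

op_default <- choose_operator(c("i", "j"), c("i", "j"))
op_subset  <- choose_operator(c("i", "j"), c("i", "j"), by = "i")
op_null    <- choose_operator(c("i", "j"), c("i", "j"), by = NULL)
```

With two arrays that share dimensions i and j, the default `by` selects join, `by="i"` selects cross_join, and `by=NULL` yields the Cartesian cross product.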

If by, or each of by.x and by.y, lists a single attribute name, the indicated attributes are lexicographically ordered as categorical variables, SciDB redimensions each array along a new coordinate system defined by those attributes, and the redimensioned arrays are then joined. This method limits joins along attributes to a single attribute from each array. The output array contains additional columns showing the attribute factor levels used to join the arrays.
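The idea behind an attribute join can be mimicked with base R data frames standing in for SciDB arrays. This is a conceptual sketch only, not what the package executes inside SciDB; the data, the column names and the `idx` coordinate are invented for illustration.

```r
# Base R data frames stand in for SciDB arrays; all names are illustrative.
a <- data.frame(name  = c("pear", "apple", "fig"),  b = c(0.5, 1.5, 2.5),
                stringsAsFactors = FALSE)
b <- data.frame(label = c("fig", "kiwi", "apple"), v = c(10, 20, 30),
                stringsAsFactors = FALSE)

# Enumerate the distinct key values from both inputs in sorted order.
lv <- sort(unique(c(a$name, b$label)))

# Map each key value to its position in the level set. This integer plays
# the role of the new coordinate each array is redimensioned along.
a$idx <- match(a$name, lv)
b$idx <- match(b$label, lv)

# Join along the new coordinate; only coordinates present in both survive.
joined <- merge(a, b, by = "idx")
```

The `name` and `label` columns remain in `joined`, loosely analogous to the extra factor-level columns mentioned above.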

Specify merge=TRUE to perform a SciDB merge operation instead of a SciDB join.

If all=FALSE (the default), a SQL-like natural join (an inner join) is performed. If all=TRUE, a SQL-like outer join is performed, but this case has some limitations; in particular, the outer join is not yet available for the merge=TRUE case, for joining on SciDB attributes, or for joining on subsets of dimensions.
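For readers less familiar with the inner/outer distinction, the base R data frame method of merge shows the same semantics on small local data. This is an analogy using merge on data frames, not the scidb method, and the data are invented for illustration.

```r
# Two small data frames sharing the key column i; values are illustrative.
x <- data.frame(i = 0:2, p = c(1.1, 2.2, 3.3))
y <- data.frame(i = 1:3, q = c("a", "b", "c"), stringsAsFactors = FALSE)

# Inner join: keeps only keys present in both inputs (i = 1 and 2).
inner <- merge(x, y, by = "i")

# Outer join: keeps every key from either input and fills gaps with NA.
outer <- merge(x, y, by = "i", all = TRUE)
```

Here `inner` has two rows and `outer` has four, with NA in `q` for i = 0 and NA in `p` for i = 3.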

The various SciDB join operators generally require that the arrays have identical partitioning (coordinate system bounds, chunk size, etc.) in the common dimensions. The merge method attempts to rectify SciDB arrays along the specified dimensions as required before joining. Those dimensions must at least have common lower index bounds.

The merge function may rename SciDB attributes and dimensions as required to avoid name conflicts in SciDB. See the last example for an illustration.

Examples

## Not run: 
# Create a copy of the iris data frame in a 1-d SciDB array named "iris".
# Note that SciDB attribute names will be changed to conform to SciDB
# naming conventions.
x <- as.scidb(iris, name="iris")

a <- x$Species
b <- x$Petal_Length

c <- merge(a, b, by="row")
merge(b, b, by="row", merge=TRUE)


# Here is an example that joins on SciDB array attributes instead of
# dimensions. It works by enumerating the attribute values and
# redimensioning along those.
set.seed(1)
a <- as.scidb(data.frame(a=sample(10,5), b=rnorm(5)))
b <- as.scidb(data.frame(u=sample(10,5), v=rnorm(5)))
merge(x=a, y=b, by.x="a", by.y="u")[]


# The following example joins on a subset of coordinate axes:
x <- build(5.5, c(3,3))
print(schema(x))
y <- build(1.1, c(3,3), chunksize=c(2,1))
print(schema(y))
z <- merge(x, y, by="i")
print(schema(z))

## End(Not run)
