
scidb (version 1.1-2)

merge-methods: Methods for Function merge in Package scidb

Description

SciDB merge, cross_join, and join operations.

Usage

## S3 method for class 'scidb':
merge(x,y, by=intersect(dimensions(x),dimensions(y)), by.x, by.y, merge, eval)
## S3 method for class 'scidbdf':
merge(x,y, by=intersect(dimensions(x),dimensions(y)), by.x, by.y, merge, eval)

Arguments

x
A scidb or scidbdf object.
y
A scidb or scidbdf object.
by
(Optional) Vector of common dimension or attribute names to join on. See Details.
by.x
(Optional) Vector of dimension or attribute names of array x to join on. See Details.
by.y
(Optional) Vector of dimension or attribute names of array y to join on. See Details.
merge
(Optional) If TRUE, perform a SciDB merge operation instead of a join.
eval
(Optional) If TRUE, execute the query and store the result array; otherwise, defer evaluation.

Value

  • A scidb or scidbdf reference object.

Details

Specify either by, or both by.x and by.y, but not both. If neither by.x nor by.y is specified and by = NULL, the result is the Cartesian product of x and y. The default value of by joins x and y along their common dimensions, using either the SciDB cross_join or join operator.
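The by = NULL case mirrors base R's merge, which also returns the Cartesian product when no join keys are given. The following plain-R sketch uses ordinary data frames standing in for x and y, not SciDB arrays, so it illustrates only the semantics and not the scidb implementation:

```r
# Two small data frames standing in for arrays x and y
x <- data.frame(a = 1:2)
y <- data.frame(b = c("u", "v", "w"), stringsAsFactors = FALSE)

# No join keys: every row of x is paired with every row of y
cp <- merge(x, y, by = NULL)
print(cp)
```

Here cp has 2 * 3 = 6 rows, one for each pairing.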

If only by is specified, the dimension or attribute names in by are assumed to be common to x and y. Otherwise, the names listed in by.x are matched, in order, with the names listed in by.y.
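The by.x/by.y pairing follows the same positional convention as base R's merge, where the i-th name in by.x is matched with the i-th name in by.y. A plain-R illustration on data frames (not SciDB arrays); the column names id and key are made up for the example:

```r
x <- data.frame(id = 1:3, v = c(5, 6, 7))
y <- data.frame(key = c(3, 1, 4), w = c("c", "a", "d"), stringsAsFactors = FALSE)

# Match x$id against y$key; only values present in both survive
m <- merge(x, y, by.x = "id", by.y = "key")
print(m)
```

Only the ids 1 and 3 appear in both inputs, so m has two rows.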

If dimension names are specified and by contains all the dimensions of each array, SciDB's join operator is used; otherwise, SciDB's cross_join operator is used. In either case, the output is the cross product of the two arrays along the specified dimensions.
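The operator choice described above can be sketched as a small decision function. This is a hypothetical helper written for illustration, not a function exported by the scidb package, and it assumes every name in by is a dimension name:

```r
# Hypothetical helper (not part of the scidb API): pick the SciDB operator
# from the dimension names of x, the dimension names of y, and by.
choose_join_operator <- function(dims_x, dims_y, by) {
  # join only when by covers every dimension of both arrays
  if (all(dims_x %in% by) && all(dims_y %in% by)) "join" else "cross_join"
}

choose_join_operator(c("i", "j"), c("i", "j"), by = c("i", "j"))
choose_join_operator(c("i", "j"), "i", by = "i")
```

The first call covers all dimensions of both arrays and yields "join"; the second leaves dimension j of x out of by and yields "cross_join".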

If by, or each of by.x and by.y, lists a single attribute name, the values of the indicated attributes are ordered lexicographically as categorical variables. SciDB then redimensions each array along a new coordinate system defined by those categories and joins the redimensioned arrays. Joins along attributes are therefore limited to a single attribute from each array. The output array contains additional columns showing the attribute factor levels used to join the arrays.
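The attribute join can be approximated in plain R: sort the attribute values into categorical levels, use each level's position as a new integer coordinate, and join on that coordinate. This is a sketch under stated assumptions, not the package's code. It assumes both arrays share one lexicographically sorted level set so their coordinates line up, and it uses data frames with made-up columns i, j, and species in place of SciDB arrays:

```r
# Made-up cells from two arrays, each with a string attribute "species"
x <- data.frame(i = 1:4, species = c("b", "a", "c", "a"), stringsAsFactors = FALSE)
y <- data.frame(j = 1:3, species = c("c", "a", "d"), stringsAsFactors = FALSE)

# Assumed shared, lexicographically ordered level set (the new coordinate system)
lev <- sort(unique(c(x$species, y$species)))

# "Redimension": map each attribute value to its integer level coordinate
x$coord <- match(x$species, lev)
y$coord <- match(y$species, lev)

# Join on the new coordinate, then report the factor level used for each match
joined <- merge(x[, c("i", "coord")], y[, c("j", "coord")], by = "coord")
joined$level <- lev[joined$coord]
print(joined)
```

Only the levels "a" and "c" occur in both inputs, so joined has three rows: two for "a" and one for "c".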

This method is limited to SQL-like natural joins, a special case of inner joins that corresponds to all = FALSE in the standard R merge function. A future version of this package will include additional join cases.
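The difference between the supported all = FALSE case and the outer joins not yet supported here can be seen with base R's merge on ordinary data frames. This illustrates the standard R function only, not scidb:

```r
x <- data.frame(row = 1:3, a = c("p", "q", "r"), stringsAsFactors = FALSE)
y <- data.frame(row = 2:4, b = c(10, 20, 30))

# all = FALSE (the default): keep only rows whose key appears in both inputs
inner <- merge(x, y, by = "row")

# all = TRUE: full outer join, filling unmatched cells with NA
outer <- merge(x, y, by = "row", all = TRUE)

print(inner)
print(outer)
```

inner keeps rows 2 and 3 only; outer keeps rows 1 through 4, with NA in b for row 1 and NA in a for row 4.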

Specify merge=TRUE to perform a SciDB merge operation instead of a SciDB join.

The SciDB join operators generally require the arrays to have identical partitioning (coordinate system bounds, chunk sizes, and so on) in the common dimensions. The merge method attempts to rectify the arrays along the specified dimensions as needed before joining them.
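A compatibility check of this kind can be sketched by comparing per-dimension bounds and chunk sizes. The schema lists below are a hypothetical representation invented for the example, not a scidb data structure, and the function only detects a mismatch; it does not perform the rectification itself:

```r
# Hypothetical schema descriptors (not a scidb structure): for each dimension,
# a named vector of lower bound, upper bound, and chunk size.
schema_x <- list(row = c(lo = 1, hi = 150, chunk = 150))
schema_y <- list(row = c(lo = 1, hi = 150, chunk = 50))

# TRUE for each common dimension whose partitioning matches exactly
same_partitioning <- function(sx, sy, dims) {
  vapply(dims, function(d) identical(sx[[d]], sy[[d]]), logical(1))
}

same_partitioning(schema_x, schema_y, "row")
```

The chunk sizes differ, so the check returns FALSE for row, signalling that one array would need to be repartitioned before a join.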

Examples

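# Assumes the scidb package is installed and a SciDB server (with shim)
# is reachable at the default host.
library(scidb)
scidbconnect()

# Create a copy of the iris data frame in a 1-d SciDB array named "iris".
# Attribute names are changed to conform to SciDB naming conventions
# (for example, Petal.Length becomes Petal_Length).
x <- as.scidb(iris, name = "iris")

# Select single attributes; both arrays keep the dimension "row".
a <- x$Species
b <- x$Petal_Length

# Join a and b along their common "row" dimension.
ab <- merge(a, b, by = "row")

# Perform a SciDB merge of b with itself instead of a join.
merge(b, b, by = "row", merge = TRUE)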
