Learn R Programming

scidb (version 1.1-2)

aggregate-methods: Methods for Function aggregate in Package scidb

Description

Aggregate a SciDB array object grouped by a subset of its dimensions and/or attributes.

Usage

## S3 method for class 'scidb':
aggregate(x, by, FUN, eval, window, variable_window, unpack)
## S3 method for class 'scidbdf':
aggregate(x, by, FUN, eval, window, variable_window, unpack)

Arguments

x
A scidb or scidbdf object.
by
(Optional) Either a single character string or a list of array dimension and/or attribute names to group by; or a SciDB array reference object to group by. Not required for windowed and grand aggregates--see details.
FUN
A character string representing a SciDB aggregation expression or a reduction function.
eval
(Optional) If true, execute the query and store the reult array. Otherwise defer evaluation.
window
(Optional) If specified, perform a moving window aggregate along the specified coordinate windows--see details below.
variable_window
(Optional) If specified, perform a moving window aggregate over successive data values along the coordinate dimension axis specified by by--see details below.
unpack
(Optional) If TRUE, return an unpacked SciDB result as a scidbdf dataframe-like object. It's sometimes useful to set this to FALSE if the aggregated result needs to be joined with another array. Default=F

Value

  • A scidbdf reference object.

Details

Group the scidb, or scidbdf array object x by dimensions and/or attributes in the array. applying the valid SciDB aggregation function FUN expressed as a character string to the groups. Set eval to TRUE to execute the aggregation and return a scidb object; set eval to FALSE to return an unevaluated SciDB array promise, which is essentially a character string describing the query that can be composed with other SciDB package functions.

If an R reduction function is speciied for FUN, it will be transliterated to a SciDB aggregate.

The by argument must be a list of dimension names and/or attribute names in the array x to group by, or a SciDB array reference object. If by is not specified and one of the window options is not specified, then a grand aggregate is performed over all values in the array.

The argument by may be a list of dimension names and/or attributes of the array x. Attributes that are not of type int64 will be `factorized` first and replaced by enumerated int64 values that indicate each unique level (this requires SciDB 13.6 or higher).

When by is a SciDB array it must contain one or more common dimensions with x. The two arrays will be joined (using SciDB cross_join(x,by) and the resulting array will be grouped by the attributes in the by array. This is similar to the usual R data.frame aggregate method.

Perform moving window aggregates by specifying the optional window or variable_window arguments. Use window to compute the aggregate expression along a moving window specified along each coordinate axis as window=c(dimension_1_low, dim_1_high, dim_2_low,_dim_2_high, .... Moving window aggregates along coordinates may be applied in multiple dimensions.

Use variable_window to perform moving window aggregates over data values in a single dimension specified by the by argument. See below for examples. Moving window aggregates along data values are restricted to a single array dimension.

Examples

Run this code
# Create a copy of the iris data frame in a 1-d SciDB array named "iris."
# Note that SciDB attribute names will be changed to conform to SciDB
# naming convention.
x <- as.scidb(iris,name="iris")

# Compute averages of each variable grouped by Species
a <- aggregate(x, by="Species", FUN=mean)

# Aggregation by an auxillary vector (which in this example comes from
# an R data frame)--also note any valid SciDB aggregation expression may
# be used:
y <- as.scidb(data.frame(sample(1:4,150,replace=TRUE)))
a <- aggregate(x, by=y, FUN="avg(Petal_Width) as apw, min(Sepal_Length) as msl")

# Use the window argument to perform moving window aggregates along coordinate
# systems. You need to supply a window across all the array dimesions.
set.seed(1)
A <- as.scidb(matrix(rnorm(20),nrow=5))
# Compute a moving window aggregate only along the rows summing two rows at
# a time (returning result to R). The notation (0,1,0,0) means apply the
# aggregate over the current row (0) and (1) following row, and just over
# the current column (that is, a window size of one).
aggregate(A,FUN="sum(val)",window=c(0,1,0,0))[]
# The above aggregate is equivalent to, for example:
apply(a,2,function(x) x+c(x[-1],0))

# Moving windows using the window= argument run along array coordinates.
# Moving windows using the variable_window= argument run along data values,
# skipping over empty array cells. The next example illustrates the
# difference.

# First, create an array with empty values:
B <- A>0
# Here is what B looks like:
B[]
# Now, run a moving window aggregate along the rows with window just like
# the above example:
aggregate(B,FUN="sum(val)",window=c(0,1,0,0))[]
# And now, a moving window along only the data values down the rows, note
# that we need to specify the dimension with by=:
aggregate(B,by="i",FUN="sum(val)",variable_window=c(0,1))[]

Run the code above in your browser using DataLab