
tbl_cube(dimensions, measures)

tbl_cubes are dense, which means that almost every combination of the
dimensions must have an associated measurement.

Manipulation verbs, where (M) marks a verb that operates on measures and
(D) one that operates on dimensions:

select (M)
summarise (M), corresponds to roll-up, but is rather more limited since
there are no hierarchies.
filter (D), corresponds to slice/dice.
mutate (M) is not implemented, but should be relatively straightforward
given the implementation of summarise.
arrange (D?) is not implemented: it is not obvious how much sense it would
make.

Joins: not implemented. See vignettes/joins.graffle for ideas.
Probably straightforward if you get the indexes right, and that's probably
some straightforward array/tensor operation.
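
The correspondence between summarise and roll-up, and between filter and slice/dice, can be illustrated with a plain base R array. This is only a sketch under stated assumptions: the array `temp` and its dimension values are made up for illustration, and the code does not reproduce how tbl_cube is implemented internally.

```r
# Hypothetical 3-d cube: one dense numeric array (a single measure), with the
# dimension values stored as dimnames. All names and values are illustrative.
temp <- array(
  seq_len(2 * 3 * 2),
  dim = c(2, 3, 2),
  dimnames = list(
    lat  = c("10", "20"),
    long = c("-100", "-90", "-80"),
    year = c("1999", "2000")
  )
)

# Roll-up (the analogue of group_by(lat, long) followed by summarise()):
# keep lat and long, and collapse the remaining dimension with an aggregate.
temp_by_loc <- apply(temp, c("lat", "long"), mean)

# Slice/dice (the analogue of filter()): build one logical vector per
# dimension and subset each dimension independently.
lat_keep  <- as.numeric(dimnames(temp)$lat) > 10
year_keep <- dimnames(temp)$year == "2000"
temp_sub  <- temp[lat_keep, , year_keep, drop = FALSE]

dim(temp_by_loc)  # lat x long only
dim(temp_sub)     # still 3-d, but smaller along lat and year
```

Using `drop = FALSE` keeps the result a 3-d array even when a dimension shrinks to length one, which mirrors the idea that filtering a cube yields another cube.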

tbl_cube support is currently experimental and little performance
optimisation has been done, but you may find them useful if your data
already comes in this form, or if you struggle with the memory overhead of
the sparse/crossed form of data frames. There is no support for hierarchical
indices (although I think that would be a relatively straightforward
extension to storing data frames for indices rather than vectors).

See as.tbl_cube for ways of coercing existing data structures into a
tbl_cube.

# The built-in nasa dataset records meteorological data (temperature,
# cloud cover, ozone etc) for a 4d spatio-temporal dataset (lat, long,
# month and year)
nasa
head(as.data.frame(nasa))
titanic <- as.tbl_cube(Titanic)
head(as.data.frame(titanic))
admit <- as.tbl_cube(UCBAdmissions)
head(as.data.frame(admit))
as.tbl_cube(esoph, dim_names = 1:3)
# Some manipulation examples with the NASA dataset --------------------------
# select() operates only on measures: it doesn't affect dimensions in any way
select(nasa, cloudhigh:cloudmid)
select(nasa, matches("temp"))
# filter() operates only on dimensions
filter(nasa, lat > 0, year == 2000)
# Each component can only refer to one dimension, ensuring that you always
# create a rectangular subset
# The next call refers to two dimensions at once, so it is expected to fail
filter(nasa, lat > long)
# Arrange is meaningless for tbl_cubes

# Grouping by lat and long, then summarising, collapses month and year
by_loc <- group_by(nasa, lat, long)
summarise(by_loc, pressure = max(pressure), temp = mean(temperature))
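
The rule that each filter() component may refer to only one dimension can be motivated with a base R array. The sketch below is an illustration under assumptions: the matrix `m` and the `lat` and `long` values are hypothetical, and the code shows the underlying array behaviour rather than dplyr's own checks.

```r
# Hypothetical 2-d measure indexed by lat and long (values are illustrative).
lat  <- c(-10, 0, 10)
long <- c(-5, 5)
m <- array(
  1:6,
  dim = c(3, 2),
  dimnames = list(lat = as.character(lat), long = as.character(long))
)

# One condition per dimension gives one logical vector per dimension, and
# indexing with those vectors always returns a rectangular block.
block <- m[lat > 0, long > 0, drop = FALSE]

# A condition that mixes two dimensions is a per-cell mask instead. The cells
# it selects need not form a block, so the result falls back to a plain
# vector with no dimensions.
mask  <- outer(lat, long, ">")
cells <- m[mask]

dim(block)
dim(cells)
```

Because the per-cell selection has no dimensions of its own, it can no longer be stored as a dense cube, which is the reason for restricting each condition to a single dimension.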