RDruid (version 0.2.3)

druid.query.timeseries: Query time series data

Description

Queries Druid for time series data and returns the result as a data frame

Usage

druid.query.timeseries(url = druid.url(), dataSource, intervals,
  aggregations, filter = NULL, granularity = "all",
  postAggregations = NULL, context = NULL, rawData = FALSE,
  verbose = F, ...)

Arguments

url
URL to connect to Druid, defaults to druid.url()
dataSource
name of the data source to query
intervals
time period to retrieve data for, as an interval object or a list of interval objects
aggregations
list of metric aggregations to compute for this data source
filter
filter specifying the subset of the data to extract
granularity
time granularity at which to aggregate
postAggregations
post-aggregations to perform on the aggregations
context
query context
rawData
if set, returns the result object as is, without converting it to a data frame
verbose
if TRUE, prints the JSON query sent to Druid
...
additional parameters to pass to druid.resulttodf
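
As a hedged illustration only (interval() and ymd() are helpers from the lubridate package, the list-of-intervals form follows the description of the intervals argument above, and the metric and dimension constructors are the ones used in the Examples below), the main arguments might be assembled like this before calling druid.query.timeseries:

library(lubridate)
# a list of interval objects, as accepted by the intervals argument
qry_intervals <- list(interval(ymd("2012-07-01"), ymd("2012-07-08")),
                      interval(ymd("2012-07-08"), ymd("2012-07-15")))
# a list of metric aggregations, as accepted by the aggregations argument
qry_aggs      <- list(sum(metric("count")), sum(metric("length")))
# a filter restricting the query to a single hashtag
qry_filter    <- dimension("hashtag") == "druid"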

Value

Returns a data frame where each column represents a time series

See Also

druid.query.groupBy, druid.query.topN, granularity

Examples

## Not run: 
# 
#    # Get the time series associated with the twitter hashtag #druid, by hour
#    druid.query.timeseries(url = druid.url(host = "<hostname>"),
#                          dataSource   = "twitter",
#                          intervals    = interval(ymd("2012-07-01"), ymd("2012-07-15")),
#                          aggregations = sum(metric("count")),
#                          filter       = dimension("hashtag") == "druid",
#                          granularity  = granularity("hour"))
# 
#    # Average tweet length for a combination of hashtags in a given time zone
#    druid.query.timeseries(url = druid.url("<hostname>"),
#                          dataSource   = "twitter",
#                          intervals    = interval(ymd("2012-07-01"), ymd("2012-08-30")),
#                          aggregations = list(
#                                            sum(metric("count")),
#                                            sum(metric("length"))
#                                         ),
#                          postAggregations = list(
#                                            avg_length = field("length") / field("count")
#                                         ),
#                          filter       =   dimension("hashtag") == "london2012"
#                                         | dimension("hashtag") == "olympics",
#                          granularity  = granularity("PT6H", timeZone="Europe/London"))
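# 
#    # A hedged sketch only: the column names used below ("timestamp" and
#    # "count") are assumptions about the returned data frame, not documented
#    # names. Store the result of the first query and plot the hourly counts.
#    res <- druid.query.timeseries(url = druid.url(host = "<hostname>"),
#                                  dataSource   = "twitter",
#                                  intervals    = interval(ymd("2012-07-01"), ymd("2012-07-15")),
#                                  aggregations = sum(metric("count")),
#                                  filter       = dimension("hashtag") == "druid",
#                                  granularity  = granularity("hour"))
#    plot(res$timestamp, res$count, type = "l")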
#   ## End(Not run)