Condenses a dataset by aggregating the data to a given (shorter) interval unit. aggregate_Datetime() is opinionated in the sense that it sets default handlers for each of the data types numeric, character, logical, factor, duration, time, and datetime. These can be overwritten by the user. Columns that do not fall into one of these categories need to be handled individually by the user (via the ... argument) or will be removed during aggregation. If no unit is specified, the data will simply be aggregated to the most common interval (dominant.epoch), which in most cases amounts to a rounding rather than an aggregation.
aggregate_Datetime(
  dataset,
  unit = "dominant.epoch",
  Datetime.colname = Datetime,
  type = c("round", "floor", "ceiling"),
  numeric.handler = mean,
  character.handler = function(x) names(which.max(table(x, useNA = "ifany"))),
  logical.handler = function(x) mean(x) >= 0.5,
  factor.handler = function(x) factor(names(which.max(table(x, useNA = "ifany")))),
  datetime.handler = mean,
  duration.handler = function(x) lubridate::duration(mean(x)),
  time.handler = function(x) hms::as_hms(mean(x)),
  ...
)
A tibble with aggregated Datetime data. Usually the number of rows will be smaller than in the input dataset. If the handler arguments capture all column types, the number of columns will be the same as in the input dataset.
dataset: A light logger dataset. Expects a dataframe. If not imported by LightLogR, take care to choose a sensible variable for the Datetime.colname.
unit: Unit of binning. See lubridate::round_date() for examples. The default is "dominant.epoch", which means everything will be aggregated to the most common interval. This is especially useful for slightly irregular data, but can be computationally expensive. "none" will not aggregate the data at all.
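To get a feel for what a unit string does, the following minimal sketch applies lubridate::round_date() directly to three slightly irregular timestamps. The timestamps are made-up values for illustration, and the sketch only shows the rounding step, not the aggregation that aggregate_Datetime() performs afterwards.

```r
library(lubridate)

# Hypothetical, slightly irregular timestamps (illustration only)
x <- as.POSIXct(
  c("2023-08-15 10:02:10", "2023-08-15 10:07:48", "2023-08-15 10:58:05"),
  tz = "UTC"
)

# The same unit strings that can be passed to `unit`
by_5min <- round_date(x, unit = "5 mins")
by_hour <- round_date(x, unit = "1 hour")
by_day  <- round_date(x, unit = "1 day")

format(by_5min, "%H:%M")
format(by_hour, "%H:%M")
format(by_day, "%Y-%m-%d")
```

Timestamps that end up on the same rounded value form one bin, so the coarser the unit, the fewer distinct bins remain.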
Datetime.colname: Column name that contains the datetime. Defaults to "Datetime", which is automatically correct for data imported with LightLogR. Expects a symbol. Needs to be part of the dataset. Must be of type POSIXct.
type: One of "round" (the default), "ceiling", or "floor". The setting selects the corresponding rounding function from lubridate (round_date(), ceiling_date(), or floor_date()).
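The difference between the three settings can be seen by calling the lubridate functions directly on a single timestamp. This is a small sketch with a made-up timestamp; it assumes that type simply switches between these three functions, as the description above suggests.

```r
library(lubridate)

# Hypothetical timestamp between two 5-minute boundaries
t <- as.POSIXct("2023-08-15 10:07:48", tz = "UTC")

rounded <- round_date(t, "5 mins")    # nearest boundary
floored <- floor_date(t, "5 mins")    # boundary at or before t
ceiled  <- ceiling_date(t, "5 mins")  # boundary after t

format(c(rounded, floored, ceiled), "%H:%M")
```

With "floor", each observation is assigned to the interval that starts at or before it; with "ceiling", to the boundary that follows it.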
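numeric.handler, character.handler, logical.handler, factor.handler, datetime.handler, duration.handler, time.handler: Functions that handle the respective data types. The default handlers calculate the mean for numeric, POSIXct, duration, and hms, and the mode for character, factor, and logical types.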
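The default handlers can be tried out on small vectors in base R. In the sketch below, the two handler functions are copied from the usage signature; the names character_handler and logical_handler and the input values are only chosen for illustration.

```r
# Handlers copied from the usage signature (names are illustrative)
character_handler <- function(x) names(which.max(table(x, useNA = "ifany")))
logical_handler   <- function(x) mean(x) >= 0.5

# Numeric columns: plain mean of the values in a bin
num_result <- mean(c(10, 20, 60))

# Character columns: most frequent value (mode) in a bin
chr_result <- character_handler(c("indoor", "outdoor", "indoor"))

# Logical columns: TRUE if at least half of the values are TRUE
lgl_minority <- logical_handler(c(TRUE, FALSE, FALSE))
lgl_tie      <- logical_handler(c(TRUE, FALSE))

num_result
chr_result
lgl_minority
lgl_tie
```

Note that the logical handler resolves an exact tie to TRUE, and that it returns NA when a bin contains missing values, because mean() is called without na.rm = TRUE.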
...: Arguments passed on to dplyr::summarize() to handle columns that do not fall into one of the categories above.
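As a conceptual illustration of the kind of expression that dplyr::summarize() accepts for such columns, the sketch below bins a toy tibble by hand and summarizes a list column next to a numeric one. It is not the implementation of aggregate_Datetime(); the data, the column names lux and notes, and the use of floor_date() for binning are all assumptions made for the example.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: `notes` is a list column, which none of the
# default handlers cover
toy <- tibble(
  Datetime = as.POSIXct("2023-08-15 10:00:00", tz = "UTC") + c(0, 60, 300, 360),
  lux      = c(100, 200, 10, 30),
  notes    = list("a", "b", "c", "d")
)

toy_5min <- toy %>%
  group_by(Datetime = floor_date(Datetime, "5 mins")) %>%
  summarize(
    lux   = mean(lux),           # what a numeric handler would do
    notes = list(unlist(notes))  # a name-value expression of the kind one might supply via ...
  )

toy_5min
```

Each expression has to reduce the values of a bin to a single entry, here one list element per 5-minute bin.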
Summary values for type POSIXct are calculated as the mean, which can be nonsensical at times. For example, the mean of Day1 18:00 and Day2 18:00 is Day2 6:00. This can be the desired result, but if the focus is on time rather than on datetime, it is recommended that values are converted to times via hms::as_hms() before applying the function (the mean of 18:00 and 18:00 is still 18:00, not 6:00).
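The Day1/Day2 case can be reproduced in a few lines. This is a minimal sketch with made-up dates; it wraps the mean in hms::as_hms() in the same way as the default time.handler from the usage signature.

```r
# Two hypothetical datetimes, both at 18:00 on consecutive days
x <- as.POSIXct(c("2023-08-15 18:00:00", "2023-08-16 18:00:00"), tz = "UTC")

# Mean of the datetimes: the midpoint between the two instants
mean_datetime <- mean(x)

# Mean of the clock times: convert to hms first, then average
mean_time <- hms::as_hms(mean(hms::as_hms(x)))

format(mean_datetime, "%Y-%m-%d %H:%M")
mean_time
```

The first result falls at 06:00 on the second day, while the second stays at 18:00:00.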
# sample.data.environment ships with LightLogR
library(LightLogR)

# dominant epoch without aggregation
sample.data.environment %>%
  dominant_epoch()

# dominant epoch with 5 minute aggregation
sample.data.environment %>%
  aggregate_Datetime(unit = "5 mins") %>%
  dominant_epoch()

# dominant epoch with 1 day aggregation
sample.data.environment %>%
  aggregate_Datetime(unit = "1 day") %>%
  dominant_epoch()