deduplicate()
Flags, drops, or aggregates duplicates, which are defined as
consecutive visits to the same URL within a certain time frame.
deduplicate(
  wt,
  method = "aggregate",
  within = 1,
  duration_var = "duration",
  keep_nvisits = FALSE,
  same_day = TRUE
)
Value: webtrack data.table with the same columns as wt, with updated duration.
wt: webtrack data object.
method: character. One of "aggregate", "flag", or "drop".
If set to "aggregate", consecutive visits to the same URL (no matter the
time difference) are combined and their durations aggregated. In this case,
a duration column must be specified via duration_var.
If set to "flag", duplicates within a certain time frame are flagged in a
new column called duplicate. In this case, the within argument must be specified.
If set to "drop", duplicates are dropped. Again, the within argument must be specified.
Defaults to "aggregate".
within: numeric (seconds). If method is set to "flag" or "drop",
a subsequent visit is only defined as a duplicate when it happens within
this time difference. Defaults to 1 second.
duration_var: character. Name of the duration variable. Defaults to "duration".
keep_nvisits: boolean. If method is set to "aggregate", this determines whether
the number of aggregated visits should be kept as a variable. Defaults to FALSE.
same_day: boolean. If method is set to "aggregate", this determines whether
visits count as consecutive only when they fall on the same day. Defaults to TRUE.
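The "flag" and "drop" rule can be pictured with a small base R sketch. This is not the package's implementation: the helper flag_duplicates() is hypothetical, the column names panelist_id, url, and timestamp are assumptions made for the toy data, and treating a gap of exactly `within` seconds as a duplicate is also an assumption.

```r
# Toy visits; column names are assumptions for this sketch.
visits <- data.frame(
  panelist_id = c("a", "a", "a", "a", "b"),
  url = c("https://x.org", "https://x.org", "https://x.org",
          "https://y.org", "https://y.org"),
  timestamp = as.POSIXct("2024-01-01 10:00:00", tz = "UTC") + c(0, 1, 30, 31, 31),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of webtrackR.
# Expects at least one row.
flag_duplicates <- function(df, within = 1) {
  # Order visits per person by time so "previous row" means "previous visit".
  df <- df[order(df$panelist_id, df$timestamp), ]
  n <- nrow(df)
  # Compare each row with the row before it; the first row has no predecessor.
  same_person <- c(FALSE, df$panelist_id[-1] == df$panelist_id[-n])
  same_url <- c(FALSE, df$url[-1] == df$url[-n])
  # Gap to the previous visit in seconds.
  gap <- c(Inf, diff(as.numeric(df$timestamp)))
  # Assumption: a gap of exactly `within` seconds still counts as a duplicate.
  df$duplicate <- same_person & same_url & gap <= within
  df
}

flagged <- flag_duplicates(visits, within = 1)  # analogous to method = "flag"
dropped <- flagged[!flagged$duplicate, ]        # analogous to method = "drop"
print(flagged)
```

In this toy data only the second visit is flagged. The third visit is to the same URL but arrives 29 seconds later, so it falls outside the one-second window.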
if (FALSE) {
  library(webtrackR)
  data("testdt_tracking")
  wt <- as.wt_dt(testdt_tracking)
  # A duration column is required for method = "aggregate"
  wt <- add_duration(wt, cutoff = 300, replace_by = 300)
  # Dropping duplicates with one-second default
  wt_dedup <- deduplicate(wt, method = "drop")
  # Flagging duplicates with one-second default
  wt_dedup <- deduplicate(wt, method = "flag")
  # Aggregating duplicates
  wt_dedup <- deduplicate(wt[1:1000], method = "aggregate")
  # Aggregating duplicates and keeping number of visits for aggregated visits
  wt_dedup <- deduplicate(wt[1:1000], method = "aggregate", keep_nvisits = TRUE)
}
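To see what aggregating consecutive visits amounts to, here is a minimal base R sketch on toy data. It is an illustration under stated assumptions, not the package's code: aggregate_runs() is a hypothetical helper, the column names are assumed, durations are combined by summing, and the count column name n_visits is made up for this example.

```r
# Toy visits for a single person; column names are assumptions.
visits <- data.frame(
  panelist_id = "a",
  url = c("https://x.org", "https://x.org", "https://y.org", "https://x.org"),
  timestamp = as.POSIXct("2024-01-01 10:00:00", tz = "UTC") + c(0, 5, 20, 60),
  duration = c(5, 15, 40, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of webtrackR.
# Expects at least one row.
aggregate_runs <- function(df, same_day = TRUE) {
  df <- df[order(df$panelist_id, df$timestamp), ]
  n <- nrow(df)
  # A new run starts whenever the person or the URL changes.
  new_run <- c(TRUE, df$panelist_id[-1] != df$panelist_id[-n] |
                     df$url[-1] != df$url[-n])
  if (same_day) {
    # Optionally also start a new run when the calendar day changes.
    day <- format(df$timestamp, "%Y-%m-%d")
    new_run <- new_run | c(TRUE, day[-1] != day[-n])
  }
  run_id <- cumsum(new_run)
  # Keep the first visit of each run and attach the run totals.
  out <- df[!duplicated(run_id), ]
  out$duration <- as.numeric(tapply(df$duration, run_id, sum))
  out$n_visits <- tabulate(run_id)  # made-up column name
  out
}

agg <- aggregate_runs(visits)
print(agg)
```

The first two visits to https://x.org collapse into one row, while the later return to the same URL stays separate because a visit to another URL sits in between.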