f_slice: Faster `dplyr::slice()`

Description

Faster alternatives to `dplyr::slice()` and its variants. The f_slice() functions are much faster when the data contain a large number of groups.

Usage

f_slice(
  data,
  i = 0L,
  ...,
  .by = NULL,
  .order = df_group_by_order_default(data),
  keep_order = FALSE
)
f_slice_head(
  data,
  n,
  prop,
  .by = NULL,
  .order = df_group_by_order_default(data),
  keep_order = FALSE
)
f_slice_tail(
  data,
  n,
  prop,
  .by = NULL,
  .order = df_group_by_order_default(data),
  keep_order = FALSE
)
f_slice_min(
  data,
  order_by,
  n,
  prop,
  .by = NULL,
  with_ties = TRUE,
  na_rm = FALSE,
  .order = df_group_by_order_default(data),
  keep_order = FALSE
)
f_slice_max(
  data,
  order_by,
  n,
  prop,
  .by = NULL,
  with_ties = TRUE,
  na_rm = FALSE,
  .order = df_group_by_order_default(data),
  keep_order = FALSE
)
f_slice_sample(
  data,
  n,
  replace = FALSE,
  prop,
  .by = NULL,
  .order = df_group_by_order_default(data),
  keep_order = FALSE,
  weights = NULL
)

Value

A data.frame filtered on the specified row indices.

Arguments

data: A data frame.
i: An integer vector of slice locations. Please see the details below on how i works, as it only accepts simple integer vectors.
...: A temporary argument to give the user an error if dots are used.
.by: (Optional). A selection of columns to group by for this operation. Columns are specified using tidy-select.
.order: Should the groups be returned in sorted order? If FALSE, this will return the groups in order of first appearance, and in many cases is faster.
keep_order: Should the sliced data frame be returned in its original order? The default is FALSE.
n: Number of rows.
prop: Proportion of rows.
order_by: Variables to order by.
with_ties: Should ties be kept together? The default is TRUE.
na_rm: Should missing values in f_slice_max() and f_slice_min() be removed? The default is FALSE.
replace: Should f_slice_sample() sample with or without replacement? Default is FALSE, without replacement.
weights: Probability weights used in f_slice_sample().
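
As a rough illustration of how these arguments fit together, the sketch below takes the first row and the largest value of each group. The data frame `df` and its columns `g` and `x` are made up for this example, and the sketch assumes that fastplyr is installed and that `order_by` accepts an unquoted column name, as in dplyr.

```r
library(fastplyr)

# Hypothetical toy data: two groups of three rows each
df <- data.frame(
  g = c("b", "b", "b", "a", "a", "a"),
  x = c(1L, 3L, 2L, 4L, 6L, 5L)
)

# First row of each group
first_rows <- f_slice_head(df, n = 1, .by = g)

# Row holding the largest x within each group
max_rows <- f_slice_max(df, x, n = 1, .by = g)

# Same slice, but with rows returned in their original order
max_rows_orig <- f_slice_max(df, x, n = 1, .by = g, keep_order = TRUE)
```

With `keep_order = FALSE` the result follows the group order chosen by `.order`; `keep_order = TRUE` is the option to reach for when the original row order matters downstream.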

Details

Important note about the `i` argument in `f_slice`

i is evaluated once on an ungrouped basis, and f_slice() then looks up those locations within each group. If you supply an expression whose slice locations are meant to vary by group, this is neither respected nor checked. For example,
do: f_slice(data, 10:20, .by = group)
not: f_slice(data, sample(1:10), .by = group)

The former uses slice locations that do not vary by group, whereas the latter is intended to give different slice locations within each group, which f_slice() cannot compute correctly.
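
A minimal sketch of the supported case, using a made-up data frame (`df`, `g` and `x` are hypothetical names) and assuming fastplyr is installed: the same fixed locations are looked up in every group.

```r
library(fastplyr)

# Hypothetical toy data: two groups of three rows each
df <- data.frame(
  g = c("a", "a", "a", "b", "b", "b"),
  x = 1:6
)

# Supported: the same fixed locations (rows 1 and 2) are taken from every group
first_two <- f_slice(df, 1:2, .by = g)

# Without .by, the locations refer to rows of the whole data frame
rows_2_3 <- f_slice(df, 2:3)
```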

To do the latter type of by-group slicing, use f_filter(), e.g.
f_filter(data, row_number() %in% slices, .by = groups)
or, even faster:
library(cheapr)
f_filter(data, row_number() %in_% slices, .by = groups)
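
One way to sketch slice locations that genuinely differ between groups is to store the per-group cutoff in a column and compare it with the within-group row number. The data frame `df` and the columns `g`, `x` and `k` are hypothetical, and the example assumes fastplyr and dplyr are installed, with dplyr attached so that `row_number()` is available.

```r
library(dplyr)     # attached for row_number()
library(fastplyr)

# Hypothetical toy data: k holds the number of leading rows to keep per group
df <- data.frame(
  g = c("a", "a", "a", "b", "b", "b"),
  x = 1:6,
  k = c(1L, 1L, 1L, 2L, 2L, 2L)
)

# Keep the first row of group "a" and the first two rows of group "b"
varying <- f_filter(df, row_number() <= k, .by = g)
```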

`f_slice_sample`

The arguments of f_slice_sample() align more closely with base::sample(), so by default it re-samples each entire group without replacement.
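
The default can be sketched as follows, again with a made-up data frame (`df`, `g` and `x` are hypothetical) and assuming fastplyr is installed. Omitting `n` and `prop` shuffles the rows within each group, much as `sample(x)` permutes a vector.

```r
library(fastplyr)

set.seed(42)  # for reproducibility of the random draws

# Hypothetical toy data: two groups of three rows each
df <- data.frame(
  g = c("a", "a", "a", "b", "b", "b"),
  x = 1:6
)

# Default: every group is re-sampled in full, without replacement,
# so the rows of each group are shuffled
shuffled <- f_slice_sample(df, .by = g)

# Draw two rows per group, with replacement
boot <- f_slice_sample(df, n = 2, replace = TRUE, .by = g)
```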