fastplyr (version 0.5.1)

f_slice: Faster dplyr::slice()

Description

When there are many groups, the f_slice() functions are much faster than their dplyr::slice() counterparts.

Usage

f_slice(
  data,
  i = 0L,
  ...,
  .by = NULL,
  .order = df_group_by_order_default(data),
  keep_order = FALSE
)

f_slice_head(
  data,
  n,
  prop,
  .by = NULL,
  .order = df_group_by_order_default(data),
  keep_order = FALSE
)

f_slice_tail(
  data,
  n,
  prop,
  .by = NULL,
  .order = df_group_by_order_default(data),
  keep_order = FALSE
)

f_slice_min(
  data,
  order_by,
  n,
  prop,
  .by = NULL,
  with_ties = TRUE,
  na_rm = FALSE,
  .order = df_group_by_order_default(data),
  keep_order = FALSE
)

f_slice_max(
  data,
  order_by,
  n,
  prop,
  .by = NULL,
  with_ties = TRUE,
  na_rm = FALSE,
  .order = df_group_by_order_default(data),
  keep_order = FALSE
)

f_slice_sample(
  data,
  n,
  replace = FALSE,
  prop,
  .by = NULL,
  .order = df_group_by_order_default(data),
  keep_order = FALSE,
  weights = NULL
)

Value

A data.frame filtered on the specified row indices.

Arguments

data

A data frame.

i

An integer vector of slice locations.
Please see the details below on how i works as it only accepts simple integer vectors.

...

A temporary argument to give the user an error if dots are used.

.by

(Optional). A selection of columns to group by for this operation. Columns are specified using tidy-select.

.order

Should the groups be returned in sorted order? If FALSE, this will return the groups in order of first appearance, and in many cases is faster.

keep_order

Should the sliced data frame be returned in its original order? The default is FALSE.

n

Number of rows.

prop

Proportion of rows.

order_by

Variables to order by.

with_ties

Should ties be kept together? The default is TRUE.

na_rm

Should missing values in f_slice_max() and f_slice_min() be removed? The default is FALSE.

replace

Should f_slice_sample() sample with or without replacement? Default is FALSE, without replacement.

weights

Probability weights used in f_slice_sample().
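
The arguments above can be illustrated with a minimal sketch. The data frame df and its columns g and x are hypothetical, and the calls assume that .by accepts an unquoted column name (tidy-select) and that order_by accepts an unquoted column, as in dplyr.

```r
library(fastplyr)

# Small illustrative data frame (hypothetical data)
df <- data.frame(
  g = c("b", "b", "a", "a", "a"),
  x = c(5, 1, 3, 2, 4)
)

# First two rows of each group
head2 <- f_slice_head(df, n = 2, .by = g)

# Row(s) holding the largest x in each group (ties are kept by default)
top1 <- f_slice_max(df, order_by = x, n = 1, .by = g)

# Same selection, with the rows returned in their original order
top1_orig <- f_slice_max(df, order_by = x, n = 1, .by = g, keep_order = TRUE)
```

Setting keep_order = TRUE is useful when the position of a row in the input carries meaning, for example in time-ordered data.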

Details

Important note about the i argument in f_slice

i is evaluated once on the ungrouped data, and f_slice() then looks up those locations within each group. An expression whose slice locations are meant to vary by group is therefore neither respected nor checked. For example,
do f_slice(data, 10:20, .by = group)
not f_slice(data, sample(1:10), .by = group).

The former yields slice locations that are the same in every group, whereas the latter is intended to yield different slice locations within each group, which f_slice() cannot compute correctly.
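
A minimal sketch of the supported pattern, using a hypothetical data frame df with a grouping column g. Computing the locations once, outside the call, makes it explicit that the same locations are applied to every group.

```r
library(fastplyr)

# Hypothetical data: two groups of three rows each
df <- data.frame(
  g = c("a", "a", "a", "b", "b", "b"),
  x = 1:6
)

# A fixed set of locations: positions 1 and 2 within every group
first_two <- f_slice(df, 1:2, .by = g)

# Locations drawn at random, but only once, so every group
# is sliced at the same within-group positions
set.seed(1)
locs <- sample(3, 2)
same_in_each <- f_slice(df, locs, .by = g)
```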

To do the latter type of by-group slicing, use f_filter(), e.g.
f_filter(data, row_number() %in% slices, .by = groups) or even faster:
library(cheapr)
f_filter(data, row_number() %in_% slices, .by = groups)
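
As a hedged sketch of slicing that does vary by group, the example below follows the row_number() pattern shown above. The data frame df and its column k (the number of leading rows to keep in each group) are hypothetical, and loading dplyr so that row_number() can be found is an assumption.

```r
library(fastplyr)
library(dplyr)  # assumed to be needed so that row_number() is available

# Hypothetical data: k holds how many leading rows to keep per group
df <- data.frame(
  g = c("a", "a", "a", "b", "b", "b"),
  x = 1:6,
  k = c(1, 1, 1, 2, 2, 2)
)

# Keep the first row of group "a" and the first two rows of group "b"
kept <- f_filter(df, row_number() <= k, .by = g)
```

Because the condition is evaluated within each group, the number of rows kept can differ between groups, which a single i vector passed to f_slice() cannot express.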

f_slice_sample

The arguments of f_slice_sample() align more closely with those of base::sample(). By default, it therefore re-samples each entire group without replacement, which shuffles the rows within each group.
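
A minimal sketch of this default, using a hypothetical data frame df with a grouping column g. With neither n nor prop supplied, every row is returned, permuted within its group.

```r
library(fastplyr)

# Hypothetical data: two groups of three rows each
df <- data.frame(
  g = rep(c("a", "b"), each = 3),
  x = 1:6
)

set.seed(42)

# Default: re-sample each whole group without replacement (a shuffle)
shuffled <- f_slice_sample(df, .by = g)

# Draw two rows per group without replacement
two_each <- f_slice_sample(df, n = 2, .by = g)
```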