
fastplyr (version 0.2.0)

f_slice: Faster dplyr::slice()

Description

When there are many groups, the f_slice() functions are much faster than their dplyr::slice() counterparts.

Usage

f_slice(data, i = 0L, ..., .by = NULL, keep_order = FALSE)

f_slice_head(data, n, prop, .by = NULL, keep_order = FALSE)

f_slice_tail(data, n, prop, .by = NULL, keep_order = FALSE)

f_slice_min(data, order_by, n, prop, .by = NULL, with_ties = TRUE, na_rm = FALSE, keep_order = FALSE)

f_slice_max(data, order_by, n, prop, .by = NULL, with_ties = TRUE, na_rm = FALSE, keep_order = FALSE)

f_slice_sample(data, n, replace = FALSE, prop, .by = NULL, keep_order = FALSE, weights = NULL, seed = NULL)
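A minimal usage sketch of the grouped slice helpers, assuming fastplyr is installed. The data frame `df` and its columns `g` and `x` are made up for illustration. Because keep_order = FALSE by default, the row order of the results is not relied upon here.

```r
library(fastplyr)

# Toy data: two groups ("a" and "b") of three rows each
df <- data.frame(
  g = c("a", "a", "a", "b", "b", "b"),
  x = c(3, 1, 2, 6, 5, 4)
)

head2 <- f_slice_head(df, n = 2, .by = g)   # first two rows of each group
tail1 <- f_slice_tail(df, n = 1, .by = g)   # last row of each group
max1  <- f_slice_max(df, x, n = 1, .by = g) # row with the largest x per group

head2
tail1
max1
```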

Value

A data.frame filtered on the specified row indices.

Arguments

data

A data frame.

i

An integer vector of slice locations.
See Details for how i is applied; note that it accepts only simple integer vectors.

...

A temporary argument that exists only to throw an error if anything is passed via dots.

.by

(Optional). A selection of columns to group by for this operation. Columns are specified using tidy-select.

keep_order

Should the sliced data frame be returned in its original order? The default is FALSE.

n

Number of rows to select (per group when .by is supplied).

prop

Proportion of rows to select (per group when .by is supplied).

order_by

Variables to order by.

with_ties

Should ties be kept together? The default is TRUE.

na_rm

Should missing values in f_slice_max() and f_slice_min() be removed? The default is FALSE.

replace

Should f_slice_sample() sample with replacement? The default is FALSE (sampling without replacement).

weights

Probability weights used in f_slice_sample().

seed

Seed number defining the RNG state. If supplied, the seed is applied only locally within the function: whatever RNG state was in place before the call is restored afterwards, so the global seed state is not affected by the sampling. If left NULL (the default), the seed is never modified.
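The sketch below exercises with_ties, na_rm, keep_order and seed as described above. It assumes fastplyr is installed; the toy data frames are hypothetical, and the checks rely only on the behaviour stated in the argument descriptions.

```r
library(fastplyr)

# with_ties / na_rm: two rows tie for the minimum and one value is missing
df <- data.frame(x = c(1, 1, 2, NA))

ties    <- f_slice_min(df, x, n = 1)                    # tied rows kept together
no_ties <- f_slice_min(df, x, n = 1, with_ties = FALSE) # exactly n rows
no_na   <- f_slice_max(df, x, n = 4, na_rm = TRUE)      # NA row removed

# keep_order: the two groups are interleaved in the original data
df2 <- data.frame(g = c("b", "a", "b", "a"), x = 1:4)
firsts <- f_slice_head(df2, n = 1, .by = g, keep_order = TRUE)
firsts$x # first row of each group, in the original row order

# seed: a local seed makes the sample reproducible without
# disturbing the global RNG state
df3 <- data.frame(g = rep(c("a", "b"), each = 5), x = 1:10)
set.seed(123)
state_before <- .Random.seed

s1 <- f_slice_sample(df3, n = 2, .by = g, seed = 42)
s2 <- f_slice_sample(df3, n = 2, .by = g, seed = 42)

identical(s1, s2)                     # same seed, same sample
identical(.Random.seed, state_before) # global RNG state left as it was
```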

Details

Important note about the i argument in f_slice

i is evaluated once on an ungrouped basis, and the resulting locations are then looked up within each group. Thus, if you supply an expression whose slice locations are meant to vary by group, this will be neither respected nor checked. For example,
do f_slice(data, 10:20, .by = group)
not f_slice(data, sample(1:10), .by = group).

The former gives slice locations that do not vary by group. The latter is intended to give different slice locations within each group, but because sample() is evaluated only once, every group receives the same positions; f_slice cannot compute this type of slice correctly.
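To illustrate the point above, here is a small sketch assuming fastplyr is installed and using a made-up data frame with two groups of four rows. Constant locations are taken from each group, and an expression such as sample() is evaluated once, so both groups get identical within-group positions.

```r
library(fastplyr)

df <- data.frame(g = rep(c("a", "b"), each = 4), x = 1:8)

# Constant locations: rows 2 and 3 of every group
mid <- f_slice(df, 2:3, .by = g)

# sample() runs only once, ungrouped, so group "b" receives the same
# within-group positions as group "a" rather than its own random draw
set.seed(1)
same_pos <- f_slice(df, sample(1:4, 2), .by = g)

mid
same_pos
```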

To perform the latter type of by-group slicing, use f_filter(), e.g.
f_filter(data, row_number() %in% slices, .by = groups)
or, faster still, with cheapr's %in_%:
library(cheapr)
f_filter(data, row_number() %in_% slices, .by = groups)
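A runnable version of the f_filter() pattern above, as a sketch. It assumes fastplyr, dplyr (loaded here for row_number()) and cheapr (for %in_%) are installed; the data frame `df` and the vector `slices` are hypothetical.

```r
library(fastplyr)
library(dplyr)  # provides row_number()
library(cheapr) # provides %in_%

df <- data.frame(g = rep(c("a", "b"), each = 4), x = 1:8)
slices <- c(1L, 4L) # within-group row positions to keep

# row_number() is evaluated within each group supplied via .by
out_base  <- f_filter(df, row_number() %in% slices, .by = g)
out_cheap <- f_filter(df, row_number() %in_% slices, .by = g)

out_base
out_cheap
```

Both calls should select the same rows; %in_% is used only as a faster drop-in for %in%.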

f_slice_sample

The arguments of f_slice_sample() align more closely with base::sample() than with dplyr::slice_sample(), so by default it re-samples each entire group without replacement.
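A minimal sketch of these defaults, assuming fastplyr is installed; the toy data frame is hypothetical. With n left unspecified, each group is re-sampled in full without replacement (a within-group shuffle), while replace = TRUE allows n to exceed the group size, as with base::sample().

```r
library(fastplyr)

# Group "a" has three rows, group "b" has two
df <- data.frame(g = rep(c("a", "b"), c(3, 2)), x = 1:5)

# Default: every row of every group appears exactly once, in random order
shuffled <- f_slice_sample(df, .by = g, seed = 1)

# With replacement: four draws per group, even though the groups are smaller
boot <- f_slice_sample(df, n = 4, replace = TRUE, .by = g, seed = 1)

shuffled
boot
```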