dplyr::slice()
When there are many groups, the f_slice() functions are much faster than their dplyr::slice() counterparts.
Usage
f_slice(
  data,
  i = 0L,
  ...,
  .by = NULL,
  .order = df_group_by_order_default(data),
  keep_order = FALSE
)

f_slice_head(
  data,
  n,
  prop,
  .by = NULL,
  .order = df_group_by_order_default(data),
  keep_order = FALSE
)

f_slice_tail(
  data,
  n,
  prop,
  .by = NULL,
  .order = df_group_by_order_default(data),
  keep_order = FALSE
)

f_slice_min(
  data,
  order_by,
  n,
  prop,
  .by = NULL,
  with_ties = TRUE,
  na_rm = FALSE,
  .order = df_group_by_order_default(data),
  keep_order = FALSE
)

f_slice_max(
  data,
  order_by,
  n,
  prop,
  .by = NULL,
  with_ties = TRUE,
  na_rm = FALSE,
  .order = df_group_by_order_default(data),
  keep_order = FALSE
)

f_slice_sample(
  data,
  n,
  replace = FALSE,
  prop,
  .by = NULL,
  .order = df_group_by_order_default(data),
  keep_order = FALSE,
  weights = NULL
)
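To make the n, prop and keep_order arguments more concrete, here is a plain base-R sketch of per-group head/tail slicing on a single grouping column. It is not fastplyr's implementation: the helper slice_head_by() and its group and tail arguments are hypothetical, groups are taken in sorted order (as with .order = TRUE), and prop is assumed to be turned into a row count by rounding towards zero, which is how dplyr documents its own slice helpers.

```r
# Hypothetical helper, NOT part of fastplyr: a base-R sketch of what
# f_slice_head()/f_slice_tail() compute for a single grouping column.
slice_head_by <- function(data, group, n, prop, tail = FALSE,
                          keep_order = FALSE) {
  use_prop <- !missing(prop)
  # Row indices of each group; split() orders groups by sorted value
  rows_by_group <- split(seq_len(nrow(data)), data[[group]])
  picked <- lapply(rows_by_group, function(rows) {
    # Assumption: prop is rounded towards zero, as in dplyr
    size <- if (use_prop) trunc(prop * length(rows)) else n
    size <- min(size, length(rows))  # never more rows than the group has
    if (tail) utils::tail(rows, size) else utils::head(rows, size)
  })
  idx <- unlist(picked, use.names = FALSE)
  # keep_order = TRUE restores the original row order
  if (keep_order) idx <- sort(idx)
  data[idx, , drop = FALSE]
}

df <- data.frame(g = c("b", "a", "b", "a", "b"), x = 1:5)
slice_head_by(df, "g", n = 1)                     # rows 2 then 1
slice_head_by(df, "g", n = 1, keep_order = TRUE)  # rows 1 then 2
slice_head_by(df, "g", n = 1, tail = TRUE)        # rows 4 then 5
```

With keep_order = FALSE the sketch returns rows grouped together (group "a" first), while keep_order = TRUE simply sorts the selected row indices back into their original order.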
Value
A data.frame filtered on the specified row indices.
Arguments
data: A data frame.
i: An integer vector of slice locations. Please see the details below on how i works, as it only accepts simple integer vectors.
...: A temporary argument that gives the user an error if dots are used.
.by: (Optional) A selection of columns to group by for this operation. Columns are specified using tidy-select.
.order: Should the groups be returned in sorted order? If FALSE, groups are returned in order of first appearance, which in many cases is faster.
keep_order: Should the sliced data frame be returned in its original order? The default is FALSE.
n: Number of rows.
prop: Proportion of rows.
order_by: Variables to order by.
with_ties: Should ties be kept together? The default is TRUE.
na_rm: Should missing values in f_slice_max() and f_slice_min() be removed? The default is FALSE.
replace: Should f_slice_sample() sample with or without replacement? The default is FALSE, i.e. without replacement.
weights: Probability weights used in f_slice_sample().
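The n, order_by, with_ties and na_rm arguments interact in f_slice_min() and f_slice_max(). The base-R sketch below shows the minimum case for one grouping column, assuming these arguments behave like their counterparts in dplyr::slice_min(): every row tied with the n-th smallest value is kept when with_ties = TRUE, and missing values sort last unless na_rm = TRUE. The helper slice_min_by() is hypothetical and is not fastplyr's implementation.

```r
# Hypothetical helper, NOT part of fastplyr: per-group "n smallest" rows,
# assuming dplyr::slice_min()-style handling of ties and missing values.
slice_min_by <- function(data, group, order_by, n,
                         with_ties = TRUE, na_rm = FALSE) {
  rows_by_group <- split(seq_len(nrow(data)), data[[group]])
  picked <- lapply(rows_by_group, function(rows) {
    if (na_rm) rows <- rows[!is.na(data[[order_by]][rows])]
    x <- data[[order_by]][rows]
    if (with_ties) {
      # "min" ranks give tied values the same rank, so every row tied
      # with the n-th smallest value is kept (possibly more than n rows);
      # NAs receive the largest ranks
      r <- rank(x, ties.method = "min", na.last = TRUE)
      sel <- which(r <= n)
      rows[sel[order(x[sel])]]
    } else {
      # At most n rows; order() puts NAs last
      rows[utils::head(order(x), n)]
    }
  })
  data[unlist(picked, use.names = FALSE), , drop = FALSE]
}

scores <- data.frame(g = c("a", "a", "a", "b", "b"),
                     x = c(2, 1, 1, NA, 5))
slice_min_by(scores, "g", "x", n = 1)                     # x: 1, 1, 5
slice_min_by(scores, "g", "x", n = 1, with_ties = FALSE)  # x: 1, 5
```

Group "a" has two rows tied at the minimum, so with_ties = TRUE returns both even though n = 1; in group "b" the NA only appears in the result if there are too few non-missing values to reach n.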
Details
The i argument in f_slice()
i is first evaluated on an ungrouped basis, and the resulting locations are then looked up within each group. Thus, if you supply an expression whose slice locations vary by group, this will be neither respected nor checked.
For example, write f_slice(data, 10:20, .by = group), not f_slice(data, sample(1:10), .by = group).
The former results in slice locations that do not vary by group, whereas the latter asks for slice locations that differ within each group, which f_slice() cannot compute correctly.
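A minimal base-R sketch of the behaviour just described, handling non-negative positions only: i is a single vector evaluated once, and the same positions are then looked up within every group. The helper slice_by() is hypothetical and only models the documented semantics, not how f_slice() is implemented.

```r
# Hypothetical helper, NOT part of fastplyr: `i` is evaluated once,
# without grouping, and the same positions are looked up in every group.
# Only non-negative integer positions are handled in this sketch.
slice_by <- function(data, group, i) {
  rows_by_group <- split(seq_len(nrow(data)), data[[group]])
  picked <- lapply(rows_by_group, function(rows) {
    # Positions past the end of a group are dropped; 0 selects nothing
    rows[i[i <= length(rows)]]
  })
  data[unlist(picked, use.names = FALSE), , drop = FALSE]
}

df <- data.frame(g = c("a", "b", "a", "b", "a"), x = 1:5)
slice_by(df, "g", 2:3)  # 2nd and 3rd row of each group: x = 3, 5, 4

# A "random" i is drawn only once, so every group gets the same positions
i <- sample(1:3)
slice_by(df, "g", i)
```

Because sample(1:3) is evaluated a single time, both groups are sliced with the same permutation of positions rather than with independent per-group samples.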
To do the latter type of by-group slicing, use f_filter(), e.g.
f_filter(data, row_number() %in% slices, .by = groups)
or, even faster, with cheapr's %in_% operator:
library(cheapr)
f_filter(data, row_number() %in_% slices, .by = groups)
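For comparison, the following base-R sketch mimics the filtering approach: the slice positions are recomputed inside each group and matched against within-group row numbers, so they are free to differ from group to group. The helper filter_row_number_by() and its slices_fn argument are hypothetical stand-ins for f_filter() with a per-group slices expression.

```r
# Hypothetical base-R analogue of
#   f_filter(data, row_number() %in% slices, .by = groups)
# in which the slice positions are recomputed for each group, so they
# may differ between groups. `slices_fn` (a function of the group size)
# is an illustrative stand-in for a per-group `slices` expression.
filter_row_number_by <- function(data, group, slices_fn) {
  keep <- logical(nrow(data))
  for (rows in split(seq_len(nrow(data)), data[[group]])) {
    # seq_along(rows) plays the role of row_number() within the group
    keep[rows] <- seq_along(rows) %in% slices_fn(length(rows))
  }
  # A logical filter keeps the rows in their original order
  data[keep, , drop = FALSE]
}

df <- data.frame(g = c("a", "b", "a", "b", "a"), x = 1:5)
# Last row of each group: position 3 in group "a", position 2 in group "b"
filter_row_number_by(df, "g", function(n) n)  # x = 4, 5
# One randomly chosen row per group, drawn independently per group
filter_row_number_by(df, "g", function(n) sample(n, 1))
```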
f_slice_sample
The arguments of f_slice_sample() align more closely with those of base::sample(), and thus, by default, f_slice_sample() re-samples each entire group without replacement.
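The base-R sketch below illustrates that default on a single grouping column: with no size given, each group is re-sampled in full without replacement, i.e. shuffled. It is not fastplyr's implementation; the helper slice_sample_by() is hypothetical, weights is assumed here to name a column of probability weights, and what happens when n exceeds a group's size simply follows sample.int().

```r
# Hypothetical helper, NOT part of fastplyr: per-group sampling that
# mirrors base::sample() defaults. Here `weights` names a column.
slice_sample_by <- function(data, group, n, replace = FALSE,
                            weights = NULL) {
  use_n <- !missing(n)
  rows_by_group <- split(seq_len(nrow(data)), data[[group]])
  picked <- lapply(rows_by_group, function(rows) {
    # No size given: re-sample the whole group, as sample(x) does
    size <- if (use_n) n else length(rows)
    prob <- if (is.null(weights)) NULL else data[[weights]][rows]
    # Sample positions with sample.int(): sample(rows) would misbehave
    # for a one-row group, since sample(5) means sample(1:5).
    # Like base::sample(), this errors if size > group size and
    # replace = FALSE.
    rows[sample.int(length(rows), size, replace = replace, prob = prob)]
  })
  data[unlist(picked, use.names = FALSE), , drop = FALSE]
}

df <- data.frame(g = c("a", "b", "a", "b", "a"), x = 1:5)
slice_sample_by(df, "g")                         # each group shuffled
slice_sample_by(df, "g", n = 4, replace = TRUE)  # 4 draws per group
```

Sampling positions with sample.int() rather than the row indices themselves avoids base::sample()'s special case for length-one vectors.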