
Divides data into groups by a range of methods. Creates a grouping factor with 1s for group 1, 2s for group 2, etc. Returns a dataframe grouped by the grouping factor for easy use in dplyr pipelines.
group(data, n, method = "n_dist", starts_col = NULL, force_equal = FALSE,
      allow_zero = FALSE, return_factor = FALSE, descending = FALSE,
      randomize = FALSE, col_name = ".groups", remove_missing_starts = FALSE)
data: Dataframe or Vector.
n: Dependent on method. Number of groups (default), group size, list of group sizes, list of group starts, step size or prime number to start at. See method. Passed as whole number(s) and/or percentage(s) (0 < n < 1) and/or character. Method l_starts allows 'auto'.
method: greedy, n_dist, n_fill, n_last, n_rand, l_sizes, l_starts, staircase, or primes.
Notice: examples are sizes of the generated groups based on a vector with 57 elements.
greedy: n is group size.
n_dist: n is number of groups.
n_fill: n is number of groups.
n_last: n is number of groups.
n_rand: n is number of groups.
l_sizes: n is a list of group sizes.
l_starts: n is a list of starting positions. Groups automatically start from the first data point. To skip values, pass c(value, skip_to_number), where skip_to_number is the nth appearance of the value in the vector. If passing 'auto', the starting positions are found with find_starts().
staircase: n is step size.
primes: n is the prime number to start at.
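The group sizes a method produces can be checked by tabulating the grouping factor. The following is a minimal sketch, assuming groupdata2 is installed and that group_factor() (listed under the related functions) accepts the same n and method arguments as group(). The helper group_sizes() is a hypothetical name introduced here for illustration and is not part of the package.

```r
library(groupdata2)

v <- 1:57  # a vector with 57 elements

# Hypothetical helper (not part of groupdata2): size of each group in a factor
group_sizes <- function(f) as.vector(table(f))

# greedy: n is the group size, so the last group takes what is left over
sizes_greedy <- group_sizes(group_factor(v, 10, method = "greedy"))

# n_dist: n is the number of groups
sizes_n_dist <- group_sizes(group_factor(v, 5, method = "n_dist"))

# staircase: n is the step size
sizes_staircase <- group_sizes(group_factor(v, 5, method = "staircase"))

# primes: n is the prime number to start at
sizes_primes <- group_sizes(group_factor(v, 5, method = "primes"))

print(sizes_greedy)  # 10 10 10 10 10 7
print(sizes_n_dist)
print(sizes_staircase)
print(sizes_primes)
```

Whatever the method, the sizes should add up to the length of the input unless force_equal discards data points.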
starts_col: Name of column with values to match in method l_starts when data is a dataframe. Pass 'index' to use row names. (Character)
force_equal: Create equal groups by discarding excess data points. Implementation varies between methods. (Logical)
allow_zero: Whether n can be passed as 0. (Logical)
return_factor: Return only grouping factor. (Logical)
descending: Change direction of method. (Not fully implemented) (Logical)
randomize: Randomize the grouping factor. (Logical)
col_name: Name of added grouping factor.
remove_missing_starts: Recursively remove elements from the list of starts that are not found. For method l_starts only. (Logical)
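The skipping rule of method l_starts is easier to follow on a short vector. This sketch assumes groupdata2 is installed and uses group_factor() on plain character vectors. The input vectors are made up for illustration.

```r
library(groupdata2)

v <- c("a", "e", "o", "a", "e", "o")

# Start groups at the first "a", at the next "e",
# and at the 2nd appearance of "o" (skipping the first one)
f_skip <- group_factor(v, n = list("a", "e", c("o", 2)), method = "l_starts")
print(as.character(f_skip))  # "1" "2" "2" "2" "2" "3"

# With n = 'auto' the starting positions are not listed by hand
w <- c("a", "a", "b", "b", "a")
f_auto <- group_factor(w, n = "auto", method = "l_starts")
print(f_auto)
```

Because the first "o" is skipped, it stays in the group started by "e", and only the second "o" opens a new group.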
Dataframe grouped by new grouping factor
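A quick way to see what is returned is to inspect the result with dplyr. The sketch below assumes groupdata2 and dplyr are installed. The data frame and the column name "fold" are made up for illustration.

```r
library(groupdata2)
library(dplyr)

df <- data.frame(x = 1:12, age = seq(20, 75, by = 5))

# Default: the data frame gains a ".groups" column and is grouped by it
grouped <- group(df, 3, method = "n_dist")
print(is_grouped_df(grouped))
print(group_vars(grouped))

# col_name changes the name of the added grouping factor
folds <- group(df, 3, method = "n_dist", col_name = "fold")
print(names(folds))

# return_factor = TRUE returns only the grouping factor
f <- group(df, 3, method = "n_dist", return_factor = TRUE)
print(f)
```

Since the result is already grouped, it can be passed straight to dplyr::summarise() without a separate group_by() call.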
Other grouping functions: group_factor, splt
Other staircase tools: %primes%, %staircase%, group_factor
Other l_starts tools: find_missing_starts, find_starts, group_factor
# NOT RUN {
# Attach packages
library(groupdata2)
library(dplyr)
# Create dataframe
df <- data.frame("x" = c(1:12),
                 "species" = rep(c('cat', 'pig', 'human'), 4),
                 "age" = sample(c(1:100), 12))
# Using group()
df_grouped <- group(df, 5, method = 'n_dist')
# Using group() with dplyr pipeline to get mean age
df_means <- df %>%
  group(5, method = 'n_dist') %>%
  dplyr::summarise(mean_age = mean(age))
# Using group() with l_starts
# "c('pig',2)" skips to the second appearance of
# "pig" after the first appearance of "cat"
df_grouped <- group(df,
                    list('cat', c('pig', 2), 'human'),
                    method = 'l_starts',
                    starts_col = 'species')
# }