Divides data into groups using one of several methods, then splits the data by these groups.
splt(data, n, method = "n_dist", starts_col = NULL, force_equal = FALSE,
allow_zero = FALSE, descending = FALSE, randomize = FALSE,
remove_missing_starts = FALSE)
data: Dataframe or Vector.
n: Dependent on method.
Number of groups (default), group size, list of group sizes,
list of group starts, step size or prime number to start at. See method.
Passed as whole number(s) and/or percentage(s) (0 < n < 1)
and/or character.
Method l_starts allows 'auto'.
method: greedy, n_dist, n_fill, n_last,
n_rand, l_sizes, l_starts, staircase, or
primes.
Note: examples are sizes of the generated groups based on a vector with 57 elements.
greedy: n is group size.
n_dist: n is number of groups.
n_fill: n is number of groups.
n_last: n is number of groups.
n_rand: n is number of groups.
l_sizes: n is a list of group sizes.
l_starts: n is a list of starting positions.
Skip values by c(value, skip_to_number), where skip_to_number is the nth appearance of the value
in the vector.
Groups automatically start from the first data point.
E.g. n = c(1,3,7,25,50) outputs groups with sizes (2,4,18,25,8).
To skip: given vector c("a", "e", "o", "a", "e", "o"), n = list("a", "e", c("o", 2)) outputs groups with sizes (1,4,1).
If passing n = 'auto', the starting positions are automatically found with
find_starts().
staircase: n is step size.
primes: n is the prime number to start at.
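The l_starts arithmetic above (n = c(1,3,7,25,50) on 57 elements) can be checked with a few lines of base R. This is only a sketch of how start positions translate into group sizes, not the package's implementation. It assumes the data is the vector 1:57, so that each start value coincides with its position, and that the starts are given in ascending order. The helper name sizes_from_starts is hypothetical.

```r
# Hypothetical helper (not part of groupdata2): derive group sizes
# from ascending start positions in a vector of length n_elements.
sizes_from_starts <- function(starts, n_elements) {
  # Groups always begin at the first data point
  starts <- unique(c(1, starts))
  # Each group runs until the next start; the last group runs to the end
  diff(c(starts, n_elements + 1))
}

sizes <- sizes_from_starts(c(1, 3, 7, 25, 50), 57)
print(sizes)
```

The sizes should add up to the length of the vector, which is a quick sanity check on any list of starts.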
starts_col: Name of column with values to match in method l_starts
when data is a dataframe. Pass 'index' to use row names. (Character)
force_equal: Create equal groups by discarding excess data points. Implementation varies between methods. (Logical)
allow_zero: Whether n can be passed as 0. (Logical)
descending: Change direction of method. (Not fully implemented) (Logical)
randomize: Randomize the grouping factor. (Logical)
remove_missing_starts: Recursively remove elements from the
list of starts that are not found.
For method l_starts only.
(Logical)
List of split data
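To make the return value concrete, here is a base R sketch of the two steps the description names: build a grouping factor, then split the data by it. It does not call splt(). The fixed-size grouping is an assumed reading of the greedy method (n as group size), and greedy_factor is a hypothetical helper.

```r
# Hypothetical helper (not part of groupdata2): assign consecutive
# elements to groups of at most `size` elements each.
greedy_factor <- function(n_elements, size) {
  factor(ceiling(seq_len(n_elements) / size))
}

x <- 1:57
groups <- greedy_factor(length(x), 10)

# split() returns a list with one element per group
x_list <- split(x, groups)
group_sizes <- lengths(x_list)
print(unname(group_sizes))
```

Each list element holds the data points of one group, in their original order.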
Other grouping functions: group_factor, group
# NOT RUN {
# Attach packages
library(groupdata2)
library(dplyr)
# Create dataframe
df <- data.frame("x" = c(1:12),
                 "species" = rep(c('cat', 'pig', 'human'), 4),
                 "age" = sample(c(1:100), 12))
# Using splt()
# Split the 12 rows into 5 groups; the result is a list of the split data
df_list <- splt(df, 5, method = 'n_dist')
# }