Recode numeric variables into equal-ranged, grouped factors, i.e. a variable is cut into a smaller number of groups, where each group has the same value range. group_labels() creates the related value labels. group_var_if() and group_labels_if() are scoped variants of group_var() and group_labels(), where grouping will be applied only to those variables that match the logical condition of predicate.
group_var(
  x,
  ...,
  size = 5,
  as.num = TRUE,
  right.interval = FALSE,
  n = 30,
  append = TRUE,
  suffix = "_gr"
)

group_var_if(
  x,
  predicate,
  size = 5,
  as.num = TRUE,
  right.interval = FALSE,
  n = 30,
  append = TRUE,
  suffix = "_gr"
)

group_labels(x, ..., size = 5, right.interval = FALSE, n = 30)

group_labels_if(x, predicate, size = 5, right.interval = FALSE, n = 30)
For group_var(), a grouped variable, either as numeric or as factor (see parameter as.num). If x is a data frame, only the grouped variables will be returned.
For group_labels(), a string vector or a list of string vectors containing labels based on the grouped categories of x, formatted as "from lower bound to upper bound", e.g. "10-19" "20-29" "30-39" etc. See 'Examples'.
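The label format described above can be sketched in base R. This is only an illustration for integer-valued data with a fixed group width; the variable names are hypothetical and the code does not reproduce how group_labels() builds its labels internally.

```r
# Illustrative only: build "lower-upper" labels for integer groups of width 10
width <- 10
lower <- seq(10, 30, by = width)        # lower bounds 10, 20, 30
upper <- lower + width - 1              # upper bounds 19, 29, 39
labels <- paste0(lower, "-", upper)
print(labels)
```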
x: A vector or data frame.
...: Optional, unquoted names of variables that should be selected for further processing. Required if x is a data frame (and not a vector) and only selected variables from x should be processed. You may also use functions like : or tidyselect's select-helpers. See 'Examples' or the package vignette.
size: Numeric; group size, i.e. the range for grouping. By default, a new group is defined for each 5 categories of x, i.e. size = 5. Use size = "auto" to automatically resize a variable into a maximum of 30 groups (which is the ggplot default grouping when plotting histograms). Use n to determine the number of groups.
as.num: Logical; if TRUE, the return value will be numeric, not a factor.
right.interval: Logical; if TRUE, grouping starts with the lower bound of size. See 'Details'.
n: Sets the maximum number of groups that are defined when auto-grouping is on (size = "auto"). Default is 30. If size is not set to "auto", this argument will be ignored.
append: Logical; if TRUE (the default) and x is a data frame, x including the new variables as additional columns is returned; if FALSE, only the new variables are returned.
suffix: Indicates which suffix will be added to each dummy variable. Use "numeric" to number dummy variables, e.g. x_1, x_2, x_3 etc. Use "label" to add value label, e.g. x_low, x_mid, x_high. May be abbreviated.
predicate: A predicate function to be applied to the columns. The variables for which predicate returns TRUE are selected.
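How a predicate selects columns can be illustrated with base R alone. The data frame below is made up for the example, and the selection via vapply() is a sketch of the idea, not the code the scoped variants actually run.

```r
# Made-up data frame with two numeric columns and one character column
df <- data.frame(
  age = c(23, 41, 67),
  name = c("a", "b", "c"),
  hours = c(10, 35, 80)
)

# A predicate is any function that returns TRUE or FALSE for a column
predicate <- is.numeric

# Keep the names of the columns for which the predicate returns TRUE
selected <- names(df)[vapply(df, predicate, logical(1))]
print(selected)
```

With a call such as group_var_if(df, is.numeric), only the columns that pass the predicate would be candidates for grouping.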
If size is set to a specific value, the variable is recoded into several groups, where each group has a maximum range of size. Hence, the number of groups differs depending on the range of x.
If size = "auto", the variable is recoded into a maximum of n groups. Hence, independent of the range of x, the same number of groups is always created, so the range within each group differs (depending on x's range).
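The contrast between a fixed size and size = "auto" can be sketched in base R. The helper group_by_width() is hypothetical, and deriving the automatic width from the range of x and n is an assumption made for illustration; neither is taken from the implementation of group_var().

```r
# Hypothetical helper, not part of sjmisc: number the bins of a given width,
# starting with bin 1 at the smallest observed value.
group_by_width <- function(x, width) {
  lower <- min(x, na.rm = TRUE)
  as.integer(floor((x - lower) / width)) + 1L
}

x <- c(50, 53, 57, 61, 68, 74, 80)

# Fixed width: how many bins are needed depends on the range of x
fixed <- group_by_width(x, width = 5)

# "auto"-like behaviour (assumed): pick a width so that at most n bins result
n <- 3
auto_width <- ceiling((diff(range(x)) + 1) / n)
auto <- group_by_width(x, width = auto_width)

print(fixed)
print(auto)
```

With the fixed width, a wider range of x yields more bins; with the derived width, the number of bins stays capped at n while the width of each bin grows.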
right.interval determines which boundary values to include when grouping is done. If TRUE, grouping starts with the lower bound of size. For example, having a variable ranging from 50 to 80, groups cover the ranges from 50-54, 55-59, 60-64 etc. If FALSE (default), grouping starts with the upper bound of size. In this case, groups cover the ranges from 46-50, 51-55, 56-60, 61-65 etc. Note: This will cover a range from 46-50 as first group, even if values from 46 to 49 are not present. See 'Examples'.
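The effect of including one boundary or the other can be explored with base R's cut(), whose right argument controls whether intervals are closed on the right or on the left. This only illustrates interval closure; whether right.interval maps onto cut() in exactly this way is an assumption, not something stated here.

```r
x <- c(50, 54, 55, 60)
breaks <- seq(45, 65, by = 5)

# Left-closed intervals [a, b): 50 opens a group, 55 opens the next one
left_closed <- cut(x, breaks = breaks, right = FALSE)

# Right-closed intervals (a, b]: 50 closes a group, 55 closes the next one
right_closed <- cut(x, breaks = breaks, right = TRUE)

print(table(left_closed))
print(table(right_closed))
```

Note that with right-closed intervals the value 50 falls into the group covering 46 to 50, even though no values between 46 and 49 occur in x.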
If you want to split a variable into a certain number of equal-sized groups (instead of groups whose values all span the same range), use the split_var function.
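The difference between equal-range and equal-sized groups can be shown with base R on skewed data. This is a sketch using cut() and quantile(); it does not reproduce the internals of group_var() or split_var().

```r
set.seed(1)
x <- rexp(200)  # skewed data, so the two approaches differ visibly

# Equal range: four intervals of the same width, unequal counts
equal_range <- cut(x, breaks = 4)

# Equal size: quartile breaks, so each group holds the same number of cases
q <- quantile(x, probs = seq(0, 1, by = 0.25))
equal_size <- cut(x, breaks = q, include.lowest = TRUE)

print(table(equal_range))
print(table(equal_size))
```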
group_var() also works on grouped data frames (see group_by). In this case, grouping is applied to the subsets of variables in x. See 'Examples'.
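What "applied to the subsets" means can be approximated in base R by splitting a variable by the grouping factor and binning each subset separately. This is a loose sketch using split() and cut(); it is an assumption about the general idea, not the mechanism group_var() uses for grouped data frames.

```r
# Split disp by number of cylinders, then bin each subset on its own range
by_cyl <- split(mtcars$disp, mtcars$cyl)
binned <- lapply(by_cyl, function(d) cut(d, breaks = 4, labels = FALSE))

# Each subset gets bins 1 to 4 relative to its own minimum and maximum
print(lapply(binned, table))
```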
split_var to split variables into equal-sized groups, group_str for grouping string vectors, or rec_pattern and rec for another convenient way of recoding variables into smaller groups.
library(sjmisc)
age <- abs(round(rnorm(100, 65, 20)))
age.grp <- group_var(age, size = 10)
hist(age)
hist(age.grp)
age.grpvar <- group_labels(age, size = 10)
table(age.grp)
print(age.grpvar)
# histogram with EUROFAMCARE sample dataset
# variable not grouped
library(sjlabelled)
data(efc)
hist(efc$e17age, main = get_label(efc$e17age))
# bar plot with EUROFAMCARE sample dataset
# grouped variable
ageGrp <- group_var(efc$e17age)
ageGrpLab <- group_labels(efc$e17age)
barplot(table(ageGrp), main = get_label(efc$e17age), names.arg = ageGrpLab)
# within a pipe-chain
library(dplyr)
efc %>%
  select(e17age, c12hour, c160age) %>%
  group_var(size = 20)
# create vector with values from 50 to 80
dummy <- round(runif(200, 50, 80))
# labels with grouping starting at lower bound
group_labels(dummy)
# labels with grouping starting at upper bound
group_labels(dummy, right.interval = TRUE)
# also works with grouped data frames
mtcars %>%
  group_var(disp, size = 4, append = FALSE) %>%
  table()
mtcars %>%
  group_by(cyl) %>%
  group_var(disp, size = 4, append = FALSE) %>%
  table()