
Recode numeric variables into equal sized groups, i.e. a variable is cut into a smaller number of groups at specific cut points. split_var_if() is a scoped variant of split_var(), where transformation will be applied only to those variables that match the logical condition of predicate.
split_var(
  x,
  ...,
  n,
  as.num = FALSE,
  val.labels = NULL,
  var.label = NULL,
  inclusive = FALSE,
  append = TRUE,
  suffix = "_g"
)

split_var_if(
  x,
  predicate,
  n,
  as.num = FALSE,
  val.labels = NULL,
  var.label = NULL,
  inclusive = FALSE,
  append = TRUE,
  suffix = "_g"
)
A grouped variable with equal sized groups. If x is a data frame, for append = TRUE, x including the grouped variables as new columns is returned; if append = FALSE, only the grouped variables will be returned. If append = TRUE and suffix = "", recoded variables will replace (overwrite) existing variables.
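The three return shapes described above can be sketched roughly as follows. This assumes the sjmisc package is installed and uses the built-in mtcars data; the object names res_append, res_only and res_replace are only illustrative.

```r
library(sjmisc)

# append = TRUE (default): all original columns plus the new column "disp_g"
res_append <- split_var(mtcars, disp, n = 3)

# append = FALSE: only the grouped variable is returned
res_only <- split_var(mtcars, disp, n = 3, append = FALSE)

# append = TRUE and suffix = "": the grouped variable overwrites "disp"
res_replace <- split_var(mtcars, disp, n = 3, suffix = "")

# compare the number of columns of each result with the original data
ncol(mtcars)
ncol(res_append)
ncol(res_only)
ncol(res_replace)
```

Comparing the column counts is a quick way to check which variant was returned before passing the result on.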
x: A vector or data frame.
...: Optional, unquoted names of variables that should be selected for further processing. Required if x is a data frame (and not a vector) and only selected variables from x should be processed. You may also use functions like : or tidyselect's select-helpers. See 'Examples' or the package vignette.
n: The new number of groups that x should be split into.
as.num: Logical; if TRUE, the return value will be numeric, not a factor.
val.labels: Optional character vector, to set value label attributes of the recoded variable (see vignette Labelled Data and the sjlabelled-Package). If NULL (default), no value labels will be set. Value labels can also be directly defined in the rec-syntax, see 'Details'.
var.label: Optional string, to set the variable label attribute for the returned variable (see vignette Labelled Data and the sjlabelled-Package). If NULL (default), the variable label attribute of x will be used (if present). If empty, variable label attributes will be removed.
inclusive: Logical; if TRUE, cut point values are included in the preceding group. This may be necessary if cutting a vector into groups does not define proper ("equal sized") group sizes. See 'Note' and 'Examples'.
append: Logical; if TRUE (the default) and x is a data frame, x including the new variables as additional columns is returned; if FALSE, only the new variables are returned.
suffix: Indicates which suffix will be added to each dummy variable. Use "numeric" to number dummy variables, e.g. x_1, x_2, x_3 etc. Use "label" to add value label, e.g. x_low, x_mid, x_high. May be abbreviated.
predicate: A predicate function to be applied to the columns. The variables for which predicate returns TRUE are selected.
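As a brief illustration of the scoped variant, the sketch below passes base R's is.numeric() as the predicate. It assumes sjmisc is installed; the built-in iris data (four numeric columns and one factor) is used purely as an example.

```r
library(sjmisc)

# Only columns for which is.numeric() returns TRUE are split;
# the factor column Species is left out of the result
res <- split_var_if(iris, is.numeric, n = 3, append = FALSE)

# names of the grouped variables that were created
colnames(res)
```

With append = FALSE the result holds only the recoded columns, which makes it easy to see which variables the predicate selected.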
split_var() splits a variable into equal sized groups, where the number of groups depends on the n argument. Thus, this function cuts a variable into groups at the specified quantiles.
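To make the idea of cutting at quantiles concrete, here is a minimal base R sketch. The helper quantile_groups() is hypothetical and is not the sjmisc implementation; it only shows how n groups follow from n + 1 quantile break points.

```r
# Hypothetical helper, NOT the sjmisc implementation
quantile_groups <- function(x, n) {
  # n groups need n + 1 break points at equally spaced probabilities
  probs <- seq(0, 1, length.out = n + 1)
  # drop duplicated break points, which occur when x has many ties
  breaks <- unique(stats::quantile(x, probs = probs, na.rm = TRUE))
  # include.lowest = TRUE keeps the minimum inside the first group;
  # labels = FALSE returns integer group codes instead of a factor
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

x <- 1:12
g <- quantile_groups(x, n = 3)
table(g)  # three groups with 4 observations each
```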
By contrast, group_var() recodes a variable into groups that all cover the same value range (e.g. 1-5, 6-10, 11-15 etc.).
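The difference between equal ranged and equal sized groups can be sketched in base R with cut(). This is an illustration of the two concepts only, not a call to group_var() or split_var(), and the vector x is made-up example data.

```r
x <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 8, 12, 15)

# equal value ranges (1-5, 6-10, 11-15): group sizes can differ a lot
by_range <- cut(x, breaks = c(0, 5, 10, 15),
                labels = c("1-5", "6-10", "11-15"))

# cut points at the tertiles: groups hold the same number of observations
by_size <- cut(x, breaks = quantile(x, probs = c(0, 1/3, 2/3, 1)),
               include.lowest = TRUE, labels = FALSE)

table(by_range)  # 9, 1 and 2 observations
table(by_size)   # 4, 4 and 4 observations
```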
split_var()
also works on grouped data frames
(see group_by
). In this case, splitting is applied to
the subsets of variables in x
. See 'Examples'.
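What per-group splitting means can be sketched in base R with ave(), which applies a function separately within each level of a grouping variable. The helper tertile() is hypothetical and only mimics the idea; it is not how sjmisc handles grouped data frames internally.

```r
# Hypothetical helper, not part of sjmisc: tertile group index for one vector
tertile <- function(v) {
  cut(v, breaks = quantile(v, probs = c(0, 1/3, 2/3, 1)),
      include.lowest = TRUE, labels = FALSE)
}

# ungrouped: cut points are computed from all 32 cars
overall <- tertile(mtcars$disp)

# grouped: cut points are computed separately within each level of cyl
within_cyl <- ave(mtcars$disp, mtcars$cyl, FUN = tertile)

# cross-tabulate cylinder count against the resulting group index
table(mtcars$cyl, overall)
table(mtcars$cyl, within_cyl)
```

In the grouped version every cylinder class contributes cars to all three groups, because low, middle and high displacement are judged relative to cars with the same number of cylinders.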
See also group_var() to group variables into equal ranged groups, or rec() to recode variables.
library(sjmisc)
data(efc)
# non-grouped
table(efc$neg_c_7)
# split into 3 groups
table(split_var(efc$neg_c_7, n = 3))
# split multiple variables into 3 groups
split_var(efc, neg_c_7, pos_v_4, e17age, n = 3, append = FALSE)
frq(split_var(efc, neg_c_7, pos_v_4, e17age, n = 3, append = FALSE))
# original
table(efc$e42dep)
# two groups, non-inclusive cut-point
# vector split leads to unequal group sizes
table(split_var(efc$e42dep, n = 2))
# two groups, inclusive cut-point
# group sizes are equal
table(split_var(efc$e42dep, n = 2, inclusive = TRUE))
# Unlike dplyr's ntile(), split_var() never splits a value
# into two different categories, i.e. you always get a clean
# separation of original categories
library(dplyr)
x <- dplyr::ntile(efc$neg_c_7, n = 3)
table(efc$neg_c_7, x)
x <- split_var(efc$neg_c_7, n = 3)
table(efc$neg_c_7, x)
# also works with grouped data frames
mtcars %>%
  split_var(disp, n = 3, append = FALSE) %>%
  table()
mtcars %>%
  group_by(cyl) %>%
  split_var(disp, n = 3, append = FALSE) %>%
  table()