Dichotomizes variables into dummy variables (0/1). Dichotomization is
done either by median, by mean or by a specific value (see dich.by).
dicho(x, ..., dich.by = "median", as.num = FALSE, var.label = NULL,
val.labels = NULL, append = FALSE, suffix = "_d")
x
A vector or data frame.
...
Optional, unquoted names of variables that should be selected for
further processing. Required if x is a data frame (and not a vector)
and only selected variables from x should be processed. You may also
use functions like : or dplyr's select_helpers. See 'Examples' or the
package vignette.
dich.by
Indicates the split criterion where a variable is dichotomized. Must be one of the following values (may be abbreviated):
"median" or "md": by default, x is split into two groups at the median.
"mean" or "m": splits x into two groups at the mean of x.
a numeric value: splits x into two groups at the specific value. Note that the value is inclusive, i.e. dich.by = 10 will split x into one group with values from lowest to 10 and another group with values greater than 10.
as.num
Logical, if TRUE, the return value will be numeric, not a factor.
var.label
Optional string, to set the variable label attribute for the
returned variable (see vignette Labelled Data and the sjlabelled-Package).
If NULL (default), the variable label attribute of x will be used
(if present). If empty, variable label attributes will be removed.
val.labels
Optional character vector (of length two), to set the value label
attributes of the dichotomized variable (see set_labels).
If NULL (default), no value labels will be set.
append
Logical, if TRUE and x is a data frame, x is returned with the new
variables added as additional columns; if FALSE (the default), only
the new variables are returned.
suffix
String value, will be appended to variable (column) names of x, if x
is a data frame. If x is not a data frame, this argument will be
ignored. The default value to suffix column names in a data frame
depends on the function call:
recoded variables (rec()) will be suffixed with "_r"
recoded variables (recode_to()) will be suffixed with "_r0"
dichotomized variables (dicho()) will be suffixed with "_d"
grouped variables (split_var()) will be suffixed with "_g"
grouped variables (group_var()) will be suffixed with "_gr"
standardized variables (std()) will be suffixed with "_z"
centered variables (center()) will be suffixed with "_c"
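The inclusive split rule described for a numeric dich.by can be sketched in base R. This is only an illustration of the documented behaviour, not the package's implementation: split_at() is a hypothetical helper, and it returns plain integers rather than the factor that dicho() returns by default.

```r
# Hypothetical helper (not part of any package): mimics the documented
# rule for a numeric split value using base R only.
split_at <- function(x, cut) {
  # values from lowest to `cut` become 0, values greater than `cut` become 1;
  # missing values stay missing
  as.integer(x > cut)
}

x <- c(2, 10, 10.5, 37, NA)
split_at(x, 10)
# 10 itself falls into the lower group: 0 0 1 1 NA
```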
x, dichotomized. If x is a data frame, only the dichotomized
variables will be returned, unless append = TRUE.
dicho() also works on grouped data frames (see group_by).
In this case, dichotomization is applied separately to each group's
subset of the variables in x, so the split value (for example, the
median) is computed per group. See 'Examples'.
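Why grouping changes the result can be sketched with base R alone. The helper median_split() below is hypothetical, and it assumes the median split uses the same inclusive comparison that is documented for numeric split values; ave() stands in for the grouped call.

```r
# Hypothetical helper: median split, assuming an inclusive cut
# (values up to and including the median become 0).
median_split <- function(v) as.integer(v > median(v, na.rm = TRUE))

# One split value for the whole column ...
overall <- median_split(mtcars$disp)

# ... versus one split value per cyl group.
by_cyl <- ave(mtcars$disp, mtcars$cyl, FUN = median_split)

table(overall)
table(by_cyl)

# Number of rows that end up in a different group
sum(overall != by_cyl)
```

Every four-cylinder car lies below the overall median of disp, so all of them are coded 0 in the ungrouped split, while the per-group split divides them at their own median.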
# NOT RUN {
# dicho(), frq() and the efc sample data come from the sjmisc package
library(sjmisc)
data(efc)
summary(efc$c12hour)
# split at median
table(dicho(efc$c12hour))
# split at mean
table(dicho(efc$c12hour, dich.by = "mean"))
# split at 30: lowest value to 30, and above 30
table(dicho(efc$c12hour, dich.by = 30))
# sample data frame, values from 1-4
head(efc[, 6:10])
# dichotomized values (1 to 2 = 0, 3 to 4 = 1)
library(dplyr)
efc %>%
  select(6:10) %>%
  dicho(dich.by = 2) %>%
  head()
# dichotomize several variables in a data frame
dicho(efc, c12hour, e17age, c160age)
# dichotomize and set labels
frq(dicho(efc, e42dep, var.label = "Dependency (dichotomized)",
          val.labels = c("lower", "higher")))
# also works with grouped data frames
mtcars %>%
  dicho(disp) %>%
  table()
mtcars %>%
  group_by(cyl) %>%
  dicho(disp) %>%
  table()
# dichotomizing grouped data frames leads to different
# results for a dichotomized variable, because the split
# value is different for each group.
# compare:
mtcars %>%
  group_by(cyl) %>%
  summarise(median = median(disp))
median(mtcars$disp)
# }