A readable, complete and beautiful graph for multiple correspondence analysis made with FactoMineR::MCA.
Interactive tooltips, which appear when hovering near points with the mouse,
keep a lot of important data at hand (tables of active variables
and additional chosen variables) while reading the graph.
Profiles of answers (from the graph of "individuals") are drawn in the background
and can be linked to FactoMineR::HCPC classes.
Since it is made in the spirit of ggplot2, it is possible to
change the theme or add other plot elements with +. Interactive
tooltips then won't appear until you pass the result through ggi.
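As a rough sketch of that workflow, the snippet below customises the plot with `+` and only then passes it to ggi. It assumes that ggmca, ggi and MCA2 come from the tabxplor package and that FactoMineR and ggplot2 are installed; the theme and caption are arbitrary illustrations.

```r
# Assumption: ggmca(), ggi() and MCA2() are provided by the tabxplor package.
pkgs <- c("tabxplor", "FactoMineR", "ggplot2")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

p <- NULL
interactive_plot <- NULL
if (have_pkgs) {
  data(tea, package = "FactoMineR")
  res.mca <- tabxplor::MCA2(tea, active_vars = 1:18)

  # Static ggplot object, customised with `+` like any other ggplot
  p <- tabxplor::ggmca(res.mca, tea, sup_vars = "SPC") +
    ggplot2::theme_minimal() +                                   # arbitrary theme
    ggplot2::labs(caption = "Source: tea data from FactoMineR")  # arbitrary caption

  # The tooltips come back only once the modified plot goes through ggi()
  interactive_plot <- tabxplor::ggi(p)
}
```

The guard around the package calls only keeps the sketch from failing where the packages are missing; in an interactive session the calls can be used directly.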
Step-by-step functions: use ggmca_data to get the data frames with every
parameter needed to print an MCA, modify them, then pass them to ggmca_plot
to draw the graph.
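The two-step route can be sketched as follows, again assuming the functions come from the tabxplor package. The names of the elements in the returned list are not documented here, so the sketch inspects the list with `str()` instead of guessing at them.

```r
# Assumption: ggmca_data(), ggmca_plot() and MCA2() are provided by tabxplor.
pkgs <- c("tabxplor", "FactoMineR")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

plot_from_data <- NULL
if (have_pkgs) {
  data(tea, package = "FactoMineR")
  res.mca <- tabxplor::MCA2(tea, active_vars = 1:18)

  # Step 1: compute the list of data frames behind the graph
  mca_data <- tabxplor::ggmca_data(res.mca, tea, sup_vars = "SPC")

  # Look at the top-level structure before modifying anything
  str(mca_data, max.level = 1)

  # Step 2: (after any modification) draw the graph from the list
  plot_from_data <- tabxplor::ggmca_plot(mca_data, axes = c(1, 2),
                                         text_repel = TRUE)
}
```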
ggmca(
res.mca,
dat,
sup_vars,
active_tables,
tooltip_vars_1lv,
tooltip_vars,
axes = c(1, 2),
axes_names = NULL,
axes_reverse = NULL,
type = c("text", "labels", "points", "numbers", "facets"),
color_groups = "^.{0}",
cah_color_groups = "^.+$",
keep_levels,
discard_levels,
cleannames = TRUE,
profiles = FALSE,
profiles_tooltip_discard = "^Not |^No |^Pas |^Non ",
cah,
max_profiles = 5000,
alpha_profiles = 0.7,
color_profiles = TRUE,
base_profiles_color = "#aaaaaa",
text_repel = FALSE,
title,
actives_in_bold = NULL,
sup_in_italic = FALSE,
ellipses = NULL,
xlim,
ylim,
out_lims_move = FALSE,
shift_colors = 0,
colornames_recode,
scale_color_light = material_colors_light(),
scale_color_dark = material_colors_dark(),
text_size = 3.5,
size_scale_max = 4,
dist_labels = c("auto", 0.04),
right_margin = 0,
use_theme = TRUE,
get_data = FALSE
)
ggmca_data(
res.mca,
dat,
sup_vars,
active_tables,
tooltip_vars_1lv,
tooltip_vars,
color_groups = "^.{0}",
cah_color_groups = "^.+$",
keep_levels,
discard_levels,
cleannames = TRUE,
profiles = FALSE,
profiles_tooltip_discard = "^Pas |^Non |^Not |^No ",
cah,
max_profiles = 5000
)
ggmca_plot(
data,
axes = c(1, 2),
axes_names = NULL,
axes_reverse = NULL,
type = c("text", "points", "labels", "active_vars_only", "numbers", "facets"),
text_repel = FALSE,
title,
ellipses = NULL,
actives_in_bold = NULL,
sup_in_italic = FALSE,
xlim,
ylim,
out_lims_move = FALSE,
color_profiles = TRUE,
base_profiles_color = "#aaaaaa",
alpha_profiles = 0.7,
shift_colors = 0,
colornames_recode,
scale_color_light = material_colors_light(),
scale_color_dark = material_colors_dark(),
text_size = 3.5,
size_scale_max = 4,
dist_labels = c("auto", 0.04),
right_margin = 0,
use_theme = TRUE,
get_data = FALSE
)
A ggplot object to be printed in the `RStudio` Plots pane. Other gg objects can be added with +.
Sending the result through ggi draws the interactive graph in the Viewer pane using ggiraph.
A list containing the data frames to pass to ggmca_plot.
A ggplot object.
An object created with FactoMineR::MCA.
The data in which to find the supplementary variables, etc.
A character vector of supplementary qualitative variables
to print (they don't need to be passed in MCA beforehand).
Should colored crosstables be added to the interactive tooltips?
`active_tables = "sup"` crosses each of the `sup_vars` with the active variables.
`active_tables = "active"` crosses each active variable with the others,
giving results closely related to the Burt table used to calculate the multiple
correspondence analysis. It may take time to calculate with many variables.
`active_tables = c("active", "sup")` does both. In tooltips, percentages are colored
in blue when the spread from the mean is positive (over-representation), and in red when
it is negative (under-representation), as in tab with `color = "diff"`.
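The blue/red rule can be illustrated on a toy crosstable with base R only. This is a sketch of the idea (row percentages compared with the overall percentages), not the package's actual computation; the counts and names are invented.

```r
# Invented counts: one supplementary variable (rows) crossed with one
# active variable (columns)
counts <- matrix(c(30, 10,
                   20, 40),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(sup = c("A", "B"), active = c("yes", "no")))

# Row percentages and the overall ("mean") percentages of the active variable
row_pct  <- prop.table(counts, margin = 1)
mean_pct <- colSums(counts) / sum(counts)

# Spread from the mean: row percentage minus overall percentage
spread <- sweep(row_pct, 2, mean_pct)

# Positive spread = over-representation (blue), negative = under-representation (red)
tooltip_colors <- ifelse(spread > 0, "blue", ifelse(spread < 0, "red", "black"))
print(tooltip_colors)
```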
A character vector of variables, whose first level (if character/factor) or weighted mean (if numeric) will be added at the top of interactive tooltips.
A character vector of variables (character/factors), whose complete levels will be added at the bottom of interactive tooltips.
The axes to print, as a numeric vector of length 2.
Names of all the axes (not just the two selected ones), as a character vector.
Possibility to reverse the coordinates of the axes by providing a numeric vector: `1` to invert left and right; `2` to invert up and down; `1:2` to invert both.
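Reversing an axis amounts to flipping the sign of the corresponding plotted coordinate. The base R sketch below shows that idea with a hypothetical helper, `reverse_axes()`, which is not part of the package.

```r
# Hypothetical helper: flip the sign of the selected plot axes
# (column 1 = horizontal axis, column 2 = vertical axis)
reverse_axes <- function(coords, axes_reverse = NULL) {
  if (is.null(axes_reverse)) return(coords)
  coords[, axes_reverse] <- -coords[, axes_reverse]
  coords
}

coords <- cbind(x = c(-1, 0.5), y = c(0.2, -0.8))

flipped_lr   <- reverse_axes(coords, 1)    # invert left and right
flipped_ud   <- reverse_axes(coords, 2)    # invert up and down
flipped_both <- reverse_axes(coords, 1:2)  # invert both
```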
Determines the way sup_vars are printed.
"text": colored text
"points": colored points with text legends
"labels": colored labels
"active_vars_only": no sup_vars
"numbers": colored labels of prefix numbers, with small names
"facets": one graph of profiles of answers for each level of the first sup_vars. A different color is used for each.
By default, there is one color group for all the levels
of each `sup_vars`. It is possible to color `sup_vars` with groups created
from their levels with str_extract and regexes.
For example, `color_groups = "^."` builds the groups from the first character
of each level (useful when they begin with numbers).
`color_groups = "^.{3}"` builds them from the first three characters.
`color_groups = "NB.+$"` takes everything from `"NB"` to the end of the level
names, etc.
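To preview which groups a pattern would produce, the extraction can be tried on the level names first. The sketch below uses a small base R stand-in for str_extract (the hypothetical `extract_first()`), and the level names are invented.

```r
# Hypothetical stand-in for stringr::str_extract(): first match or NA
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)
  out
}

# Invented level names
lv <- c("1-Tea shop", "1-Tea room", "2-Chain store", "2-Chain store NB1")

groups_first_char  <- extract_first(lv, "^.")      # group on the first character
groups_three_chars <- extract_first(lv, "^.{3}")   # group on the first three characters
groups_nb          <- extract_first(lv, "NB.+$")   # group on "NB" up to the end

print(data.frame(level = lv, groups_first_char, groups_three_chars, groups_nb))
```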
Color groups for the `cah` variable (HCPC clusters).
A character vector of variable levels to keep; the others will be discarded.
A character vector of variable levels to discard.
Set to TRUE to clean level names by removing
prefix numbers like "1-" and text in parentheses.
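The kind of cleaning described here can be approximated with two regular expressions. The patterns below are assumptions chosen for illustration, not the ones the package uses, and `clean_level_names()` is a hypothetical helper.

```r
# Hypothetical helper: strip a leading "1-" style prefix and any "(...)" text
clean_level_names <- function(x) {
  x <- sub("^[0-9]+-", "", x)            # prefix numbers such as "1-"
  x <- gsub("\\s*\\([^)]*\\)", "", x)    # text in parentheses
  trimws(x)
}

cleaned <- clean_level_names(c("1-Tea shop (specialised)", "2-Chain store", "Other"))
print(cleaned)
```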
When set to TRUE, profiles of answers are drawn in the background
of the graph with light-grey points. When hovering with the mouse in the interactive
version (passed through ggi), the answers of individuals to active variables
will appear. If cah is provided, hovering near one point will color all the
points of the same HCPC class.
A regex pattern to remove useless levels among interactive tooltips for profiles of answers (ex. : levels expressing "no" answers).
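The effect of such a pattern can be checked on a vector of level names before plotting. The sketch below applies the default pattern from `ggmca()` to invented level names with base R.

```r
# Default pattern from the ggmca() signature
discard_pattern <- "^Not |^No |^Pas |^Non "

# Invented level names
lv <- c("Not breakfast", "breakfast", "No sugar", "Notebook")

discarded <- grepl(discard_pattern, lv)
kept_levels <- lv[!discarded]
print(kept_levels)
```

Note that each alternative ends with a space, so a level such as "Notebook" is kept.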
An HCPC clusters variable made with HCPC
on `res.mca`, to link the answer-profile points that share the same HCPC class
(they will be given the same color and linked at mouse hover).
The maximum number of profile points to print. Defaults to 5000.
The alpha (transparency, between 0 and 1) for profiles of answer.
By default, if cah is provided, profiles are
colored based on cah levels (HCPC clusters). Set to FALSE
to avoid this behaviour.
You can also give a character vector with only some of the levels of
the `cah` variable.
The base color for answer profiles. Defaults to gray. Set to `NULL` to discard profiles. With `color_profiles`, set to `NULL` to discard the non-colored profiles.
When TRUE, the graph is no longer interactive,
but the resulting image is better suited to printing because points and labels don't
overlap. It uses ggrepel::geom_text_repel.
The title of the graph.
Set to `TRUE` to set active variables in bold font (and sup variables in plain).
Set to `TRUE` to set sup variables in italics.
Set to a number between 0 and 1 to draw a concentration ellipse for
each level of the first sup_vars. 0.95 draws ellipses containing 95% of the
individuals of each category. 0.5 draws median ellipses, containing half
the individuals of each category. Note that, if `max_profiles` is provided, ellipses
won't be made with all individuals.
Horizontal and vertical axes limits, as double vectors of length 2.
When TRUE, the points outside xlim or ylim
are not removed, but moved to the edges of the graph.
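Moving points to the edges corresponds to clamping their coordinates to the limits. The base R sketch below shows the idea with a hypothetical helper, `move_to_limits()`, where an `NA` limit (as in `ylim = c(NA, 1.2)`) is treated as "no limit on that side"; that treatment is an assumption.

```r
# Hypothetical helper: clamp coordinates to c(lower, upper); NA = no limit
move_to_limits <- function(v, lims) {
  lower <- if (is.na(lims[1])) -Inf else lims[1]
  upper <- if (is.na(lims[2])) Inf else lims[2]
  pmin(pmax(v, lower), upper)
}

y <- c(-0.4, 0.9, 1.7)
y_moved <- move_to_limits(y, c(NA, 1.2))   # only the last point is moved
print(y_moved)
```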
Change colors of the sup_vars points.
A named character vector in fct_recode style (`new_name = "old_name"`)
to rename the levels of the color
variable if needed (the levels used for colors are printed in a console message
whenever the function is used).
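In fct_recode style, the names of the vector are the new labels and the values are the existing levels. The base R sketch below applies such a vector to an invented factor to show the direction of the mapping; it does not reproduce how the package applies it.

```r
# Invented color variable and recoding vector: new name = "old level"
color_levels <- factor(c("1", "2", "1", "3"))
colornames_recode <- c("Young" = "1", "Middle-aged" = "2")

lv <- levels(color_levels)
idx <- match(colornames_recode, lv)      # position of each old level
found <- !is.na(idx)                     # ignore old levels that do not exist
lv[idx[found]] <- names(colornames_recode)[found]
levels(color_levels) <- lv

print(color_levels)
```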
A scale color for sup vars points
A scale color for sup vars texts
Size of text.
Size of points.
When `type = "points"`, the distance of labels from points.
A margin at the right, in cm. Useful to read tooltips over points placed at the right of the graph without formatting problems.
By default, a specific ggplot2 theme is used.
Set to FALSE to apply your own theme.
Returns the data frame to create the plot instead of the plot itself.
A list of data frames made with ggmca_data.
ggmca_data(): get the data frames with all parameters to print an MCA graph
ggmca_plot(): print an MCA graph from data frames with parameters
# \donttest{
data(tea, package = "FactoMineR")
res.mca <- MCA2(tea, active_vars = 1:18)
# Interactive graph for multiple correspondence analysis:
res.mca |>
  ggmca(tea, sup_vars = c("SPC"), ylim = c(NA, 1.2), text_repel = TRUE) |>
  ggi() # to make the graph interactive

# Interactive graph with access to all crosstables between active variables (Burt table).
# Spreads from the mean are colored: usually, points near the middle will have fewer
# colors, and points at the edges will have plenty. It may take time to print, but it
# helps to interpret the MCA in close proximity with the underlying data.
res.mca |>
  ggmca(tea, ylim = c(NA, 1.2), active_tables = "active", text_repel = TRUE) |>
  ggi()

# Graph with colored HCPC clusters
cah <- FactoMineR::HCPC(res.mca, nb.clust = 6, graph = FALSE)
tea$clust <- cah$data.clust$clust
ggmca(res.mca, tea, cah = "clust", profiles = TRUE, text_repel = TRUE)

# Concentration ellipses for each level of a supplementary variable:
ggmca(res.mca, tea, sup_vars = "SPC", ylim = c(NA, 1.2),
      ellipses = 0.5, text_repel = TRUE, profiles = TRUE)

# Graph of profiles of answers for each level of a supplementary variable:
ggmca(res.mca, tea, sup_vars = "SPC", ylim = c(NA, 1.2),
      type = "facets", ellipses = 0.5, profiles = TRUE)
# }