textProjectionPlot() plots words according to Supervised Dimension Projection.
textProjectionPlot(
  word_data,
  k_n_words_to_test = FALSE,
  min_freq_words_test = 1,
  min_freq_words_plot = 1,
  plot_n_words_square = 3,
  plot_n_words_p = 5,
  plot_n_word_extreme = 5,
  plot_n_word_frequency = 5,
  plot_n_words_middle = 5,
  plot_n_word_random = 0,
  titles_color = "#61605e",
  y_axes = FALSE,
  p_alpha = 0.05,
  overlapping = TRUE,
  p_adjust_method = "none",
  projection_metric = "dot_product",
  title_top = "Supervised Dimension Projection",
  x_axes_label = "Supervised Dimension Projection (SDP)",
  y_axes_label = "Supervised Dimension Projection (SDP)",
  scale_x_axes_lim = NULL,
  scale_y_axes_lim = NULL,
  word_font = NULL,
  bivariate_color_codes = c("#398CF9", "#60A1F7", "#5dc688", "#e07f6a", "#EAEAEA",
    "#40DD52", "#FF0000", "#EA7467", "#85DB8E"),
  word_size_range = c(3, 8),
  position_jitter_hight = 0,
  position_jitter_width = 0.03,
  point_size = 0.5,
  arrow_transparency = 0.1,
  points_without_words_size = 0.2,
  points_without_words_alpha = 0.2,
  legend_title = "SDP",
  legend_x_axes_label = "x",
  legend_y_axes_label = "y",
  legend_x_position = 0.02,
  legend_y_position = 0.02,
  legend_h_size = 0.2,
  legend_w_size = 0.2,
  legend_title_size = 7,
  legend_number_size = 2,
  legend_number_colour = "white",
  group_embeddings1 = FALSE,
  group_embeddings2 = FALSE,
  projection_embedding = FALSE,
  aggregated_point_size = 0.8,
  aggregated_shape = 8,
  aggregated_color_G1 = "black",
  aggregated_color_G2 = "black",
  projection_color = "blue",
  seed = 1005,
  explore_words = NULL,
  explore_words_color = "#ad42f5",
  explore_words_point = "ALL_1",
  explore_words_aggregation = "mean",
  remove_words = NULL,
  n_contrast_group_color = NULL,
  n_contrast_group_remove = FALSE,
  space = NULL,
  scaling = FALSE
)
A 1- or 2-dimensional word plot, as well as a tibble with the processed data used for plotting.
Data frame returned by textProjection.
Select the k most frequent words to significance test (k = sqrt(100*N); N = number of participant responses). Default = FALSE.
Select words to significance test that have occurred at least min_freq_words_test times (default = 1).
Select words to plot that have occurred at least min_freq_words_plot times (default = 1).
Number of significant words to plot in each square of the figure. Within each square, the significant words are selected by frequency (most frequent first).
Number of significant words to plot on each (positive and negative) side of the x-axes and y-axes (duplicates are removed); words are selected first by lowest p-value and then by frequency. Hence, on a two-dimensional plot, plot_n_words_p = 1 can yield 4 words.
Number of words that are extreme on the Supervised Dimension Projection, per dimension (i.e., even if not significant; duplicates are removed).
Number of words plotted based on being most frequent (i.e., even if not significant).
Number of words plotted that are in the middle of the Supervised Dimension Projection score, per dimension (i.e., even if not significant; duplicates are removed).
(numeric) Number of randomly selected words to plot (default = 0).
Color for all the titles (default: "#61605e")
If TRUE, also plot on the y-axes (default = FALSE). Plotting on the y-axes produces a 2-dimensional plot, but textProjection must have been run with a variable on the y-axes.
Significance level (alpha) used when selecting significant words (default = .05).
(boolean) Allow overlapping (TRUE) or disallow (FALSE) (default = TRUE).
Method to adjust/correct p-values for multiple comparisons (default = "none"; see also "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr").
(character) Metric to plot according to; "dot_product" or "cohens_d".
Title at the top of the plot (default: "Supervised Dimension Projection").
Label on the x-axes.
Label on the y-axes.
Manually set the limits of the x-axes (default = NULL; the value is passed to ggplot2::scale_x_continuous(limits = scale_x_axes_lim); change it by trying, e.g., c(-5, 5)).
Manually set the limits of the y-axes (default = NULL; the value is passed to ggplot2::scale_y_continuous(limits = scale_y_axes_lim); change it by trying, e.g., c(-5, 5)).
Font type (default: NULL).
The different colors of the words. Note that, at the moment, two squares should not have the exact same colour-code because the numbers within the squares of the legend will then be aggregated (and show the same, incorrect value). (default: c("#398CF9", "#60A1F7", "#5dc688", "#e07f6a", "#EAEAEA", "#40DD52", "#FF0000", "#EA7467", "#85DB8E")).
Vector with minimum and maximum font size (default: c(3, 8)).
Jitter height (default: 0).
Jitter width (default: .03).
Size of the points indicating the words' position (default: 0.5).
Transparency of the lines between each word and point (default: 0.1).
Size of the points not linked to a word (default: 0.2; set to 0 to hide them).
Transparency of the points not linked to a word (default: 0.2; set to 0 to hide them).
Title on the color legend (default: "SDP").
Label for the x-axes on the color legend (default: "x").
Label for the y-axes on the color legend (default: "y").
Position on the x coordinates of the color legend (default: 0.02).
Position on the y coordinates of the color legend (default: 0.02).
Height of the color legend (default: 0.2).
Width of the color legend (default: 0.2).
Font size of the legend title (default: 7).
Font size of the values in the legend (default: 2).
(string) Colour of the numbers in the box legend (default: "white").
Shows a point representing the aggregated word embedding for group 1 (default = FALSE).
Shows a point representing the aggregated word embedding for group 2 (default = FALSE).
Shows a point representing the aggregated direction embedding (default = FALSE).
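Size of the points representing group_embeddings1, group_embeddings2 and projection_embedding (default: 0.8).
Shape type of the points representing group_embeddings1, group_embeddings2 and projection_embedding (default: 8).
Color of the point representing group_embeddings1 (default: "black").
Color of the point representing group_embeddings2 (default: "black").
Color of the point representing projection_embedding (default: "blue").
Set a different seed (default = 1005).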
Explore where specific words are positioned in the embedding space. For example, c("happy content", "sad down").
Specify the color(s) of the words being explored. For example, c("#ad42f5", "green").
Specify the name of the point for the aggregated word embeddings of all the explored words (default: "ALL_1").
Specify how to aggregate the word embeddings of the explored words (default: "mean").
Manually remove words from the plot (this is done just before the words are plotted, so the removed words are still part of the previous counts/analyses).
Set the color of words that have a higher frequency (N) on the opposite side of their dot product projection (default = NULL).
Remove words that have a higher frequency (N) on the opposite side of their dot product projection (default = FALSE).
Provide a semantic space if using static embeddings and wanting to explore words.
Scale word embeddings before aggregation (default = FALSE).
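The rule given for k_n_words_to_test, k = sqrt(100*N), can be worked through in a few lines of base R. This is only a sketch: k_words_to_test is a hypothetical helper written for illustration, not a function in the package, and this page does not say whether k is rounded, so no rounding is applied here.

```r
# Hypothetical helper (not part of the package): number of most frequent
# words selected for significance testing under the documented rule
# k = sqrt(100 * N), where N is the number of participant responses.
k_words_to_test <- function(n_responses) {
  stopifnot(is.numeric(n_responses), all(n_responses >= 0))
  sqrt(100 * n_responses)
}

k_for_100 <- k_words_to_test(100)  # sqrt(10000) = 100
k_for_400 <- k_words_to_test(400)  # sqrt(40000) = 200
print(c(k_for_100, k_for_400))
```

With 100 responses the rule selects the 100 most frequent words; quadrupling the number of responses only doubles k.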
See textProjection.
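The method names accepted by p_adjust_method are the same as those of stats::p.adjust in base R. Assuming the adjustment behaves like p.adjust (an assumption; this page does not state which routine is used, nor whether the comparison with p_alpha is strict), the sketch below shows how the choice of method changes the number of words that fall below p_alpha. The p-values are made up for illustration.

```r
# Made-up p-values for four words; not taken from any real projection.
p_values <- c(0.01, 0.02, 0.03, 0.04)
p_alpha <- 0.05

# Adjust with three of the documented methods.
p_none <- p.adjust(p_values, method = "none")
p_bonferroni <- p.adjust(p_values, method = "bonferroni")
p_holm <- p.adjust(p_values, method = "holm")

# Count the words that would stay below alpha (strict comparison assumed).
n_sig_none <- sum(p_none < p_alpha)
n_sig_bonferroni <- sum(p_bonferroni < p_alpha)
n_sig_holm <- sum(p_holm < p_alpha)

print(c(none = n_sig_none, bonferroni = n_sig_bonferroni, holm = n_sig_holm))
```

Without correction all four words pass; with "bonferroni" or "holm" only the word with the smallest p-value does, so fewer words are eligible for plot_n_words_p and plot_n_words_square.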
# The test-data included in the package is called: DP_projections_HILS_SWLS_100.
# The dataframe created by textProjection can also be used as input-data.
# Supervised Dimension Projection Plot
plot_projection <- textProjectionPlot(
  word_data = DP_projections_HILS_SWLS_100,
  k_n_words_to_test = FALSE,
  min_freq_words_test = 1,
  plot_n_words_square = 3,
  plot_n_words_p = 3,
  plot_n_word_extreme = 1,
  plot_n_word_frequency = 1,
  plot_n_words_middle = 1,
  y_axes = FALSE,
  p_alpha = 0.05,
  title_top = "Supervised Dimension Projection (SDP)",
  x_axes_label = "Low vs. High HILS score",
  y_axes_label = "Low vs. High SWLS score",
  p_adjust_method = "bonferroni",
  scale_y_axes_lim = NULL
)
plot_projection
# Investigate elements in DP_projections_HILS_SWLS_100.
names(DP_projections_HILS_SWLS_100)
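Because two squares sharing the exact same colour code cause the legend numbers to be aggregated, it can help to check a custom bivariate_color_codes vector before passing it to the function. The check below is a base R sketch, not something the package provides; comparing case-insensitively is an assumption made here because "#5dc688" and "#5DC688" denote the same colour.

```r
# Default colour codes copied from the usage above.
bivariate_color_codes <- c(
  "#398CF9", "#60A1F7", "#5dc688", "#e07f6a", "#EAEAEA",
  "#40DD52", "#FF0000", "#EA7467", "#85DB8E"
)

# Normalise case so that codes differing only in letter case count as equal.
normalised_codes <- toupper(bivariate_color_codes)

# anyDuplicated() returns 0 when every element is unique.
codes_are_distinct <- anyDuplicated(normalised_codes) == 0
print(codes_are_distinct)
```

If the check returns FALSE, change one of the repeated codes slightly so that each square keeps its own count in the legend.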