textExamples: Identify language examples.

Description

This function identifies examples based on n-gram frequency (see the topics-package), estimated topic prevalence (see the topics-package), or assessment scores from textTrain() or textPredict().

Usage

textExamples(
  text,
  x_variable,
  y_variable = NULL,
  type = "default",
  n_tile = 4,
  n_examples = 5,
  jitter = NULL,
  filter_words = NULL,
  target_color = "darkgreen",
  predictions_color = "darkblue",
  error_color = "darkred",
  distribution_color = c("#00508c", "#805259", "#a71200", "#0a6882", "#a4a4a4",
    "#e04b39", "#19956e", "#22a567", "#5c8a59"),
  figure_format = "svg",
  scatter_legend_dot_size = 3,
  scatter_legend_bg_dot_size = 2,
  scatter_legend_dots_alpha = 0.8,
  scatter_legend_bg_dots_alpha = 0.2,
  scatter_show_axis_values = TRUE,
  scatter_legend_regression_line_colour = NULL,
  x_axis_range = NULL,
  y_axis_range = NULL,
  grid_legend_x_axes_label = NULL,
  grid_legend_y_axes_label = NULL,
  grid_legend_title = NULL,
  grid_legend_number_size = 8,
  grid_legend_number_color = "white",
  grid_legend_title_color = "black",
  grid_legend_title_size = 0,
  seed = 42
)

Value

A tibble including examples with descriptive variables.
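
As a rough illustration of the signature above, the sketch below wraps a call to textExamples() in a helper. The helper name example_call and the toy data are hypothetical, and the sketch assumes that text accepts a character vector aligned with the two numeric vectors. Nothing from the text package runs until the helper is called.

```r
# Hypothetical toy data: six short texts with observed and predicted scores.
toy <- data.frame(
  text = c("I feel great", "Things are fine", "Not my best day",
           "I am exhausted", "Life is wonderful", "It is okay"),
  observed = c(9, 6, 3, 1, 10, 5),
  predicted = c(8, 6, 4, 2, 9, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper; the argument names are taken from the Usage section.
# Assumption: x_variable holds the observed scores and y_variable the
# model output, as described under Arguments.
example_call <- function(data) {
  text::textExamples(
    text = data$text,
    x_variable = data$observed,
    y_variable = data$predicted,
    n_tile = 4,
    n_examples = 2,
    seed = 42
  )
}
```

With the text package installed, example_call(toy) should return the tibble described above; a real data set would normally contain many more rows than this toy one.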

Arguments

text: (string) the language that was used for prediction/assessment/classification.
x_variable: (numeric) the variable used for training (y).
y_variable: (numeric) the outcome from the model (i.e., y_hat).
type: (string) If you are plotting errors between predicted and target scores, set type to "prediction_errors" to produce two extra plots: the distribution of scores and the absolute error.
n_tile: (integer) the number of tiles (quantile groups) to split the data into; the most extreme tiles are shown in different colours.
n_examples: (integer) the number of language examples to show/select in each quadrant. When both x_variable and y_variable are provided, each example is assigned to one of nine bivariate quadrants based on its position in the scatterplot (e.g., low–low, high–high, center). Within each quadrant, the most extreme examples are selected as follows:
  Corner quadrants (1, 3, 7, 9): examples closest to the corner point (e.g., min x and max y), measured by Euclidean distance.
  Edge quadrants (2, 4, 6, 8): examples furthest along the relevant axis (x or y).
  Center quadrant (5): examples closest to the mean of both x and y.
jitter: (integer) the percentage of jitter to add to the data for the scatter plot.
filter_words: (character vector) words required to be included in examples.
target_color: (string) default "darkgreen".
predictions_color: (string) default "darkblue".
error_color: (string) default "darkred".
distribution_color: (character vector) colors of the distribution plot.
figure_format: (string) file format of the figures.
scatter_legend_dot_size: (integer) The size of highlighted dots in the scatter legend.
scatter_legend_bg_dot_size: (integer) The size of background dots in the scatter legend.
scatter_legend_dots_alpha: (numeric) The transparency (alpha) level of the dots.
scatter_legend_bg_dots_alpha: (numeric) The transparency (alpha) level of the background dots. For example: c(1,0,1) results in one dot in each quadrant except for the middle quadrant.
scatter_show_axis_values: (boolean) If TRUE, the estimate values are shown on the distribution plot axes.
scatter_legend_regression_line_colour: (string) If a colour string is added, a regression line will be plotted.
x_axis_range: (numeric vector) range of x axis (e.g., c(1, 100)).
y_axis_range: (numeric vector) range of y axis (e.g., c(1, 100)).
grid_legend_x_axes_label: (string) x-axis label of the grid topic plot.
grid_legend_y_axes_label: (string) y-axis label of the grid topic plot.
grid_legend_title: (string)
grid_legend_number_size: (integer)
grid_legend_number_color: (string)
grid_legend_title_color: (string)
grid_legend_title_size: (integer)
seed: (integer) The seed to set for reproducibility.
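
The quadrant-based selection described under n_tile and n_examples can be sketched in base R as follows. This is an illustration of the described rules, not the package's internal code. It assumes that the lowest tile counts as "low", the highest tile as "high", and all tiles in between as "middle", and that quadrants are numbered left to right, top to bottom (so quadrant 1 is low x, high y). The toy scores and all helper names are hypothetical.

```r
# Hypothetical toy scores and texts.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
y <- c(12, 3, 7, 1, 9, 5, 6, 2, 11, 4, 10, 8)
texts <- paste("example text", seq_along(x))
n_tile <- 4
n_examples <- 1

# Assign each value to a band: 1 = lowest tile, 3 = highest tile, 2 = the rest.
to_band <- function(v, n_tile) {
  tile <- ceiling(rank(v, ties.method = "first") * n_tile / length(v))
  ifelse(tile == 1, 1L, ifelse(tile == n_tile, 3L, 2L))
}
xb <- to_band(x, n_tile)
yb <- to_band(y, n_tile)

# Assumed numbering: top row (high y) is 1-3, middle row 4-6, bottom row 7-9.
quadrant <- (3L - yb) * 3L + xb

# Reference coordinate per band: minimum (low), mean (middle), maximum (high).
target <- function(v, band) c(min(v), mean(v), max(v))[band]
dx <- x - target(x, xb)
dy <- y - target(y, yb)

# Lower score = more extreme within its quadrant.
is_corner <- xb != 2L & yb != 2L
is_center <- xb == 2L & yb == 2L
score <- ifelse(
  is_corner | is_center,
  sqrt(dx^2 + dy^2),                  # Euclidean distance to corner or to the mean
  ifelse(xb == 2L, abs(dy), abs(dx))  # edges: only the relevant axis counts
)

# Keep the n_examples most extreme rows in each non-empty quadrant.
picked <- unlist(
  lapply(split(seq_along(x), quadrant), function(idx) {
    head(idx[order(score[idx])], n_examples)
  }),
  use.names = FALSE
)

selected <- data.frame(
  quadrant = quadrant[picked],
  x = x[picked],
  y = y[picked],
  text = texts[picked]
)
print(selected)
```

In this toy data no point falls in quadrant 9 (high x, low y), so the result contains one row for each of the other eight quadrants.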