Creates a scatter plot and calculates a correlation between two variables.
scatterplot(
data = NULL,
x_var_name = NULL,
y_var_name = NULL,
print_correlation = TRUE,
dot_label_var_name = NULL,
weight_var_name = NULL,
alpha = 1,
annotate_stats = TRUE,
annotate_y_pos_rel = 5,
annotate_y_pos_abs = NULL,
annotated_stats_color = "green4",
annotated_stats_font_size = 6,
annotated_stats_font_face = "bold",
line_of_fit_type = "lm",
ci_for_line_of_fit = FALSE,
line_of_fit_color = "blue",
line_of_fit_thickness = 1,
dot_color = "black",
x_axis_label = NULL,
y_axis_label = NULL,
x_axis_tick_marks = NULL,
y_axis_tick_marks = NULL,
dot_size = 2,
dot_label_size = NULL,
dot_size_range = c(3, 12),
jitter_x_y_percent = 0,
jitter_x_percent = 0,
jitter_y_percent = 0,
cap_axis_lines = TRUE,
color_dots_by = NULL,
png_name = NULL,
save_as_png = FALSE,
width = 13,
height = 9
)
The output will be a scatter plot (a ggplot object).
a data object (a data frame or a data.table)
name of the variable that will go on the x axis
name of the variable that will go on the y axis
should the correlation be printed in the console? (default = TRUE)
name of the variable that will be used to label individual observations
name of the variable by which to weight the individual observations for calculating correlation and plotting the line of fit
opacity of the dots (0 = completely transparent, 1 = completely opaque)
if TRUE, the correlation and p-value will be annotated at the top of the plot (default = TRUE)
position of the annotated stats, expressed
as a percentage of the range of y values by which the annotated
stats will be placed above the maximum value of y in the data set
(default = 5). This value will be determined relative to the data.
If annotate_y_pos_rel = 5, and the minimum and
maximum y values in the data set are 0 and 100, respectively,
the annotated stats will be placed at 5% of the y range (100 - 0)
above the maximum y value, y = 0.05 * (100 - 0) + 100 = 105.
as an alternative to the argument annotate_y_pos_rel, the input for
this argument will determine the position of the annotated stats.
If annotate_y_pos_abs = 7.5, then the annotated stats will be placed
at the y coordinate of 7.5. By default, this argument will be ignored
unless it receives an input. That is, by default, the function will
use the default value of the annotate_y_pos_rel argument to determine
the y coordinate of the annotated stats.
color of the annotated stats (default = "green4").
font size of the annotated stats (default = 6).
font face of the annotated stats (default = "bold").
if line_of_fit_type = "lm", a regression line will be fit;
if line_of_fit_type = "loess", a local regression line will be fit;
if line_of_fit_type = "none", no line will be fit
if ci_for_line_of_fit = TRUE, the confidence interval for the line of fit will be shaded
color of the line of fit (default = "blue")
thickness of the line of fit (default = 1)
color of the dots (default = "black")
alternative label for the x axis
alternative label for the y axis
a numeric vector indicating the positions of the tick marks on the x axis
a numeric vector indicating the positions of the tick marks on the y axis
size of the dots on the plot (default = 2)
size for dots' labels on the plot. If no input is entered for this
argument, it will be set as dot_label_size = 5 by default. If the plot
is to be weighted by some variable, this argument will be ignored, and
dot sizes will be determined by the argument dot_size_range
minimum and maximum size for dots on the plot when they are weighted
horizontally and vertically jitter dots by a percentage of the respective ranges of x and y values.
horizontally jitter dots by a percentage of the range of x values.
vertically jitter dots by a percentage of the range of y values
logical. Should the axis lines be capped at the outer tick marks? (default = TRUE)
name of the variable that will determine colors of the dots
name of the PNG file to be saved. By default, the name will be "scatterplot_" followed by a timestamp of the current time. The timestamp will be in the format, jan_01_2021_1300_10_000001, where "jan_01_2021" would indicate January 01, 2021; 1300 would indicate 13:00 (i.e., 1 PM); and 10_000001 would indicate 10.000001 seconds after the hour.
if save_as_png = TRUE, the plot will be saved as a PNG file.
width of the plot to be saved. This argument will be directly entered
as the width argument for the ggsave function in the ggplot2 package
(default = 13)
height of the plot to be saved. This argument will be directly entered
as the height argument for the ggsave function in the ggplot2 package
(default = 9)
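The placement rule described for annotate_y_pos_rel and annotate_y_pos_abs can be illustrated with a short base R sketch. The helper annotation_y below is hypothetical (it is not part of the package) and only reproduces the arithmetic stated in the argument descriptions; the function's internal code may differ.

```r
# Hypothetical helper (not part of the package): y coordinate at which
# the annotated stats would be placed, following the documented rule.
annotation_y <- function(y, pos_rel = 5, pos_abs = NULL) {
  # An absolute position, if given, takes precedence
  if (!is.null(pos_abs)) {
    return(pos_abs)
  }
  y_range <- range(y, na.rm = TRUE)
  # Place the stats pos_rel percent of the y range above the maximum y
  y_range[2] + (pos_rel / 100) * (y_range[2] - y_range[1])
}

annotation_y(c(0, 50, 100))                 # 105
annotation_y(c(0, 50, 100), pos_abs = 7.5)  # 7.5
```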
If a weighted correlation is to be calculated, the following package(s) must be installed prior to running the function: Package 'weights' v1.0 (or possibly a higher version) by Josh Pasek (2018), https://cran.r-project.org/package=weights
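To show what weighting by weight_var_name means for the correlation and the line of fit, here is a minimal base R sketch. It does not use the 'weights' package, the helper weighted_cor is hypothetical, and it assumes a standard weighted Pearson correlation; the function's own calculation (including the p-value) may differ in detail.

```r
# Hypothetical helper (not part of the package): weighted Pearson correlation
weighted_cor <- function(x, y, w) {
  w <- w / sum(w)                         # normalize weights to sum to 1
  mx <- sum(w * x)                        # weighted mean of x
  my <- sum(w * y)                        # weighted mean of y
  cov_xy <- sum(w * (x - mx) * (y - my))  # weighted covariance
  var_x <- sum(w * (x - mx)^2)            # weighted variances
  var_y <- sum(w * (y - my)^2)
  cov_xy / sqrt(var_x * var_y)
}

# Correlation between wt and mpg, weighting each car by drat
r_weighted <- weighted_cor(mtcars$wt, mtcars$mpg, mtcars$drat)

# A weighted least-squares line of fit using the same weights
fit_weighted <- lm(mpg ~ wt, data = mtcars, weights = drat)
coef(fit_weighted)
```

With equal weights, weighted_cor() reduces to the ordinary Pearson correlation returned by cor().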
if (FALSE) {
  # Basic scatter plot of mpg against wt
  scatterplot(data = mtcars, x_var_name = "wt", y_var_name = "mpg")
  # Label each dot with hp and weight observations by drat
  scatterplot(
    data = mtcars, x_var_name = "wt", y_var_name = "mpg",
    dot_label_var_name = "hp", weight_var_name = "drat",
    annotate_stats = TRUE)
  # Weight observations by cyl and use larger dot labels
  scatterplot(
    data = mtcars, x_var_name = "wt", y_var_name = "mpg",
    dot_label_var_name = "hp", weight_var_name = "cyl",
    dot_label_size = 7, annotate_stats = TRUE)
  # Color the dots by the number of gears
  scatterplot(
    data = mtcars, x_var_name = "wt", y_var_name = "mpg",
    color_dots_by = "gear")
}
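The jitter arguments (jitter_x_y_percent, jitter_x_percent, jitter_y_percent) are expressed as a percentage of the range of the values. A rough base R sketch of that idea follows. The helper jitter_by_percent is hypothetical, and the choice of uniform noise is an assumption; the documentation does not state how the function draws the jitter.

```r
# Hypothetical helper (not part of the package): shift each value by a
# random amount of at most `percent` percent of the range of the values.
# Uniform noise is an assumption made for illustration only.
jitter_by_percent <- function(v, percent = 0) {
  half_width <- (percent / 100) * diff(range(v, na.rm = TRUE))
  v + stats::runif(length(v), min = -half_width, max = half_width)
}

set.seed(1)
x_jittered <- jitter_by_percent(mtcars$wt, percent = 2)

# Largest shift, for comparison with 2% of the range of wt
max(abs(x_jittered - mtcars$wt))
0.02 * diff(range(mtcars$wt))
```

With percent = 0, the values are returned unchanged.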