CGPfunctions (version 0.6.3)

SeeDist: SeeDist -- See The Distribution

Description

This function takes a vector of numeric data and returns one or more ggplot2 plots that help you visualize the data. It is meant to be a useful wrapper for exploring univariate data. It offers many options, including the type of visualization (histogram, boxplot, density, violin) as well as commonly desired overplots such as mean and median points and z and t curves. Common descriptive statistics are provided as a subtitle if desired and are also sent to the console.

Usage

SeeDist(
  x,
  title = "Default",
  subtitle = "Default",
  numbins = 0,
  xlab = NULL,
  var_explain = NULL,
  data.fill.color = "deepskyblue",
  mean.line.color = "darkgreen",
  median.line.color = "yellow",
  mode.line.color = "orange",
  mean.line.type = "longdash",
  median.line.type = "dashed",
  mode.line.type = "dashed",
  mean.line.size = 1.5,
  median.line.size = 1.5,
  mean.point.shape = 21,
  median.point.shape = 23,
  mean.point.size = 4,
  median.point.size = 4,
  zcurve.color = "red",
  zcurve.type = "twodash",
  zcurve.size = 1,
  tcurve.color = "black",
  tcurve.type = "dotted",
  tcurve.size = 1,
  mode.line.size = 1,
  whatplots = c("d", "b", "h", "v"),
  k = 2,
  add_jitter = TRUE,
  add_rug = TRUE,
  xlim_left = NULL,
  xlim_right = NULL,
  ggtheme = ggplot2::theme_bw()
)

Arguments

x

the data to be visualized. Must be numeric.

title

Optionally replace the default title displayed. title = NULL will remove it entirely. title = "" will provide an empty title but retain the spacing. A sensible default is provided otherwise.

subtitle

Optionally replace the default subtitle displayed. subtitle = NULL will remove it entirely. subtitle = "" will provide an empty subtitle but retain the spacing. A sensible default is provided otherwise.

numbins

the number of bins to use for any plots that bin. If nothing is specified (the default, `0`), the function calculates a reasonable number using the Freedman-Diaconis rule via the nclass.FD function.
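To see what the automatic choice looks like, the bin count can be computed directly. This is a minimal sketch using `grDevices::nclass.FD` from base R on the built-in `mtcars$hp` data; the manual calculation is only an illustration of the Freedman-Diaconis rule (bin width 2 * IQR / n^(1/3)) and is not taken from the package source.

```r
# Data used purely for illustration
x <- mtcars$hp

# Bin count suggested by the Freedman-Diaconis rule (base R)
bins <- grDevices::nclass.FD(x)

# The same idea by hand: bin width h, then range / h rounded up.
# nclass.FD rounds the data to 5 significant digits first, so the two
# values can differ slightly for other data sets.
h <- 2 * stats::IQR(x) / length(x)^(1 / 3)
manual_bins <- ceiling(diff(range(x)) / h)

print(c(nclass.FD = bins, manual = manual_bins))
```

Passing a positive value such as `numbins = 15` overrides this automatic choice.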

xlab

Custom text for the `x` axis label (Default: `NULL`, which will cause the `x` axis label to be the `x` variable).

var_explain

additional contextual information about the variable as a string such as "Miles Per Gallon" which is appended to the default title information.

data.fill.color

Character string that specifies fill color for the main data area (Default: `deepskyblue`).

mean.line.color, median.line.color, mode.line.color

Character string that specifies line color (Default: `darkgreen`, `yellow`, `orange`).

mean.line.type, median.line.type, mode.line.type

Character string that specifies line type (Default: `longdash`, `dashed`, `dashed`).

mean.line.size, median.line.size, mode.line.size

Numeric that specifies line size (Default: `1.5`, `1.5`, `1`). You can set to `0` to make any of the lines "disappear".

mean.point.shape, median.point.shape

Integer in 0 - 25 that specifies the shape of the mean or median point mark on the violin plot (Default: `21`, `23`).

mean.point.size, median.point.size

Integer that specifies the size of the mean or median point mark on the violin plot (Default: `4`). You can set it to `0` to make either point "disappear".

zcurve.color, tcurve.color

Character string that specifies line color (Default: `red`, `black`).

zcurve.type, tcurve.type

Character string that specifies line type (Default: `twodash`, `dotted`).

zcurve.size, tcurve.size

Numeric that specifies line size (Default: `1`). You can set to `0` to make any of the lines "disappear".

whatplots

the types of plots to produce. The default is whatplots = c("d", "b", "h", "v") for a density plot, a boxplot, a histogram, and a violin plot.

k

Number of digits after decimal point (should be an integer) (Default: k = 2) for statistical results.

add_jitter

Logical (Default: `TRUE`) controls whether jittered data points are added to the violin plot.

add_rug

Logical (Default: `TRUE`) controls whether "rug" data points are added to density plot and histogram.

xlim_left, xlim_right

Numeric. For density plots, these can be used to override the default x-axis limits, which are 3 standard deviations to the left and right of the mean of x. Useful when values beyond a limit are theoretically impossible (e.g., horsepower < 0) or when `ggplot2` warns you that it has removed rows containing non-finite values (stat_density).
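The horsepower case can be checked with base R. This sketch computes the default limits as described above (mean plus or minus 3 standard deviations) for `mtcars$hp` and shows that the left limit falls below zero. The guarded `SeeDist` call assumes the CGPfunctions package is installed; supplying both limits together is a cautious choice here, not a documented requirement.

```r
x <- mtcars$hp

# Default density-plot limits as described in the argument text
default_left  <- mean(x) - 3 * sd(x)
default_right <- mean(x) + 3 * sd(x)

# The left limit is negative, which is impossible for horsepower
print(c(left = default_left, right = default_right))

# Override the left limit so the density plot starts at zero
# (only runs if CGPfunctions is available)
if (requireNamespace("CGPfunctions", quietly = TRUE)) {
  CGPfunctions::SeeDist(
    x,
    var_explain = "Horsepower",
    whatplots = "d",
    xlim_left = 0,
    xlim_right = default_right
  )
}
```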

ggtheme

A function, ggplot2 theme name. Default value is ggplot2::theme_bw(). Any of the ggplot2 themes, or themes from extension packages are allowed (e.g., hrbrthemes::theme_ipsum(), etc.).

Value

From 1 to 4 plots, depending on what the user specifies, as well as an extensive summary courtesy of `DescTools::Desc` printed to the console.

Warning

If the data has more than 3 modal values only the first three of them are plotted. The rest are ignored and the user is warned on the console.

Missing values are removed with a warning to the user.
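To anticipate both warnings before plotting, the data can be inspected by hand. This is a rough sketch in base R: `find_modes` is a hypothetical helper written for this illustration and is not the package's internal mode calculation, and the ordering of "the first three" modes is assumed here to be ascending.

```r
# Hypothetical helper: all values that share the highest frequency
find_modes <- function(x) {
  x <- x[!is.na(x)]                       # drop missing values first
  counts <- table(x)
  is_mode <- as.vector(counts == max(counts))
  as.numeric(names(counts)[is_mode])
}

x <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, NA)

n_missing <- sum(is.na(x))                # values that would be removed
modes <- find_modes(x)                    # four modal values in this data
plotted <- head(modes, 3)                 # only three mode lines would be drawn

print(list(missing = n_missing, modes = modes, plotted = plotted))
```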

See Also

nclass

Examples

library(CGPfunctions)

SeeDist(rnorm(100, mean = 100, sd = 20), numbins = 15, var_explain = "A Random Sample")
SeeDist(mtcars$hp, var_explain = "Horsepower", whatplots = c("d", "b"))
SeeDist(iris$Sepal.Length, var_explain = "Sepal Length", whatplots = "d")