Hmisc (version 4.1-1)

ggfreqScatter: Frequency Scatterplot

Description

Uses ggplot2 to draw a scatterplot or dot-like chart when there is a very large number of overlapping values. It works for continuous and categorical x and y. For continuous variables it serves the same purpose as hexagonal binning. Counts of overlapping points are grouped into quantile groups, and level of transparency and rainbow colors are used to convey the counts.

The result can also be passed to ggplotly. Actual cell frequencies are added to the hover text in that case.
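
The binning-and-counting idea described above can be sketched in base R. This is only an illustration under assumed details, not the Hmisc implementation: the helper round_to_bins is hypothetical, and base cut() with quantile() stands in for whatever grouping ggfreqScatter performs internally.

```r
set.seed(1)
x <- rnorm(5000)
y <- rnorm(5000)

# Hypothetical helper: snap values onto an evenly spaced grid with
# bins + 1 possible levels spanning the range of v
round_to_bins <- function(v, bins = 50) {
  r <- range(v)
  width <- diff(r) / bins
  r[1] + width * round((v - r[1]) / width)
}

xb <- round_to_bins(x)
yb <- round_to_bins(y)

# One row per occupied (x, y) cell, with its frequency
cells <- as.data.frame(table(x = xb, y = yb), stringsAsFactors = FALSE)
cells <- cells[cells$Freq > 0, ]

# Group the cell frequencies into at most 10 quantile groups;
# unique() drops tied quantiles so the breaks stay strictly increasing
brks <- unique(quantile(cells$Freq, probs = seq(0, 1, length.out = 11)))
cells$group <- cut(cells$Freq, breaks = brks, include.lowest = TRUE)

head(cells)
table(cells$group)
```

Each row of cells would then be drawn as a single point, with its frequency group mapped to color and transparency, so the plot needs far fewer symbols than there are observations.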

Usage

ggfreqScatter(x, y, bins=50, g=10, cuts=NULL,
              xtrans = function(x) x,
              ytrans = function(y) y,
              xbreaks = pretty(x, 10),
              ybreaks = pretty(y, 10),
              xminor  = NULL, yminor = NULL,
              xlab = as.character(substitute(x)),
              ylab = as.character(substitute(y)),
              fcolors = viridis::viridis(10), nsize=FALSE,
              html=FALSE, prfreq=FALSE, ...)

Arguments

x

x-variable

y

y-variable

bins

for continuous x or y is the number of bins to create by rounding. Ignored for categorical variables. If a 2-vector, the first element corresponds to x and the second to y.

g

number of quantile groups to make for frequency counts. Use g=0 to use frequencies continuously for color and alpha coding. This is recommended only when using plotly.

cuts

instead of using g, specify cuts to provide the vector of cuts for categorizing frequencies for assignment to colors

xtrans,ytrans

functions specifying transformations to be made before binning and plotting

xbreaks,ybreaks

vectors of values to label on axis, on original scale

xminor,yminor

values at which to put minor tick marks, on original scale

xlab,ylab

axis labels. If not specified and variable has a label, that label will be used.

fcolors

colors argument to pass to scale_color_gradientn to color code frequencies

nsize

set to TRUE to not vary color or transparency but instead to size the symbols in relation to the number of points. Best when both x and y are discrete. ggplot2 size is taken as the fourth root of the frequency. If there are 15 or fewer unique frequencies, all the unique frequencies are used; otherwise g quantile groups of frequencies are used.

html

set to TRUE to use html in axis labels instead of plotmath

prfreq

set to TRUE to print the frequency distributions of the binned coordinate frequencies

...

arguments to pass to geom_point such as shape and size
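
The symbol sizing described under nsize can be illustrated with a few lines of base R. This is a sketch of the arithmetic only, with made-up frequencies; the threshold check reflects an assumed reading of the rule (15 or fewer unique frequencies are used as they are) and is not taken from the Hmisc source.

```r
# Made-up cell frequencies, e.g. from cross-tabulating two discrete variables
freq <- c(1, 16, 81, 256, 625)

# The fourth root compresses the range: a 625-fold spread in counts
# becomes a 5-fold spread in symbol size
size <- freq^(1/4)

# Assumed reading of the rule: with few distinct frequencies, each one
# gets its own size instead of being collapsed into quantile groups
use_all <- length(unique(freq)) <= 15

data.frame(freq, size)
```

The compression is what keeps the most frequent cells from swamping the rest of the plot while sparse cells remain visible.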

Value

a ggplot object

See Also

cut2

Examples

# NOT RUN {
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)
count <- sample(1:100, 1000, TRUE)
x <- rep(x, count)
y <- rep(y, count)
# color=alpha=NULL below makes loess smooth over all points
g <- ggfreqScatter(x, y) +   # might add g=0 if using plotly
      geom_smooth(aes(color=NULL, alpha=NULL), se=FALSE) +
      ggtitle("Using Deciles of Frequency Counts, 2500 Bins")
g
# plotly::ggplotly(g, tooltip='label')  # use plotly, hover text = freq. only
# Plotly makes it somewhat interactive, with hover text tooltips

# Try with x categorical
x1 <- sample(c('cat', 'dog', 'giraffe'), length(x), TRUE)
ggfreqScatter(x1, y)

# Try with y categorical
y1 <- sample(LETTERS[1:10], length(x), TRUE)
ggfreqScatter(x, y1)

# Both categorical, larger point symbols, box instead of circle
ggfreqScatter(x1, y1, shape=15, size=7)
# Vary box size instead
ggfreqScatter(x1, y1, nsize=TRUE, shape=15)
# }
