Uses ggplot2 to plot a scatterplot or dot-like chart for the case where there is a very large number of overlapping values. This works for continuous and categorical x and y. For continuous variables it serves the same purpose as hexagonal binning. Counts for overlapping points are grouped into quantile groups, and level of transparency and rainbow colors are used to provide count information.
Instead, you can specify stick=TRUE to not use color but to encode cell frequencies with the height of a black line y-centered at the middle of the bins. Relative frequencies are not transformed, and the maximum cell frequency is shown in a caption. Every point with a frequency of at least one is depicted with a full-height light gray vertical line, scaled to the above overall maximum frequency. In this way the relative frequency is the proportion of each light gray line that is black, and one can still see points whose frequencies are too low for the black lines to be visible.
The result can also be passed to ggplotly. Actual cell frequencies are added to the hover text in that case using the label ggplot2 aesthetic.
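The binning and grouping described above can be sketched in base R. This is only a rough illustration of the idea, not the function's actual implementation: the helper round_to_bins is hypothetical, and the use of cut with quantile breaks is an assumption made for the example.

```r
# Illustrative sketch only; round_to_bins is a hypothetical helper,
# not part of Hmisc.
round_to_bins <- function(v, bins = 50) {
  r <- range(v)
  width <- diff(r) / bins
  r[1] + width * round((v - r[1]) / width)
}

set.seed(1)
x <- rnorm(5000)
y <- rnorm(5000)

# Snap each coordinate to its nearest bin value, then count points per cell
cells <- as.data.frame(table(x = round_to_bins(x), y = round_to_bins(y)),
                       stringsAsFactors = FALSE)
cells <- cells[cells$Freq > 0, ]

# Group the cell counts into (up to) 10 quantile groups for color coding
brks <- unique(quantile(cells$Freq, probs = seq(0, 1, length.out = 11)))
cells$group <- cut(cells$Freq, breaks = brks, include.lowest = TRUE)

# Relative frequency, the quantity encoded by stick height when stick=TRUE
cells$rel <- cells$Freq / max(cells$Freq)
head(cells)
```

Each row of cells is one occupied bin. The group column plays the role of the color category, and rel is the fraction of a full-height gray line that would be drawn in black.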
ggfreqScatter(x, y, by=NULL, bins=50, g=10, cuts=NULL,
              xtrans = function(x) x,
              ytrans = function(y) y,
              xbreaks = pretty(x, 10),
              ybreaks = pretty(y, 10),
              xminor = NULL, yminor = NULL,
              xlab = as.character(substitute(x)),
              ylab = as.character(substitute(y)),
              fcolors = viridis::viridis(10), nsize=FALSE,
              stick=FALSE, html=FALSE, prfreq=FALSE, ...)
a ggplot object
x: x-variable
y: y-variable
by: an optional vector used to make separate plots for each distinct value using facet_wrap()
bins: for continuous x or y, the number of bins to create by rounding. Ignored for categorical variables. If a 2-vector, the first element corresponds to x and the second to y.
g: number of quantile groups to make for frequency counts. Use g=0 to use frequencies continuously for color coding. This is recommended only when using plotly.
cuts: instead of using g, specify cuts to provide the vector of cuts for categorizing frequencies for assignment to colors
xtrans, ytrans: functions specifying transformations to be made before binning and plotting
xbreaks, ybreaks: vectors of values to label on axis, on original scale
xminor, yminor: values at which to put minor tick marks, on original scale
xlab, ylab: axis labels. If not specified and the variable has a label, that label will be used.
fcolors: colors argument to pass to scale_color_gradientn to color code frequencies. Use fcolors=gray.colors(10, 0.75, 0) to show gray scale, for example. Another good choice is fcolors=hcl.colors(10, 'Blue-Red').
nsize: set to TRUE to not vary color or transparency but instead to size the symbols in relation to the number of points. Best when both x and y are discrete. ggplot2 size is taken as the fourth root of the frequency. If there are 15 or fewer unique frequencies, all the unique frequencies are used; otherwise g quantile groups of frequencies are used.
stick: set to TRUE to not use colors but instead use varying-height black vertical lines to depict cell frequencies.
html: set to TRUE to use html in axis labels instead of plotmath
prfreq: set to TRUE to print the frequency distributions of the binned coordinate frequencies
...: arguments to pass to geom_point such as shape and size
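As a small illustration of the fourth-root size mapping described for nsize=TRUE, the sketch below applies it to a handful of cell frequencies. The frequencies are made-up values chosen for the example, and the code only demonstrates the transformation itself, not how the function assigns sizes internally.

```r
# Hypothetical cell frequencies spanning four orders of magnitude
freq <- c(1, 16, 81, 256, 10000)

# Fourth-root transformation, as described for nsize=TRUE
size <- freq ^ 0.25
data.frame(freq, size)
```

The transformation compresses a 10000-fold range of frequencies into a 10-fold range of symbol sizes, so rare cells remain visible next to very common ones.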
Frank Harrell
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)
count <- sample(1:100, 1000, TRUE)
x <- rep(x, count)
y <- rep(y, count)
# color=alpha=NULL below makes loess smooth over all points
g <- ggfreqScatter(x, y) + # might add g=0 if using plotly
  geom_smooth(aes(color=NULL, alpha=NULL), se=FALSE) +
  ggtitle("Using Deciles of Frequency Counts, 2500 Bins")
g
# plotly::ggplotly(g, tooltip='label') # use plotly, hover text = freq. only
# Plotly makes it somewhat interactive, with hover text tooltips
# Instead use varying-height sticks to depict frequencies
ggfreqScatter(x, y, stick=TRUE) +
labs(subtitle='Relative height of black lines to gray lines
is proportional to cell frequency.
Note that points with even tiny frequency are visible
(gray line with no visible black line).')
# Try with x categorical
x1 <- sample(c('cat', 'dog', 'giraffe'), length(x), TRUE)
ggfreqScatter(x1, y)
# Try with y categorical
y1 <- sample(LETTERS[1:10], length(x), TRUE)
ggfreqScatter(x, y1)
# Both categorical, larger point symbols, box instead of circle
ggfreqScatter(x1, y1, shape=15, size=7)
# Vary box size instead
ggfreqScatter(x1, y1, nsize=TRUE, shape=15)
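One more variation, combining two of the documented arguments: a 2-vector for bins and the gray scale suggested for fcolors. This is a minimal sketch that assumes Hmisc and ggplot2 are installed; the variable names x2, y2, and p are introduced here for illustration only.

```r
library(Hmisc)
library(ggplot2)

set.seed(2)
x2 <- rnorm(2000)
y2 <- rnorm(2000)

# 30 bins for x and 20 for y, with a gray scale for the frequency groups
p <- ggfreqScatter(x2, y2, bins=c(30, 20),
                   fcolors=gray.colors(10, 0.75, 0))
p
```

Using fewer bins for y than for x gives coarser cells vertically, which can help when one variable is measured less precisely than the other.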