vipor (version 0.4.7)

offsetX: Offset data using quasirandom noise to avoid overplotting

Description

Arranges data points using quasirandom noise (van der Corput sequence), pseudorandom noise, or by positioning extreme values within a band to the left and right, to form beeswarm, one-dimensional scatter or strip chart style plots. The result resembles a cross between a violin plot (showing the density distribution) and a scatter plot (showing the individual points). This function returns a vector of the offsets to be used in plotting.

Usage

offsetX(y, x = rep(1, length(y)), width = 0.4, varwidth = FALSE, ...)

offsetSingleGroup(
  y,
  maxLength = NULL,
  method = c("quasirandom", "pseudorandom", "smiley", "maxout", "frowney", "minout", "tukey", "tukeyDense"),
  nbins = NULL,
  adjust = 1
)

Value

a vector of x-offsets of the same length as y

Arguments

y

vector of data points

x

a grouping factor for y (optional)

width

the maximum spacing away from center for each group of points. Since points are spaced to left and right, the maximum width of the cluster will be approximately width*2 (0 = no offset, default = 0.4)

varwidth

adjust the width of each group based on the number of points in the group

...

additional arguments to offsetSingleGroup

maxLength

if not NULL, multiply the offset by sqrt(length(y)/maxLength). The square root matches the width scaling used by boxplot, so that groups whose sizes differ by orders of magnitude can still be compared (the offset scales in the same way as the standard error)

method

method used to distribute the points:

quasirandom:

points are distributed within a kernel density estimate of the distribution with offset determined by quasirandom Van der Corput noise

pseudorandom:

points are distributed within a kernel density estimate of the distribution with offset determined by pseudorandom noise, as in jitter

maxout:

points are distributed within a kernel density estimate, with the points in each band arranged so that the highest values lie on the outside and the lowest in the middle

minout:

points are distributed within a kernel density estimate, with the points in each band arranged so that the highest values lie in the middle and the lowest on the outside

tukey:

points are distributed as described in Tukey and Tukey "Strips displaying empirical distributions: I. textured dot strips"

tukeyDense:

points are distributed as described in Tukey and Tukey but are constrained with the kernel density estimate

nbins

the number of points used to calculate density (defaults to 1000 for quasirandom and pseudorandom and 100 for others)

adjust

adjust the bandwidth used to calculate the kernel density (smaller values mean tighter fit, larger values looser fit, default is 1)
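
The quasirandom method described above can be illustrated with a minimal base R sketch. This is not vipor's own implementation: the helper vdc() is hypothetical, the noise is assigned in data order, and the density is evaluated at each point by linear interpolation, all of which are assumptions made for illustration. The adjust and nbins arguments are mimicked through the adjust and n arguments of density().

```r
# Hypothetical helper (an assumption, not part of vipor): the i-th term of the
# van der Corput sequence, built by mirroring the base-`base` digits of i
# around the radix point.
vdc <- function(i, base = 2) {
  out <- 0
  denom <- 1
  while (i > 0) {
    denom <- denom * base
    out <- out + (i %% base) / denom
    i <- i %/% base
  }
  out
}

set.seed(1)
y <- rnorm(200)
width <- 0.4

# Kernel density of y, evaluated at each data point and scaled to at most 1
dens <- density(y, n = 1000, adjust = 1)
height <- approx(dens$x, dens$y, xout = y)$y / max(dens$y)

# Low-discrepancy values in (0, 1), recentred on 0 and scaled by the density,
# so that no point lies further than `width` from the centre
noise <- vapply(seq_along(y), vdc, numeric(1))
offset <- (noise - 0.5) * 2 * width * height

plot(1 + offset, y, xlim = c(0, 2), xaxt = "n", xlab = "", las = 1)
```

Because the van der Corput values fill the unit interval evenly, neighbouring points are less likely to overlap than with runif(), which is the motivation for the quasirandom default.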

Examples

## Generate fake data
dat <- list(rnorm(50), rnorm(500), c(rnorm(100), rnorm(100,5)), rcauchy(100))
names(dat) <- c("Normal", "Dense Normal", "Bimodal", "Extremes")

## Plot each distribution with a variety of parameters
oldPar <- par(mfrow=c(4,1), mar=c(2,4, 0.5, 0.5))
invisible(lapply(names(dat), function(label) {
  y <- dat[[label]]

  ## Offsets for the same data under four parameter settings
  offsets <- list(
    'Default'=offsetX(y),
    'Smoother'=offsetX(y, adjust=2),
    'Tighter'=offsetX(y, adjust=0.1),
    'Thinner'=offsetX(y, width=0.1)
  )
  ## x-axis position of each parameter setting
  ids <- rep(seq_along(offsets), sapply(offsets, length))

  plot(unlist(offsets) + ids, rep(y, length(offsets)),
       ylab=label, xlab='', xaxt='n', pch=21, las=1)
  axis(1, seq_along(offsets), c("Default", "Adjust=2", "Adjust=0.1", "Width=0.1"))
}))
## Restore the previous graphical parameters
par(oldPar)
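
The scaling applied when maxLength is set can be checked by hand. The short base R sketch below only evaluates the formula given under Arguments, sqrt(length(y)/maxLength). The group sizes are made up for illustration, and using the largest group as maxLength is an assumption.

```r
## Hypothetical group sizes spanning two orders of magnitude
n <- c(small = 10, medium = 100, large = 1000)

## Assume the largest group sets maxLength
maxLength <- max(n)

## Factor by which each group's offsets would be multiplied
scale <- sqrt(n / maxLength)
round(scale, 3)
```

With the square root, a group 100 times smaller than the largest is drawn at one tenth of the width rather than one hundredth, so small groups remain visible.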
