EnvStats (version 2.0.0)

stripChart: 1-D Scatter Plots with Confidence Intervals

Description

stripChart is a modification of the R function stripchart. It is a generic function used to produce one-dimensional scatter plots (or dot plots) of the given data, along with text indicating sample size and estimates of location (mean or median) and scale (standard deviation or interquartile range), as well as confidence intervals for the population location parameter. One-dimensional scatter plots are a good alternative to boxplots when sample sizes are small or moderate. The function invokes particular methods which depend on the class of the first argument.

Usage

stripChart(x, ...)

## S3 method for class 'formula':
stripChart(x, data = NULL, dlab = NULL, 
    subset, na.action = NULL, ...)

## S3 method for class 'default':
stripChart(x, method = "stack", seed = 47, 
  jitter = 0.1 * cex, offset = 1/2, vertical = TRUE, group.names, 
  group.names.cex = cex, drop.unused.levels = TRUE, add = FALSE, 
  at = NULL, xlim = NULL, ylim = NULL, ylab = NULL, xlab = NULL, 
  dlab = "", glab = "", log = "", pch = 1, col = par("fg"), 
  cex = par("cex"), points.cex = cex, axes = TRUE, frame.plot = axes, 
  show.ci = TRUE, location.pch = 16, location.cex = cex, 
  conf.level = 0.95, min.n.for.ci = 2, 
  ci.offset = 3/ifelse(n > 2, (n-1)^(1/3), 1), 
  ci.bar.ends = TRUE, ci.bar.ends.size = 0.5 * cex, 
  ci.bar.gap = FALSE, n.text = "bottom", 
  n.text.line = ifelse(n.text == "bottom", 2, 0), 
  n.text.cex = cex, location.scale.text = "top", 
  location.scale.digits = 1, location.scale.text.line = 
    ifelse(location.scale.text == "top", 0, 3.5), 
  location.scale.text.cex = 
    cex * 0.8 * ifelse(n > 6, max(0.4, 1 - (n-6) * 0.06), 1), 
  p.value = FALSE, p.value.digits = 3, p.value.line = 2, 
  p.value.cex = cex, group.difference.ci = p.value, 
  group.difference.conf.level = 0.95, 
  group.difference.digits = location.scale.digits, 
  ci.and.test = "parametric", ci.arg.list = NULL, 
  test.arg.list = NULL, alternative = "two.sided", ...)
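
Several defaults in the usage above are expressions in n, the number of groups. The base R sketch below simply evaluates two of those expressions for a few group counts, to show how the default confidence interval offset and text size shrink as groups are added. The helper names and the choice cex = 1 are illustrative assumptions and are not part of EnvStats.

```r
# Hypothetical helpers that copy the default expressions from the
# usage above; n is the number of groups.
default_ci_offset <- function(n) {
  3 / ifelse(n > 2, (n - 1)^(1/3), 1)
}

default_location_scale_text_cex <- function(n, cex = 1) {
  # pmax() is used instead of max() so the helper is vectorized over n
  cex * 0.8 * ifelse(n > 6, pmax(0.4, 1 - (n - 6) * 0.06), 1)
}

n <- c(2, 3, 9, 20)
data.frame(
  n = n,
  ci.offset = default_ci_offset(n),
  location.scale.text.cex = default_location_scale_text_cex(n)
)
```

With more than 6 groups the text for location and scale is scaled down, but never below 0.4 times 0.8 * cex.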

Arguments

x
the data from which the plots are to be produced. In the default method the data can be specified as a list or data frame where each component is numeric, a numeric matrix, or a numeric vector. In the formula method, a symbolic specification of
data
for the formula method, a data.frame (or list) from which the variables in x should be taken.
subset
for the formula method, an optional vector specifying a subset of observations to be used for plotting.
na.action
for the formula method, a function which indicates what should happen when the data contain NAs. The default is to ignore missing values in either the response or the group.
...
additional parameters passed to the default method, or by it to plot, points, axis, and
method
the method to be used to separate coincident points. The method "overplot" causes such points to be overplotted, but it is also possible to specify "jitter" to jitter the points, or "stack" to have coinc
seed
when method="jitter" is used, the argument seed is passed to the R function set.seed. Since jittering depends on the R random number generator, using the same value of
jitter
when method="jitter" is used, jitter gives the amount of jittering applied.
offset
when stacking is used, points are stacked this many line-heights (symbol widths) apart.
vertical
when vertical=TRUE (the default), the plots are drawn vertically rather than horizontally.
group.names
group labels which will be printed alongside (or underneath) each plot.
group.names.cex
numeric scalar indicating the amount by which the group labels should be scaled relative to the default (see the help file for plot.default). The default is the current value of the graphics p
drop.unused.levels
when drop.unused.levels=TRUE, groups with no observations are dropped.
add
logical, if true add the chart to the current plot.
at
numeric vector giving the locations where the charts should be drawn, particularly when add=TRUE; defaults to 1:n where n is the number of groups.
xlim, ylim
plot limits: see plot.window.
ylab, xlab
labels: see title.
dlab, glab
alternate way to specify axis labels. The dlab and glab labels may be used instead of xlab and ylab if those are not specified. dlab applies to the continuous data axis (the $y$-ax
log
on which axes to use a log scale: see plot.default.
pch, col, cex
Graphical parameters: see par.
points.cex
Sets the cex value for the points plotted.
axes, frame.plot
Axis control: see plot.default.
show.ci
logical scalar indicating whether to plot the confidence interval. The default is show.ci=TRUE.
location.pch
integer indicating which plotting character to use to indicate the estimate of location (mean or median) for each group (see the help file for plot.default). The default is location.pch=
location.cex
numeric scalar giving the amount by which the plotting characters indicating the estimate of location for each group should be scaled relative to the default (see the help file for plot.default
conf.level
numeric scalar between 0 and 1 indicating the confidence level associated with the confidence interval for the group location (population mean or median). The default value is conf.level=0.95.
min.n.for.ci
integer indicating the minimum sample size required in order to plot a confidence interval for the group location. The default value is min.n.for.ci=2.
ci.offset
numeric scalar or vector of length equal to the number of groups (n) in units of cex indicating the amount of space between the line showing the confidence interval and tick mark associated with a particular group. The d
ci.bar.ends
logical scalar indicating whether to add flat ends to the confidence interval bars. The default value is ci.bar.ends=TRUE.
ci.bar.ends.size
numeric scalar in units of cxy indicating the size of confidence interval bar ends. The default value is half of the current value of cex.
ci.bar.gap
logical scalar indicating whether to add a gap between the estimate of group location and the confidence interval bar. The default value is ci.bar.gap=FALSE.
n.text
character string indicating whether and where to indicate the sample size for each group. Possible values are "bottom" (the default), "top", and "none".
n.text.line
integer indicating on which plot margin line to show the sample sizes for each group. The default value is n.text.line=2 when n.text="bottom" and 0 otherwise.
n.text.cex
numeric scalar giving the amount by which the text indicating the sample size for each group should be scaled relative to the default (see the help file for plot.default). The default is the curr
location.scale.text
character string indicating whether and where to indicate the estimates of location (mean or median) and scale (standard deviation or interquartile range) for each group. Possible values are "top" (the default), "bottom"
location.scale.digits
integer indicating the number of digits to round the estimates of location and scale. The default value is location.scale.digits=1.
location.scale.text.line
integer indicating on which plot margin line to show the estimates of location and scale for each group. The default value is location.scale.text.line=0 when location.scale.text="top" and 3.5 otherwise.
location.scale.text.cex
numeric scalar giving the amount by which the text indicating the estimates of location and scale for each group should be scaled relative to the default (see the help file for plot.default).
p.value
logical scalar indicating whether to show the p-value associated with testing whether all groups have the same population location. The default value is p.value=FALSE. The p-value is displayed at the top of the graph.
p.value.digits
integer indicating the number of digits to round to when displaying the p-value associated with the test of equal group locations. The default value is p.value.digits=3.
p.value.line
integer indicating on which plot margin line to show the p-value associated with the test of equal group locations. The default value is p.value.line=2.
p.value.cex
numeric scalar giving the amount by which the text indicating the p-value associated with the test of equal group locations should be scaled relative to the default (see the help file for plot.default
group.difference.ci
for the case when there are just 2 groups, a logical scalar indicating whether to show the confidence interval for the difference between group locations. The default is the value of the p.value argument. The confidence interval is
group.difference.conf.level
for the case when there are just 2 groups, a numeric scalar between 0 and 1 indicating the confidence level associated with the confidence interval for the difference between group locations. The default is group.difference.conf.level=0.95.
group.difference.digits
for the case when there are just 2 groups, an integer indicating the number of digits to round to when displaying the confidence interval for the difference between group locations. The default value is group.difference.digits=location.sca
ci.and.test
character string indicating whether confidence intervals and tests should be based on parametric or nonparametric (ci.and.test="nonparametric") methods. When ci.and.test="parametric" (the default), confidence intervals f
ci.arg.list
an optional list of arguments to pass to the function used to compute confidence intervals. The default value is ci.arg.list=NULL.
test.arg.list
an optional list of arguments to pass to the function used to test for group differences in location. The default value is test.arg.list=NULL. In particular, in the case when there are two groups, ci.and.test="parametric"
alternative
character string describing the alternative hypothesis for the test of group differences in the case when there are two groups. Possible values are "two.sided" (the default), "less", and "greater".
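
The test.arg.list argument forwards extra arguments to the function that tests for group differences. As a rough illustration only, and assuming that the two-group parametric case is handled by t.test (which is listed under See Also), the base R sketch below shows what the var.equal argument changes. The data are simulated and purely hypothetical.

```r
# Simulated, purely illustrative data: two groups with unequal spread.
set.seed(47)
dat <- data.frame(
  y = c(rnorm(15, mean = 0, sd = 1), rnorm(25, mean = 0.5, sd = 3)),
  group = factor(rep(c("A", "B"), times = c(15, 25)))
)

# Pooled-variance (Student) t-test
pooled <- t.test(y ~ group, data = dat, var.equal = TRUE)

# Welch t-test, which does not assume equal variances
welch <- t.test(y ~ group, data = dat, var.equal = FALSE)

# The pooled test has n1 + n2 - 2 degrees of freedom; the Welch
# approximation gives fewer, generally fractional, degrees of freedom.
c(pooled = unname(pooled$parameter), welch = unname(welch$parameter))
```

Under the same assumption, a call such as stripChart(y ~ group, data = dat, p.value = TRUE, test.arg.list = list(var.equal = FALSE)) would request the unequal-variance version of the test.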

Value

stripChart invisibly returns a list with the following components:

  • group.centers: numeric vector of values on the group axis (the $x$-axis unless vertical=FALSE) indicating the centers of the groups.
  • group.stats: a matrix with the number of rows equal to the number of groups and six columns indicating the sample size of the group (N), the estimate of the group location parameter (Mean or Median), the estimate of the group scale (SD or IQR), the lower confidence limit for the group location parameter (LCL), the upper confidence limit for the group location parameter (UCL), and the confidence level associated with the confidence interval (Conf.Level).

In addition, if the argument p.value=TRUE, the list also includes these components:

  • group.difference.p.value: numeric scalar indicating the p-value associated with the test of equal group locations.
  • group.difference.conf.int: numeric vector of two elements indicating the confidence interval for the difference between the group locations. Only present when there are two groups.
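
Because the result is returned invisibly, it has to be assigned to be inspected. The sketch below assumes the EnvStats package is installed and uses the EPA.94b.tccb.df data frame from the Examples section; the null PDF device is only a convenience so that the code also runs in a non-interactive session.

```r
library(EnvStats)

# Draw to a null PDF device so no file or window is created
pdf(NULL)
res <- stripChart(TcCB ~ Area, data = EPA.94b.tccb.df, p.value = TRUE)
invisible(dev.off())

# Components of the returned list
names(res)

# One row per group; columns hold N, location, scale, LCL, UCL, Conf.Level
res$group.stats

# Present because p.value = TRUE (the interval only with two groups)
res$group.difference.p.value
res$group.difference.conf.int
```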

References

Hollander, M., and D.A. Wolfe. (1999). Nonparametric Statistical Methods. Second Edition. John Wiley and Sons, New York.

Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.

Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.

See Also

stripchart, t.test, wilcox.test, aov, kruskal.test.

Examples

  # The guidance document USEPA (1994b, pp. 6.22--6.25) 
  # contains measures of 1,2,3,4-Tetrachlorobenzene (TcCB) 
  # concentrations (in parts per billion) from soil samples 
  # at a Reference area and a Cleanup area.  These data are stored 
  # in the data frame EPA.94b.tccb.df.  
  #
  # First create one-dimensional scatterplots to compare the 
  # TcCB concentrations between the areas and use a nonparametric 
  # test to test for a difference between areas.

  dev.new()
  stripChart(TcCB ~ Area, data = EPA.94b.tccb.df, 
    p.value = TRUE, ci.and.test = "nonparametric", 
    ylab = "TcCB (ppb)")

  #----------

  # Now log-transform the TcCB data and use a parametric test
  # to compare the areas.

  dev.new()
  stripChart(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, 
    p.value = TRUE, ci.and.test = "parametric", 
    ylab = "log10 [ TcCB (ppb) ]")

  #----------

  # Repeat the above procedure, but allow the variances to differ.

  dev.new()
  stripChart(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, 
    p.value = TRUE, ci.and.test = "parametric", 
    ylab = "log10 [ TcCB (ppb) ]", 
    test.arg.list = list(var.equal = FALSE))

  #----------

  # Repeat the above procedure, but jitter the points instead of 
  # stacking them.

  dev.new()
  stripChart(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, 
    p.value = TRUE, ci.and.test = "parametric", 
    ylab = "log10 [ TcCB (ppb) ]", 
    test.arg.list = list(var.equal = FALSE), 
    method = "jitter", ci.offset = 4)

  #==========

  # Clean up
  #---------
  graphics.off()