Usage
scat1d(x, side=3, frac=0.02, jitfrac=0.008, tfrac, eps=ifelse(preserve,0,.001), lwd=0.1, col=par("col"), y=NULL, curve=NULL, bottom.align=FALSE, preserve=FALSE, fill=1/3, limit=TRUE, nhistSpike=2000, nint=100, type=c('proportion','count','density'), grid=FALSE, ...)
jitter2(x, ...)
## Default S3 method:
jitter2(x, fill=1/3, limit=TRUE, eps=0, presorted=FALSE, ...)
## S3 method for class 'data.frame':
jitter2(x, ...)
datadensity(object, ...)
## S3 method for class 'data.frame':
datadensity(object, group, which=c("all","continuous","categorical"), method.cat=c("bar","freq"), col.group=1:10, n.unique=10, show.na=TRUE, nint=1, naxes, q, bottom.align=nint>1, cex.axis=sc(.5,.3), cex.var=sc(.8,.3), lmgp=NULL, tck=sc(-.009,-.002), ranges=NULL, labels=NULL, ...)
# sc(a,b) means default to a if number of axes <= 3, b if >= 50, use
# linear interpolation within 3-50
histSpike(x, side=1, nint=100, frac=.05, minf=NULL, mult.width=1, type=c('proportion','count','density'), xlim=range(x), ylim=c(0,max(f)), xlab=deparse(substitute(x)), ylab=switch(type,proportion='Proportion', count ='Frequency', density ='Density'), y=NULL, curve=NULL, add=FALSE, bottom.align=type=='density', col=par('col'), lwd=par('lwd'), grid=FALSE, ...)
histSpikeg(formula=NULL, predictions=NULL, data, lowess=FALSE, xlim=NULL, ylim=NULL, side=1, nint=100, frac=function(f) 0.01 + 0.02*sqrt(f-1)/sqrt(max(f,2)-1), span=3/4, histcol='black')
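The sc(a,b) rule described in the Usage comment can be sketched in base R. This is only an illustration of the stated rule, not the package's internal code: the helper name sc_sketch and its explicit naxes argument are hypothetical.

```r
# Hypothetical helper illustrating the sc(a,b) rule: return a when the
# number of axes is <= 3, b when it is >= 50, and interpolate linearly
# in between. approx() with rule = 2 holds the end values outside 3-50.
sc_sketch <- function(a, b, naxes) {
  approx(x = c(3, 50), y = c(a, b), xout = naxes, rule = 2)$y
}

# cex.axis = sc(.5, .3) under this sketch:
sc_sketch(0.5, 0.3, naxes = 2)   # few axes: the first value is used
sc_sketch(0.5, 0.3, naxes = 20)  # in between: linear interpolation
sc_sketch(0.5, 0.3, naxes = 80)  # many axes: the second value is used
```

Under this sketch, settings such as cex.axis and tck shrink smoothly as more axes are packed onto a page.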
Arguments
x
a vector of numeric data, or a data frame (for jitter2)
object
a data frame or list (even with unequal number of observations per
variable, as long as group is not specified)
side
axis side to use (1=bottom (default for histSpike), 2=left,
3=top (default for scat1d), 4=right)
frac
fraction of smaller of vertical and horizontal axes for tick mark
lengths. Can be negative to move tick marks outside of plot. For
histSpike, this is the relative y-direction length to be used for the
largest frequency. When scat1d calls histSpike, it multiplies its
frac argument by 2.5. For histSpikeg, frac is a function of f, the
vector of all frequencies. The default function scales tick marks so
that they are between 0.01 and 0.03 of the y range, linearly scaled
in the square root of the frequency less one.
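The default frac function for histSpikeg, as listed in Usage, can be evaluated on its own to see the 0.01 to 0.03 scaling. The name frac_default and the frequencies below are made up for illustration; they were picked so that the square roots come out exact.

```r
# Default frac function for histSpikeg, copied from the Usage section.
# frac_default is a hypothetical name used only for this illustration.
frac_default <- function(f) 0.01 + 0.02 * sqrt(f - 1) / sqrt(max(f, 2) - 1)

# Hypothetical bin frequencies: f - 1 is 0, 4, 16, so sqrt gives 0, 2, 4.
f <- c(1, 5, 17)
frac_default(f)
# 0.01 0.02 0.03: a frequency of 1 gets the shortest spike,
# the largest frequency gets the longest
```

A user-supplied frac function would need the same shape: it receives the whole frequency vector and returns one relative length per frequency.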
jitfrac
fraction of axis for jittering. If jitfrac <= 0, no jittering is
done. If preserve=TRUE, the amount of jittering is independent of
jitfrac.
tfrac
Fraction of tick mark to actually draw. If tfrac < 1, will draw a
random fraction tfrac of the line segment at each point. This is
useful for very large samples or ones with some very dense points.
The default value is 1 if the number of non-missing observations n is
less than 125, and $\max(0.1,\, 125/n)$ otherwise.
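The default for tfrac stated above amounts to a one-line rule. A minimal sketch follows; tfrac_default is a hypothetical name, and the function simply restates the rule rather than quoting the package source.

```r
# Hypothetical helper restating the documented default for tfrac:
# 1 when n < 125, otherwise max(0.1, 125/n).
tfrac_default <- function(n) if (n < 125) 1 else max(0.1, 125 / n)

# n is the number of non-missing observations.
sapply(c(100, 250, 5000), tfrac_default)
# 1.0 0.5 0.1: full ticks for small samples, never less than a tenth
```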
eps
fraction of axis for determining overlapping points in x. For
preserve=TRUE the default is 0 and original unique values are
retained. Bigger values of eps tend to bias observations from dense
to sparse regions, but ranks are still preserved.
lwd
line width for tick marks, passed to segments
col
color for tick marks, passed to segments
y
specify a vector the same length as x to draw tick marks along a
curve instead of by one of the axes. The y values are often
predicted values from a model. The side argument is ignored when y
is given. If the curve is already represented as a table look-up,
you may specify it using the curve argument instead. y may be a
scalar to use a constant vertical placement.
curve
a list containing elements x and y for which linear interpolation is
used to derive y values corresponding to values of x. This results
in tick marks being drawn along the curve. For histSpike,
interpolated y values are derived for bin midpoints.
bottom.align
set to TRUE to have the bottoms of tick marks (for side=1 or side=3)
aligned at the y-coordinate. The default behavior is to center the
tick marks. For datadensity.data.frame, bottom.align defaults to
TRUE if nint>1. In other words, if you are only labeling the first
and last axis tick mark, the scat1d tick marks are centered on the
variable's axis.
preserve
set to TRUE to invoke jitter2
fill
maximum fraction of the axis filled by jittered values. If d are
duplicated values between a lower value l and upper value u, then d
will be spread within
$\pm\,\texttt{fill} \cdot \min(u - d,\; d - l)/2$.
limit
specifies a limit for maximum shift in jittered values. Duplicate
values will be spread within
$\pm\,\texttt{fill} \cdot \min(u - d,\; d - l)/2$. The default TRUE
restricts jittering to the smallest $\min(u - d,\; d - l)/2$ observed
and results in an equal amount of jittering for all d. Setting to
FALSE allows for locally different amounts of jittering, using the
maximum space available.
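The spread formula shared by fill and limit can be worked through on a small example. The sketch below only evaluates the documented half-width and does not reproduce jitter2 itself; jitter_halfwidth and the data values are hypothetical.

```r
# Hypothetical helper: half-width of the interval within which a
# duplicated value d is spread, given its lower neighbour l and upper
# neighbour u, following fill * min(u - d, d - l) / 2.
jitter_halfwidth <- function(l, d, u, fill = 1/3) {
  fill * pmin(u - d, d - l) / 2
}

# Made-up data: duplicated values at 2 (neighbours 1 and 5)
# and at 10 (neighbours 6 and 12).
l <- c(1, 6); d <- c(2, 10); u <- c(5, 12)

# Locally available spread, as limit=FALSE is described to allow.
local_hw <- jitter_halfwidth(l, d, u)

# Equal spread for every d, set by the tightest gap, as limit=TRUE
# is described to enforce.
global_hw <- rep(min(local_hw), length(d))
```

Here local_hw is 1/6 for d = 2 and 1/3 for d = 10, while global_hw is 1/6 for both, which is the trade-off the limit argument controls.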
nhistSpike
If the number of observations is at least nhistSpike, scat1d will
automatically call histSpike to draw the data density, to prevent
the graphics file from being too large.
type
used by or passed to histSpike. Set to "count" to display frequency
counts rather than relative frequencies, or "density" to display a
kernel density estimate computed using the density function.
grid
set to TRUE if the R grid package is in effect for the current plot
nint
number of intervals to divide each continuous variable's axis for
datadensity. For histSpike, nint is the number of equal-width
intervals for which to bin x. If instead nint is a character string
(e.g., nint="all"), the frequency tabulation is done with no
binning. In other words, frequencies for all unique values of x are
derived and plotted. For histSpikeg, if x has no more than nint
unique values, all observed values are used; otherwise the data are
rounded before tabulation so that there are no more than nint
intervals.
...
optional arguments passed to scat1d from datadensity or to histSpike
from scat1d
presorted
set to TRUE to prevent sorting for determining the order
$l < d < u$
group
an optional stratification variable, which is converted to a factor
vector if it is not one already
which
set which="continuous" to only plot continuous variables, or
which="categorical" to only plot categorical, character, or discrete
numeric ones. By default, all types of variables are depicted.
method.cat
set method.cat="freq" to depict frequencies of categorical variables
with digits representing the cell frequencies, with size
proportional to the square root of the frequency. By default,
vertical bars are used.
col.group
colors representing the group strata. The vector of colors is
recycled to match the number of levels of group.
n.unique
number of unique values a numeric variable must have before it is
considered to be a continuous variable
show.na
set to FALSE to suppress drawing the number of NAs to the right of
each axis
naxes
number of axes to draw on each page before starting a new plot. You
can set naxes larger than the number of variables in the data frame
if you want to compress the plot vertically.
q
a vector of quantiles to display. By default, quantiles are not
shown.
cex.axis
character size for the labels of axis tick marks
cex.var
character size for variable names and frequency of NAs
lmgp
spacing between numeric axis labels and axis (see par for mgp)
ranges
a list containing ranges for some or all of the numeric variables.
If ranges is not given or if a certain variable is not found in the
list, the empirical range, modified by pretty, is used. Example:
ranges=list(age=c(10,100), pressure=c(50,150)).
labels
a vector of labels to use in labeling the axes for
datadensity.data.frame. Default is to use the names of the
variables in the input data frame. Note: margin widths computed for
setting aside names of variables use the names, and not these
labels.
minf
For histSpike, if minf is specified, low bin frequencies are set to a
minimum value of minf times the maximum bin frequency, so that rare
data points will remain visible. A good choice of minf is 0.075.
datadensity.data.frame passes minf=0.075 to scat1d to pass to
histSpike. Note that specifying minf will cause the shape of the
histogram to be distorted somewhat.
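The effect of minf on bin frequencies can be illustrated with a short base-R sketch. It only applies the floor described above to a made-up frequency vector; apply_minf is a hypothetical name, and how empty bins are treated is not stated here, so the example uses positive counts only.

```r
# Hypothetical helper: raise low bin frequencies to at least
# minf times the maximum bin frequency.
apply_minf <- function(f, minf) pmax(f, minf * max(f))

# Made-up bin counts with one rare bin.
f <- c(1, 50, 200)
apply_minf(f, minf = 0.075)
# the floor is 0.075 * 200 = 15, so the count of 1 is drawn as 15
# while 50 and 200 are unchanged
```

The rare bin becomes visible at the cost of overstating its height, which is the distortion the note above refers to.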
mult.width
multiplier for the smoothing window width computed by histSpike when
type="density"
xlim
a 2-vector specifying the outer limits of x for binning (and
plotting, if add=FALSE and nint is a number). For histSpikeg,
observations outside the xlim range are ignored.
ylim
y-axis range for plotting (if add=FALSE). Often needed for
histSpikeg to help scale the tick mark line segments.
xlab
x-axis label (add=FALSE); default is name of input argument x
ylab
y-axis label (add=FALSE)
add
set to TRUE to add the spike-histogram to an existing plot, to show
marginal data densities
formula
a formula of the form y ~ x1 or y ~ x1 + ... where y is the name of
the y-axis variable being plotted with ggplot, x1 is the name of the
x-axis variable, and optional ... are variables used by ggplot to
produce multiple curves on a panel and/or facets.
predictions
the data frame being plotted by ggplot, containing x and y
coordinates of curves. If omitted, spike histograms are drawn at
the bottom (default) or top of the plot according to side.
data
for histSpikeg, a mandatory data frame containing raw data whose
frequency distribution is to be summarized, using variables in
formula.
lowess
set to TRUE to have histSpikeg add a geom_line layer to the ggplot2
graphic, containing lowess() nonparametric smoothers. This causes
the returned value of histSpikeg to be a list with two components,
"hist" and "lowess", each containing a layer. Fortunately, ggplot2
plots both layers automatically. If the dependent variable is
binary, iter=0 is passed to lowess so that outlier detection is
turned off; otherwise iter=3 is passed.
span
passed to lowess as the f argument
histcol
color of line segments (tick marks) for histSpikeg. Default is
black. Set to any color or to "default" to use the prevailing
colors for the graphic.