Usage
scat1d(x, side=3, frac=0.02, jitfrac=0.008, tfrac,
eps=ifelse(preserve,0,.001),
lwd=0.1, col=par("col"),
y=NULL, curve=NULL,
bottom.align=type=='density',
preserve=FALSE, fill=1/3, limit=TRUE, nhistSpike=2000, nint=100,
type=c('proportion','count','density'), grid=FALSE, ...)
jitter2(x, ...)
## S3 method for class 'default':
jitter2(x, fill=1/3, limit=TRUE, eps=0, presorted=FALSE, ...)
## S3 method for class 'data.frame':
jitter2(x, ...)
datadensity(object, ...)
## S3 method for class 'data.frame':
datadensity(object, group,
which=c("all","continuous","categorical"),
method.cat=c("bar","freq"),
col.group=1:10,
n.unique=10, show.na=TRUE, nint=1, naxes,
q, bottom.align=nint>1,
cex.axis=sc(.5,.3), cex.var=sc(.8,.3),
lmgp=sc(-.2,-.625), tck=sc(-.009,-.002), ranges, labels, ...)
# sc(a,b) means default to a if number of axes <= 3, b if >= 50, use
# linear interpolation within 3-50
histSpike(x, side=1, nint=100, frac=.05, minf=NULL, mult.width=1,
type=c('proportion','count','density'),
xlim=range(x), ylim=c(0,max(f)), xlab=deparse(substitute(x)),
ylab=switch(type,proportion='Proportion',
count ='Frequency',
density ='Density'),
y=NULL, curve=NULL, add=FALSE,
bottom.align=type=='density', col=par('col'), lwd=par('lwd'),
grid=FALSE, ...)
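The sc(a,b) rule in the comment above can be sketched in base R. This is only an illustration of the stated rule (a for 3 or fewer axes, b for 50 or more, linear interpolation in between); the name sc_sketch and its naxes argument are hypothetical and are not taken from the package source.

```r
# Hypothetical helper illustrating the sc(a,b) rule described in Usage.
# approx() with rule = 2 holds the end values outside the 3-50 range.
sc_sketch <- function(a, b, naxes) {
  approx(x = c(3, 50), y = c(a, b), xout = naxes, rule = 2)$y
}

few  <- sc_sketch(.5, .3, naxes = 2)     # at or below 3 axes: uses a
many <- sc_sketch(.5, .3, naxes = 80)    # at or above 50 axes: uses b
mid  <- sc_sketch(.5, .3, naxes = 26.5)  # halfway between 3 and 50
print(c(few, mid, many))
```

With these inputs the cex.axis default would shrink from 0.5 toward 0.3 as more axes are drawn on a page.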
Arguments
x
a vector of numeric data, or a data frame (for jitter2
)
object
a data frame or list (even with unequal number of observations per
variable, as long as group
is not specified)
side
axis side to use (1=bottom (default for histSpike
), 2=left,
3=top (default for scat1d
), 4=right)
frac
fraction of smaller of vertical and horizontal axes for tick mark lengths.
Can be negative to move tick marks outside of plot. For histSpike
,
this is the relative length to be used for the largest frequency.
When scat1d
calls
jitfrac
fraction of axis for jittering. If <=0, no jittering is done. If preserve=TRUE, the amount of jittering is independent of jitfrac.
tfrac
fraction of tick mark to actually draw. If tfrac<1,
will draw a random fraction tfrac
of the line segment at each point.
This is useful for very large samples or ones with some very dense points.
The default value is 1 if the nu
eps
fraction of axis for determining overlapping points in x
. For
preserve=TRUE
the default is 0 and original unique values are
retained. Bigger values of eps tend to bias observations from dense
to sparse regions, but ranks are sti
lwd
line width for tick marks, passed to segments
col
color for tick marks, passed to segments
y
specify a vector the same length as x
to draw tick marks along
a curve instead of by one of the axes. The y
values are often
predicted values from a model. The side
argument is ignored
when y
is given.
curve
a list containing elements x
and y
for which linear interpolation
is used to derive y
values corresponding to values of x
. This
results in tick marks being drawn along the curve. For histSpike
bottom.align
set to TRUE
to have the bottoms of tick marks (for side=1
or
side=3
) aligned at the y-coordinate. The default behavior is to
center the tick marks. For datadensity.data.frame
, bottom.align
preserve
set to TRUE
to invoke jitter2
fill
maximum fraction of the axis filled by jittered values. If d
are
duplicated values between a lower value l
and upper value u
, then
d
will be spread within +/- fill*min(u-d,d-l)/2
.
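The spread formula for fill can be checked with a small base R calculation. This is a sketch of the formula as stated in the text only; spread_halfwidth is a hypothetical helper, and the even placement of duplicates shown here is an assumption, not necessarily how jitter2 positions them.

```r
# Hypothetical helper: half-width of the band in which duplicates of d
# may be spread, i.e. fill * min(u - d, d - l) / 2 from the text.
spread_halfwidth <- function(l, d, u, fill = 1/3) {
  fill * min(u - d, d - l) / 2
}

l <- 1; d <- 2; u <- 5           # d is duplicated; l and u are its neighbours
h <- spread_halfwidth(l, d, u)   # (1/3) * min(3, 1) / 2
k <- 4                           # number of duplicates of d (assumed)
# Assumption: place the k duplicates evenly inside [d - h, d + h].
spread <- seq(d - h, d + h, length.out = k)
print(spread)
```

Because the half-width depends on the nearer neighbour, the spread values always stay strictly between l and u, so the rank order of distinct values is not disturbed.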
limit
specifies a limit for maximum shift in jittered values. Duplicate
values will be spread within +/- fill*min(limit,min(u-d,d-l)/2)
. The
default TRUE
restricts jittering to the smallest min(u-d,d-l)/2 observed and
results in equal
nhistSpike
If the number of observations exceeds or equals nhistSpike
, scat1d
will automatically call histSpike
to draw the data density, to
prevent the graphics file from being too large.
type
used by or passed to histSpike
. Set to "count"
to display
frequency counts rather than relative frequencies, or "density"
to
display a kernel density estimate computed using the density
function.
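The three type settings correspond to three different summaries of x. The following base R sketch computes each one directly, assuming equal-width bins over range(x); it illustrates the quantities involved and is not the code histSpike itself runs.

```r
set.seed(1)
x    <- rnorm(500)   # simulated data, for illustration only
nint <- 100          # number of equal-width intervals

# Equal-width bins spanning the observed range of x
breaks <- seq(min(x), max(x), length.out = nint + 1)
bins   <- cut(x, breaks, include.lowest = TRUE)

counts <- as.vector(table(bins))  # type = "count": frequency per bin
props  <- counts / sum(counts)    # type = "proportion": relative frequency
dens   <- density(x)              # type = "density": kernel density estimate

print(max(counts))
print(max(props))
print(max(dens$y))
```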
grid
set to TRUE
if the R grid
package is in effect for the
current plot
nint
number of intervals to divide each continuous variable's axis for
datadensity
.
For histSpike
, this is the number of equal-width intervals for which to
bin x
, and if instead nint
is a character string (e.g.,
...
optional arguments passed to scat1d
from datadensity
or to
histSpike
from scat1d
presorted
set to TRUE
to prevent from sorting for determining the order l
group
an optional stratification variable, which is converted to a factor
vector if it is not one already
which
set which="continuous"
to only plot continuous variables, or
which="categorical"
to only plot categorical, character, or discrete
numeric ones. By default, all types of variables are depicted.
method.cat
set method.cat="freq"
to depict frequencies of categorical variables
with digits representing the cell frequencies, with size proportional
to the square root of the frequency. By default, vertical bars are used.
col.group
colors representing the group
strata. The vector of colors is
recycled to be the same length as the levels of group
.
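Recycling of col.group can be pictured with base R's rep(). This sketch assumes recycling works like rep(..., length.out=); the group factor below is made up for illustration.

```r
# Hypothetical stratification variable with 12 levels
group     <- factor(rep(letters[1:12], each = 2))
col.group <- 1:10    # the documented default

# Recycle the color vector to one color per level of group
cols <- rep(col.group, length.out = nlevels(group))
print(setNames(cols, levels(group)))
```

With more than 10 strata the default colors repeat, so it may be worth supplying a longer col.group in that case.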
n.unique
number of unique values a numeric variable must have before it is
considered to be a continuous variable
show.na
set to FALSE
to suppress drawing the number of NA
s to the right of
each axis
naxes
number of axes to draw on each page before starting a new plot. You
can set naxes
larger than the number of variables in the data frame
if you want to compress the plot vertically.
q
a vector of quantiles to display. By default, quantiles are not shown.
cex.axis
character size for drawing labels for axis tick marks
cex.var
character size for variable names and frequency of NA
s
lmgp
spacing between numeric axis labels and axis (see par
for mgp
)
ranges
a list containing ranges for some or all of the numeric variables. If
ranges
is not given or if a certain variable is not found in the
list, the empirical range, modified by pretty
, is used. Example:
ranges=list(age=c(10,
labels
a vector of labels to use in labeling the axes for
datadensity.data.frame
. Default is to use the names of the
variables in the input data frame. Note: margin widths computed for
setting aside names of variables use the names, and not these
minf
For histSpike
, if minf
is specified low bin frequencies are set to
a minimum value of minf
times the maximum bin frequency, so that
rare data points will remain visible. A good choice of minf
is
0.075.
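The effect of minf can be sketched in base R as a floor applied to the bin frequencies. The bin counts below are invented, and the use of pmax() is an assumption about how the floor could be applied, not a copy of the histSpike source.

```r
counts <- c(200, 3, 50, 1)   # hypothetical bin frequencies
minf   <- 0.075              # the value suggested in the text

# Raise low frequencies to at least minf times the maximum frequency
floor_value <- minf * max(counts)
adjusted    <- pmax(counts, floor_value)
print(adjusted)
```

The two rare bins are drawn as if they held about 15 observations, which keeps their spikes visible next to the bin of 200.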
mult.width
multiplier for the smoothing window width computed by histSpike
when
type="density"
xlim
a 2-vector specifying the outer limits of x
for binning (and
plotting, if add=FALSE
and nint
is a number)
ylim
y
-axis range for plotting (if add=FALSE
)
xlab
x
-axis label (add=FALSE
); default is name of input argument x
ylab
y
-axis label (add=FALSE
)
add
set to TRUE
to add the spike-histogram to an existing plot, to show
marginal data densities