A 2-dimensional table will be described with its relative frequencies, a short summary containing the total cases, the dimensions of the table, chi-square tests and some association measures such as the phi coefficient, the contingency coefficient and Cramer's V. Tables with higher dimensions will simply be printed as a flat table, with marginal sums for the first and for the last dimension.
Desc(x, ..., main = NULL, plotit = NULL, wrd = NULL)
"Desc"(x, main = NULL, maxrows = NULL, ord = NULL, conf.level = 0.95, verbose = 2, rfrq = "111", margins = c(1,2), dprobs = NULL, mprobs = NULL, plotit = NULL, sep = NULL, digits = NULL, ...)
"Desc"(x, main = NULL, plotit = NULL, enum = TRUE, ...)
"Desc"(x, main = NULL, plotit = NULL, enum = TRUE, ...)
"Desc"(x, main = NULL, maxrows = NULL, plotit = NULL, sep = NULL, digits = NULL, ...)
"Desc"(x, main = NULL, maxrows = NULL, plotit = NULL, sep = NULL, digits = NULL, ...)
"Desc"(x, main = NULL, maxrows = NULL, ord = NULL, plotit = NULL, sep = NULL, digits = NULL, ...)
"Desc"(x, main = NULL, maxrows = NULL, ord = NULL, plotit = NULL, sep = NULL, digits = NULL, ...)
"Desc"(x, main = NULL, maxrows = NULL, ord = NULL, plotit = NULL, sep = NULL, digits = NULL, ...)
"Desc"(x, main = NULL, ord = NULL, conf.level = 0.95, plotit = NULL, sep = NULL, digits = NULL, ...)
"Desc"(x, main = NULL, dprobs = NULL, mprobs = NULL, plotit = NULL, sep = NULL, digits = NULL, ...)
"Desc"(x, main = NULL, conf.level = 0.95, verbose = 2, rfrq = "111", margins = c(1,2), plotit = NULL, sep = NULL, digits = NULL, ...)
"Desc"(formula, data = parent.frame(), subset, main = NULL, plotit = NULL, digits = NULL, ...)
"print"(x, digits = NULL, plotit = NULL, nolabel = FALSE, sep = NULL, ...)
"plot"(x, main = NULL, ...)
NULL, the title will be composed as: variablename (class(es)), resp. number - variablename (class(es)) if the enum option is set to TRUE. Use NA if no caption should be printed at all.
GetNewWrd() (for a new one) or by GetCurrWrd() for an existing one. All output will then be redirected there. Default is NULL, which will report all results to the console.
options(digits=x).
ord is set to levels or names). If, for a numeric object, the value is left at its default NULL, the list of extreme values will be displayed when x has more than 12 distinct values, and the frequency table otherwise.
If maxrows is < 1 it will be interpreted as a percentage. Then only as many rows are shown as are needed to cover the maxrows share of the most frequent levels. Say, if maxrows is set to 0.8, the number of rows is chosen so that the highest cumulative relative frequency shown is the first one going beyond 0.8. If the highest and the lowest values (numeric objects only) should always be reported, maxrows should be set to 0.
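The cumulative-frequency rule described above can be sketched in base R. This is only an illustration of the rule as worded here, not the package's own code; `rows_for_share` is a hypothetical helper and the data are made up.

```r
# Hypothetical helper (not part of DescTools): number of rows that would be
# shown when maxrows < 1 is read as a cumulative share of the frequencies.
rows_for_share <- function(x, maxrows = 0.8) {
  freq <- sort(table(x), decreasing = TRUE)   # most frequent levels first
  cum  <- cumsum(freq) / sum(freq)            # cumulative relative frequencies
  unname(which(cum > maxrows)[1])             # first row going beyond maxrows
}

# made-up data: cumulative shares are 0.50, 0.75, 0.90, 1.00
x <- rep(c("a", "b", "c", "d"), times = c(50, 25, 15, 10))
n_rows <- rows_for_share(x, 0.8)
n_rows
```

With these counts the third level is the first whose cumulative share exceeds 0.8, so three rows would be kept under this reading of the rule.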
"name"
(alphabetical order), "level", "asc" (by frequencies ascending), "desc" (by frequencies descending) defining the order for a frequency table as used for factors, numerics with few unique values and logicals.
Factors (and character vectors) are by default ordered by their descending frequencies, ordered factors by their natural order.
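To make the four orderings concrete, here is a small base R sketch that reproduces them on a plain frequency table. It only mirrors the meaning of the option values described above; how Desc implements them internally is not shown in this page, and the factor below is made up.

```r
# made-up factor whose level order (c, a, b) differs from alphabetical order
x   <- factor(c("b", "c", "a", "c", "b", "c"), levels = c("c", "a", "b"))
tab <- table(x)                                  # counts in level order

by_level <- names(tab)                           # "level": order of levels(x)
by_name  <- names(tab[order(names(tab))])        # "name": alphabetical
by_asc   <- names(sort(tab))                     # "asc": ascending frequency
by_desc  <- names(sort(tab, decreasing = TRUE))  # "desc": descending frequency
```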
1 or 0, defining which percentages should be reported. The first position is interpreted as total percentages, the second as row percentages and the third as column percentages. "011" hence produces a table output with row and column percentages. If set to NULL, rfrq is derived from verbose (verbose = "low" sets rfrq to "000", anything else to "111", the latter meaning all percentages will be reported). Applies only to tables and is ignored otherwise.
NULL (none). Applies only to tables and is ignored otherwise.
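The three positions of rfrq correspond to the three usual kinds of relative frequencies in a two-way table. As a rough base R illustration (using `prop.table` on a made-up table, not Desc itself):

```r
# made-up 2 x 2 table of counts
tab <- as.table(matrix(c(10, 20, 30, 40), nrow = 2,
                       dimnames = list(row = c("r1", "r2"),
                                       col = c("c1", "c2"))))

total_pct <- prop.table(tab)              # 1st position: share of grand total
row_pct   <- prop.table(tab, margin = 1)  # 2nd position: each row sums to 1
col_pct   <- prop.table(tab, margin = 2)  # 3rd position: each column sums to 1
```

Under this reading, rfrq = "011" would correspond to reporting `row_pct` and `col_pct` but not `total_pct`.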
c(2, 1, 3) defining the verbosity of the reported results. 2 (default) means medium, 1 less and 3 extensive results. Applies only to tables and is ignored otherwise.
NA no confidence interval will be calculated. Default is 0.95.
Date variable. If this is left at NULL (default), a uniform distribution will be used for days and a days-per-month distribution of a non-leap year (p = c(31/365, 28/365, 31/365, ...)) for the months. Applies only to Dates and is ignored otherwise.
plotit, if it does not exist then it's set to FALSE.
"-"
for the current width of the screen (options("width")) will be used.
label, as done by Label) should be plotted.
lhs ~ rhs where lhs gives the data values and rhs the corresponding groups.
formula. By default the variables are taken from environment(formula).
x. An error is given if any entry of p is negative. This argument will be passed on to chisq.test. Default is rep(1/length(x), length(x)).
add_ni
smooth
MeanSE.
quantile(x, probs = c(.05,.10,.25,.5,.75,.9,.95), na.rm = TRUE).
mean(x) / sd(x)
mad)
Skew.
Kurt.
Freq if maxlevels > unique values in the vector.
This function produces a rich description of a factor, containing length, number of NAs, number of levels and detailed frequencies of all levels.
The order of the frequency table can be chosen between descending/ascending frequency, labels or levels.
For ordered factors the default order is "level".
Character vectors are treated as unordered factors.
Desc.char converts x to a factor and processes x as a factor.
Desc.ordered does nothing more than change the standard order for the frequencies to its intrinsic order, which means order "level" instead of "desc" in the factor case.
Description interface for dates. We do here what seems reasonable for describing dates. We start with a short summary of length, number of NAs and extreme values, before we describe the frequencies of the weekdays and months, rounded off by a chi-square test.
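A minimal base R sketch of such a month-frequency test, assuming the non-leap-year month distribution given in the argument description as expected probabilities. The dates are simulated for illustration only, and this is not the package's internal code.

```r
# expected month probabilities for a non-leap year (days per month / 365)
days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
mprobs <- days_in_month / 365

# made-up dates, drawn uniformly over the days of 2014
set.seed(1)
dates <- as.Date("2014-01-01") + sample(0:364, 500, replace = TRUE)

# month counts, keeping all 12 months even if one happens to be empty
month_counts <- table(factor(as.integer(format(dates, "%m")), levels = 1:12))

# goodness-of-fit test of the observed months against mprobs
res <- chisq.test(month_counts, p = mprobs)
res
```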
A 2-dimensional table will be described with its relative frequencies, a short summary containing the total cases, the dimensions of the table, chi-square tests and some association measures such as the phi coefficient, the contingency coefficient and Cramer's V. Tables with higher dimensions will simply be printed as a flat table, with marginal sums for the first and for the last dimension.
Note that NAs cannot be handled by this interface, as tables in general come in "as is", that is, basically as a matrix without any further information about NAs that may have been dropped beforehand.
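The point about lost NA information can be seen directly with base R's `table`, which drops missing values by default. A short illustration on a made-up vector:

```r
x <- c("a", "b", NA, "a", NA)

tab    <- table(x)                   # NAs are silently dropped by default
tab_na <- table(x, useNA = "ifany")  # NAs kept as a separate category

n_default <- sum(tab)     # counts only the non-missing values
n_with_na <- sum(tab_na)  # counts all values, including the NAs
```

Once only `tab` is available, nothing in the object reveals that two values were missing, which is why the NA count has to be captured before the table is built.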
Description of a dichotomous variable. This can either be a logical vector, a factor with two levels or a numeric variable with only two unique values.
The confidence intervals for the relative frequencies are calculated by BinomCI, method "Wilson", at the confidence level defined by conf.level.
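For readers unfamiliar with the Wilson interval, the following base R sketch computes it by hand and compares it with `prop.test(..., correct = FALSE)`, which returns the same score interval. `wilson_ci` is a hypothetical helper written for this illustration; it is not BinomCI and makes no claim about how BinomCI is implemented.

```r
# Hypothetical helper: Wilson score interval for x successes out of n trials
wilson_ci <- function(x, n, conf.level = 0.95) {
  z      <- qnorm(1 - (1 - conf.level) / 2)   # two-sided normal quantile
  p      <- x / n                             # observed proportion
  denom  <- 1 + z^2 / n
  centre <- (p + z^2 / (2 * n)) / denom
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  c(lower = centre - half, upper = centre + half)
}

ci  <- wilson_ci(30, 100)                                  # made-up counts
ref <- as.numeric(prop.test(30, 100, correct = FALSE)$conf.int)
```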
Dichotomous variables can easily be condensed into one graphical representation. Desc for a set of flags (= dichotomous variables) calculates the frequencies and a binomial confidence interval, and produces a kind of dotplot with error bars.
The motivation for this function is that dichotomous variables in general do not contain much information. It therefore makes sense to condense the description of sets of dichotomous variables.
The formula interface accepts the formula operators +, :, *, I(), 1 and evaluates any function.
The left hand side and right hand side of the formula are evaluated the same way.
The variable pairs are processed depending on their classes.
Word
This function is not intended to be run directly by the end user. It will normally be called automatically when a pointer to a Word instance is passed to the function Desc.
However, DescWrd takes some more specific arguments concerning the Word output (like font or font size), which can make it necessary to call the function directly.
summary, plot
library(DescTools)   # provides Desc() and the d.pizza example data

# implemented classes:
Desc(d.pizza$wrongpizza) # logical
Desc(d.pizza$driver) # factor
Desc(d.pizza$quality) # ordered factor
Desc(as.character(d.pizza$driver)) # character
Desc(d.pizza$week) # integer
Desc(d.pizza$delivery_min) # numeric
Desc(d.pizza$date) # Date
Desc(d.pizza)
Desc(d.pizza$wrongpizza, main="The wrong pizza delivered", digits=5)
Desc(table(d.pizza$area)) # 1-dim table
Desc(table(d.pizza$area, d.pizza$operator)) # 2-dim table
Desc(table(d.pizza$area, d.pizza$operator, d.pizza$driver)) # n-dim table
# expressions
Desc(log(d.pizza$temperature))
Desc(d.pizza$temperature > 45)
# supported labels
Label(d.pizza$temperature) <- "This is the temperature in degrees Celsius
measured at the time when the pizza is delivered to the client."
Desc(d.pizza$temperature)
# try as well: Desc(d.pizza$temperature, wrd=GetNewWrd())
z <- Desc(d.pizza$temperature)
print(z, digits=1, plotit=FALSE)
# plot (additional arguments are passed on to the underlying plot function)
plot(z, main="The pizza's temperature in Celsius", args.hist=list(breaks=50))
# bivariate
Desc(price ~ operator, data=d.pizza) # numeric ~ factor
Desc(driver ~ operator, data=d.pizza) # factor ~ factor
Desc(driver ~ area + operator, data=d.pizza) # factor ~ several factors
Desc(driver + area ~ operator, data=d.pizza) # several factors ~ factor
Desc(driver ~ week, data=d.pizza) # factor ~ integer
Desc(driver ~ operator, data=d.pizza, rfrq=("111")) # all rel. frequencies
Desc(driver ~ operator, data=d.pizza, rfrq=("000"),
     verbose="high") # no rel. frequencies
Desc(price ~ delivery_min, data=d.pizza) # numeric ~ numeric
Desc(price + delivery_min ~ operator + driver + wrongpizza,
     data=d.pizza, digits=c(2,2,2,2,0,3,0,0) )
Desc(week ~ driver, data=d.pizza, digits=c(2,2,2,2,0,3,0,0)) # define digits
Desc(delivery_min + weekday ~ driver, data=d.pizza)
# without defining data-parameter
Desc(d.pizza$delivery_min ~ d.pizza$driver)
# with functions and interactions
Desc(sqrt(price) ~ operator : factor(wrongpizza), data=d.pizza)
Desc(log(price+1) ~ cut(delivery_min, breaks=seq(10,90,10)),
     data=d.pizza, digits=c(2,2,2,2,0,3,0,0))
# response versus all the rest
Desc(driver ~ ., data=d.pizza[, c("temperature","wine_delivered","area","driver")])
# all the rest versus response
Desc(. ~ driver, data=d.pizza[, c("temperature","wine_delivered","area","driver")])
# pairwise Descriptions
p <- CombPairs(c("area","count","operator","driver","temperature","wrongpizza","quality"))
# build one formula per pair, indexing row i of the pairs table
for(i in seq_len(nrow(p)))
  print(Desc(formula(gettextf("%s ~ %s", p$X1[i], p$X2[i])), data=d.pizza))
# get more flexibility, create the table first
tab <- as.table(apply(HairEyeColor, c(1,2), sum))
tab <- tab[,c("Brown","Hazel","Green","Blue")]
# display only absolute values, row and columnwise percentages
Desc(tab, row.vars=c(3, 1), rfrq="011", plotit=FALSE)
# do the plot by hand, while setting the colours for the mosaics
cols1 <- SetAlpha(c("sienna4", "burlywood", "chartreuse3", "slategray1"), 0.6)
cols2 <- SetAlpha(c("moccasin", "salmon1", "wheat3", "gray32"), 0.8)
plot(tab, col1=cols1, col2=cols2)
# use global format options for presentation
options(fmt.abs=structure(list(digits=0, big.mark=""), class="fmt"))
options(fmt.per=structure(list(digits=2, fmt="%"), class="fmt"))
Desc(area ~ driver, d.pizza, plotit=FALSE)
options(fmt.abs=structure(list(digits=0, big.mark="'"), class="fmt"))
options(fmt.per=structure(list(digits=3, leading="drop"), class="fmt"))
Desc(area ~ driver, d.pizza, plotit=FALSE)
# plot arguments can be fixed in detail
z <- Desc(BoxCox(d.pizza$temperature, lambda = 1.5))
plot(z, mar=c(0, 2.1, 4.1, 2.1), args.rug=TRUE, args.hist=list(breaks=50),
     args.dens=list(from=0))
# Output into word document (Windows-specific example) -----------------------
# by simply setting wrd=GetNewWrd()
## Not run:
#
# # create a new word instance and insert title and contents
# wrd <- GetNewWrd(header=TRUE)
#
# # let's have a subset
# d.sub <- d.pizza[,c("driver", "date", "operator", "price", "wrongpizza")]
#
# # do just the univariate analysis
# Desc(d.sub, wrd=wrd)
# ## End(Not run)