# describe

##### Concise Statistical Description of a Vector, Matrix, Data Frame, or Formula

`describe` is a generic method that invokes `describe.data.frame`, `describe.matrix`, `describe.vector`, or `describe.formula`. `describe.vector` is the basic function for handling a single variable. This function determines whether the variable is character, factor, category, binary, discrete numeric, or continuous numeric, and prints a concise statistical summary according to each. A numeric variable is deemed discrete if it has <= 10 unique values. In this case, quantiles are not printed. A frequency table is printed for any non-binary variable if it has no more than 20 unique values. For any variable with at least 20 unique values, the 5 lowest and highest values are printed. `describe` is especially useful for describing data frames created by `sas.get`, as SAS labels, formats, value labels, and frequencies of special missing values are printed.

For a binary variable, the sum (number of 1's) and mean (proportion of 1's) are printed. If the first argument is a formula, a model frame is created and passed to `describe.data.frame`. If a variable is of class `"impute"`, a count of the number of imputed values is printed. If a date variable has an attribute `partial.date` (this is set up by `sas.get`), counts of how many partial dates are actually present (missing month, missing day, missing both) are also presented. If a variable was created by the special-purpose function `substi` (which substitutes values of a second variable if the first variable is NA), the frequency table of substitutions is also printed.
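
The binary-variable summary described above amounts to a count and a proportion. A minimal base R sketch, not Hmisc's own code: the vector `smoker` is made up for illustration, and missing values are assumed to be set aside before summarizing.

```r
# Hypothetical 0/1 indicator with one missing value
smoker <- c(1, 0, 0, 1, 1, NA, 0, 1)

n_ones    <- sum(smoker, na.rm = TRUE)   # number of 1's
prop_ones <- mean(smoker, na.rm = TRUE)  # proportion of 1's among non-missing values

c(Sum = n_ones, Mean = prop_ones)
```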

A `latex` method exists for converting the `describe` object to a LaTeX file. For numeric variables having at least 20 unique values, `describe` saves in its returned object the frequencies of 100 evenly spaced bins running from minimum observed value to the maximum. `latex` inserts a spike histogram displaying these frequency counts in the tabular material using the LaTeX picture environment. For example output see

Sample weights may be specified to any of the functions, resulting in weighted means, quantiles, and frequency tables.
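
To make the effect of frequency weights concrete, here is a small base R sketch. It does not call Hmisc; the data and the names `age`, `sex`, and `freq` are invented for illustration. It shows that a weighted mean equals the plain mean of the expanded data, and that a weighted frequency table sums the weights within each category.

```r
# Hypothetical data: three distinct records with frequency weights
age  <- c(30, 40, 50)
sex  <- c("female", "male", "female")
freq <- c(2, 1, 3)   # each record stands for `freq` observations

# Weighted mean: same as the plain mean after repeating each record `freq` times
w_mean        <- weighted.mean(age, freq)
expanded_mean <- mean(rep(age, times = freq))

# Weighted frequency table: sum the weights within each category
w_table <- tapply(freq, sex, sum)
w_table
```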

- Keywords
- robust, models, distribution, nonparametric, interface, category

##### Usage

```
## S3 method for class 'vector':
describe(x, descript, exclude.missing=TRUE, digits=4,
         weights, normwt, ...)
## S3 method for class 'matrix':
describe(x, descript, exclude.missing=TRUE, digits=4, ...)
## S3 method for class 'data.frame':
describe(x, descript, exclude.missing=TRUE,
         digits=4, ...)
## S3 method for class 'formula':
describe(x, descript, data, subset, na.action,
         digits=4, weights, ...)
## S3 method for class 'describe':
print(x, condense=TRUE, ...)
## S3 method for class 'describe':
latex(object, title=NULL, condense=TRUE,
      file=paste('describe',first.word(expr=attr(object,'descript')),'tex',sep='.'),
      append=FALSE, size='small', tabular=TRUE, greek=TRUE, ...)
## S3 method for class 'describe.single':
latex(object, title=NULL, condense=TRUE, vname,
      file, append=FALSE, size='small', tabular=TRUE, greek=TRUE, ...)
```

##### Arguments

- x
- a data frame, matrix, vector, or formula. For a data frame, the `describe.data.frame` function is automatically invoked. For a matrix, `describe.matrix` is called. For a formula, `describe.data.frame(model.frame(x))` is invoked.
- descript
- optional title to print for x. The default is the name of the argument or the "label" attributes of individual variables. When the first argument is a formula, `descript` defaults to a character representation of the formula.
- exclude.missing
- set to `TRUE` to print the names of variables that contain only missing values. This list appears at the bottom of the printout, and no space is taken up for such variables in the main listing.
- digits
- number of significant digits to print
- weights
- a numeric vector of frequencies or sample weights. Each observation will be treated as if it were sampled `weights` times.
- normwt
- The default, `normwt=FALSE`, results in the use of `weights` as weights in computing various statistics. In this case the sample size is assumed to be equal to the sum of `weights`. Specify `normwt=TRUE`
- object
- a result of `describe`
- title
- unused
- condense
- default is `TRUE` to condense the output with regard to the 5 lowest and highest values and the frequency table
- data
- subset
- na.action
- These are used if a formula is specified. `na.action` defaults to `na.retain`, which does not delete any `NA`s from the data frame. Use `na.action=na.omit` or `na.delete` to drop any observation w
- ...
- arguments passed to `describe.default` which are passed to calls to `format` for numeric variables. For example if using R `POSIXct` or `Date` date/time formats, specifying `describe(d,format='%d%b%y`
- file
- name of output file (should have a suffix of .tex). Default name is formed from the first word of the `descript` element of the `describe` object, prefixed by `"describe"`. Set `file=""` to send LaTeX code to
- append
- set to `TRUE` to have `latex` append text to an existing file named `file`
- size
- LaTeX text size (`"small"`, the default, or `"normalsize"`, `"tiny"`, `"scriptsize"`, etc.) for the `describe` output in LaTeX.
- tabular
- set to `FALSE` to use verbatim rather than tabular environment for the summary statistics output. By default, tabular is used if the output is not too wide.
- greek
- By default, the `latex` methods will change LaTeX names of greek letters that appear in variable labels to appropriate LaTeX symbols in math mode unless `greek=FALSE`. `greek=TRUE` is not implemented in S-Plus ver
- vname
- unused argument in `latex.describe.single`

##### Details

If `options(na.detail.response=TRUE)` has been set and `na.action` is `"na.delete"` or `"na.keep"`, summary statistics on the response variable are printed separately for missing and non-missing values of each predictor. The default summary function returns the number of non-missing response values and the mean of the last column of the response values, with a `names` attribute of `c("N","Mean")`. When the response is a `Surv` object and the mean is used, this will result in the crude proportion of events being used to summarize the response. The actual summary function can be designated through `options(na.fun.response = "function name")`.
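
As a rough illustration of what such a summary function looks like, the base R sketch below mimics the described default for a plain numeric response. The function name `summarize_response` and the data are hypothetical, and the sketch ignores matrix-valued responses such as `Surv` objects, for which the text above says the last column is used.

```r
# Hypothetical summary function mirroring the described default:
# number of non-missing responses and their mean, named "N" and "Mean"
summarize_response <- function(y) {
  y <- y[!is.na(y)]
  c(N = length(y), Mean = mean(y))
}

# Made-up data: a 0/1 response and a predictor with some NAs
y   <- c(1, 0, 1, 1, 0, 1)
age <- c(50, NA, 61, NA, 47, 58)

# Summarize y separately for missing and non-missing values of the predictor
grp <- ifelse(is.na(age), "age missing", "age present")
by_missing <- lapply(split(y, grp), summarize_response)
by_missing
```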

##### Value

- a list containing elements `descript`, `counts`, `values`. The list is of class `describe`. If the input object was a matrix or a data frame, the list is a list of lists, one list for each variable analyzed. `latex` returns a standard `latex` object. For numeric variables having at least 20 unique values, an additional component `intervalFreq` is included. This component is a list with two elements, `range` (containing two values) and `count`, a vector of 100 integer frequency counts.
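
The shape of such a component can be sketched in base R. This is an illustration under assumptions, not Hmisc's implementation: the variable `interval_freq` and the exact rule for assigning boundary values to bins are made up here, and Hmisc's binning may differ in detail.

```r
set.seed(1)
x <- runif(500)   # made-up continuous variable

nbins <- 100
rng   <- range(x)   # two values: observed minimum and maximum

# Map each value to one of 100 equal-width bins spanning [min, max];
# pmin() places the maximum itself in the last bin
bin <- pmin(floor((x - rng[1]) / diff(rng) * nbins) + 1, nbins)

interval_freq <- list(range = rng, count = tabulate(bin, nbins = nbins))
str(interval_freq)
```

Every observation falls in exactly one bin, so the counts sum to the number of values.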

##### See Also

`sas.get`, `quantile`, `table`, `summary`, `model.frame.default`, `naprint`, `lapply`, `tapply`, `Surv`, `na.delete`, `na.keep`, `na.detail.response`, `latex`

##### Examples

```
library(Hmisc)
set.seed(1)
describe(runif(200),dig=2) #single variable, continuous
#get quantiles .05,.10,...
dfr <- data.frame(x=rnorm(400),y=sample(c('male','female'),400,TRUE))
describe(dfr)
d <- sas.get(".","mydata",special.miss=TRUE,recode=TRUE)
describe(d) #describe entire data frame
attach(d, 1)
describe(relig) #Has special missing values .D .F .M .R .T
#attr(relig,"label") is "Religious preference"
#relig : Religious preference Format:relig
# n missing D F M R T unique
# 4038 263 45 33 7 2 1 8
#
#0:none (251, 6%), 1:Jewish (372, 9%), 2:Catholic (1230, 30%)
#3:Jehovah's Witnes (25, 1%), 4:Christ Scientist (7, 0%)
#5:Seventh Day Adv (17, 0%), 6:Protestant (2025, 50%), 7:other (111, 3%)
# Method for describing part of a data frame:
describe(death.time ~ age*sex + rcs(blood.pressure))
describe(~ age+sex)
describe(~ age+sex, weights=freqs) # weighted analysis
fit <- lrm(y ~ age*sex + log(height))
describe(formula(fit))
describe(y ~ age*sex, na.action=na.delete)
# report on number deleted for each variable
options(na.detail.response=TRUE)
# keep missings separately for each x, report on dist of y by x=NA
describe(y ~ age*sex)
options(na.fun.response="quantile")
describe(y ~ age*sex) # same but use quantiles of y by x=NA
d <- describe(my.data.frame)
d$age # print description for just age
d[c('age','sex')] # print description for two variables
d[sort(names(d))] # print in alphabetic order by var. names
d2 <- d[20:30] # keep variables 20-30
page(d2) # pop-up window for these variables
# Test date/time formats and suppression of times when they don't vary
library(chron)
d <- data.frame(a=chron((1:20)+.1),
                b=chron((1:20)+(1:20)/100),
                d=ISOdatetime(year=rep(2003,20),month=rep(4,20),day=1:20,
                              hour=rep(11,20),min=rep(17,20),sec=rep(11,20)),
                f=ISOdatetime(year=rep(2003,20),month=rep(4,20),day=1:20,
                              hour=1:20,min=1:20,sec=1:20),
                g=ISOdate(year=2001:2020,month=rep(3,20),day=1:20))
describe(d)
```

*Documentation reproduced from package Hmisc, version 3.0-10, License: GPL version 2 or newer*