FSA (version 0.8.6)

Summarize: Summary statistics for a numeric or factor variable.

Description

Summary statistics for a single numeric or factor variable, possibly separated by the levels of a factor variable. This function is very similar to summary for numeric variables and table for factor variables.

Usage

Summarize(object, ...)

## S3 method for class 'default':
Summarize(object, digits = getOption("digits"),
  addtotal = TRUE, percent = c("total", "none"), percdigs = 2,
  na.rm = TRUE, exclude = "", ...)

## S3 method for class 'formula':
Summarize(object, data = NULL,
  digits = getOption("digits"), percent = c("row", "column", "total",
  "none"), percdigs = 2, addtotal = TRUE, na.rm = TRUE, exclude = "",
  ...)

Arguments

object
A vector of numeric or factor data.
digits
A numeric that indicates the number of decimals to which the numeric summaries are rounded.
addtotal
A logical that indicates whether totals should be added to tables (=TRUE, default) or not. See details.
percent
A string that indicates the type of percents to compute for tables from factor variables. See details.
percdigs
A numeric that indicates the number of decimals to which the percentage summaries are rounded.
na.rm
A logical that indicates whether numeric missing values (NA) should be removed (=TRUE, default) or not.
exclude
A string that contains the level that should be excluded from a factor variable.
data
An optional data frame that contains the variables in formula.
...
Not implemented.

Value

  • A named vector or data frame (when a quantitative variable is separated by one or two factor variables) of summary statistics for numeric data, or a matrix of frequencies and, possibly, percentages for factor variables.
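
As a rough illustration of the named vector described above, the following base R sketch computes the same kinds of statistics (sample size, valid sample size, mean, standard deviation, and the five-number summary). The helper num_summary is hypothetical and is not part of FSA; it always drops NA values, mirroring the na.rm=TRUE default, and the ordering and names of the elements are assumptions rather than the package's documented output.

```r
# Hypothetical helper (NOT part of FSA): a base R approximation of the
# numeric summaries described on this page. NAs are always dropped,
# which mirrors the documented default of na.rm = TRUE.
num_summary <- function(x, digits = getOption("digits")) {
  n      <- length(x)          # total sample size, NAs included
  nvalid <- sum(!is.na(x))     # sample size minus the number of NAs
  # min, Q1, median, Q3, max using quantile()'s default method
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1),
                na.rm = TRUE, names = FALSE)
  res <- c(n = n, nvalid = nvalid,
           mean = mean(x, na.rm = TRUE),
           sd = sd(x, na.rm = TRUE),
           min = q[1], Q1 = q[2], median = q[3], Q3 = q[4], max = q[5])
  round(res, digits)           # digits = number of decimals kept
}

# Made-up data with two missing values
x <- c(0, 0, NA, 2, 4, 6, NA, 8)
s <- num_summary(x, digits = 3)
print(s)
```

Here n counts every element while nvalid counts only the non-missing ones, so the two differ by the number of NAs in x.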

Details

This function is primarily used with formulas. Five general types of formulae may be used (where quant and factor generically represent quantitative/numeric and factor variables, respectively):

  Formula                 Description of Summary
  ~quant                  Numerical summaries (see below) of quant.
  ~factor                 One-way frequency or percentage (see below) table of factor.
  quant~factor            Summaries of quant separated by levels in factor.
  quant~factor1*factor2   Summaries of quant separated by the combined levels in factor1 and factor2.
  factor1~factor2         Two-way frequency or percentage table with levels of factor2 as rows and factor1 as columns.

Numerical summaries include all results from summary (min, Q1, mean, median, Q3, and max) and the sample size, valid sample size (sample size minus number of NAs), and standard deviation (i.e., sd). NA values are removed from the calculations with na.rm=TRUE (the DEFAULT). The number of digits in the returned results is controlled with digits=.

Factor variables may be summarized as a frequency table (if percent="none") or a percentages table (the DEFAULT). For a single factor variable, the percentages table is returned if percent="total". For two factor variables, the percentage table may be returned as a row-, column-, or table-percent table with percent="row" (the DEFAULT), percent="column", and percent="total", respectively. The number of digits in the returned table is controlled with percdigs=. A marginal total, either for all margins if percent="none" or the appropriate margin otherwise, is added to the table if addtotal=TRUE.

The results for a factor are NOT meant to replace table or xtabs. This functionality is provided to make this function more complete.
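
The row-, column-, and table-percent tables described above can be approximated in base R with table, prop.table, and addmargins. This is only a sketch of the idea, not FSA's implementation: the data are made up, and the choice of which margin counts as "appropriate" for each percentage type (a Sum column for row percents, a Sum row for column percents) is an assumption.

```r
# Made-up data for illustration only
f1 <- factor(c("A", "A", "B", "B", "B", "C"))
f2 <- factor(c("male", "female", "male", "male", "female", "female"))

# Two-way frequency table with levels of f2 as rows and f1 as columns
freq <- table(f2, f1)

# Row percents: each row sums to 100, so a "Sum" column is added
row_pct <- round(addmargins(100 * prop.table(freq, margin = 1), margin = 2), 2)

# Column percents: each column sums to 100, so a "Sum" row is added
col_pct <- round(addmargins(100 * prop.table(freq, margin = 2), margin = 1), 2)

# Table percents: all cells together sum to 100, both margins added
total_pct <- round(addmargins(100 * prop.table(freq)), 2)

# Plain frequencies with totals on all margins
freq_tot <- addmargins(freq)

print(row_pct)
print(col_pct)
print(total_pct)
print(freq_tot)
```

Margins are added before rounding so that the totals show exactly 100 rather than a sum of rounded cells.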

See Also

See summary, table, and xtabs for related one-dimensional functionality. See tapply, summaryBy in doBy, describe in psych, describe in prettyR, and basicStats in fBasics for similar "by" functionality.
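
For comparison, a minimal base R sketch of the "by" functionality mentioned above, using tapply for a single statistic and split with sapply for several statistics per group. The data and the helper group_stats are made up for illustration and do not come from FSA.

```r
# Made-up data: one numeric vector and one grouping factor
y <- c(2.1, 3.4, NA, 5.0, 4.2, 1.8)
g <- factor(c("A", "A", "B", "B", "C", "C"))

# One statistic per group with tapply()
by_mean <- tapply(y, g, mean, na.rm = TRUE)

# Several statistics per group (hypothetical helper, not from FSA)
group_stats <- function(v) {
  c(n = length(v),
    nvalid = sum(!is.na(v)),
    mean = mean(v, na.rm = TRUE),
    sd = sd(v, na.rm = TRUE))
}
# split() gives one vector per level; t() puts groups in rows
by_group <- t(sapply(split(y, g), group_stats))

print(by_mean)
print(by_group)
```

Group B has only one non-missing value, so its standard deviation is NA.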

Examples

## Create a numeric vector (with missing values)
n <- 102
y <- c(0,0,NA,NA,NA,runif(n-5))
## Create a factor vector (with missing values)
g1 <- factor(sample(c("A","B","C","NA"),n,replace=TRUE))
## Create a factor vector with unknowns
g2 <- factor(sample(c("male","female","UNKNOWN"),n,replace=TRUE))
# Put into a data.frame (with some extra variables)
d <- data.frame(dy=y,dg1=g1,dg2=g2,
                dw=sample(1:3,n,replace=TRUE),
                dv=sample(1:3,n,replace=TRUE))

# typical output of summary() for a numeric variable
summary(y)   

# this function           
Summarize(y,digits=3)
Summarize(~dy,data=d,digits=3)
Summarize(dy~1,data=d,digits=3)

## Factor vector (excluding "NA"s in second call)
Summarize(~dg1,data=d)
Summarize(~dg1,data=d,exclude="NA")

## Factor vector with UNKNOWNs
Summarize(~dg2,data=d)
Summarize(~dg2,data=d,exclude="UNKNOWN")

## Numeric vector by levels of a factor variable
Summarize(dy~dg1,data=d,digits=3)
Summarize(dy~dg1,data=d,digits=3,exclude="NA")
Summarize(dy~dg2,data=d,digits=3)
Summarize(dy~dg2,data=d,digits=3,exclude="UNKNOWN")

## What happens if RHS of formula is not a factor
Summarize(dy~dw,data=d,digits=3)
Summarize(dy~dw*dv,data=d,digits=3)

## Summarize factor variable by a factor variable
Summarize(dg1~dg2,data=d)
Summarize(dg1~dg2,data=d,exclude="NA")
Summarize(dg1~dg2,data=d,exclude=c("NA","UNKNOWN"))
Summarize(dg1~dg2,data=d,percent="none")
Summarize(dg1~dg2,data=d,percent="column")
Summarize(dg1~dg2,data=d,percent="total")

## Summarizing all variables in a data frame
lapply(as.list(d),Summarize,digits=4)
