freqweights (version 1.0.1)

tablefreq: Create a table of frequencies

Description

Create a table of frequencies

Usage

tablefreq(tbl, vars = NULL, freq = NULL)

## S3 method for class 'tablefreq'
update(object, ...)

Arguments

tbl
an object that can be coerced to a tbl. It must contain all variables in vars and in freq
vars
variables to count unique values of. It may be a character vector
freq
a name of a variable of the tbl object specifying frequency weights. See Details
object
a tablefreq object
...
additional data used to update the frequency table

Value

A tbl object with label and freq columns. When possible, the last column is named freq and holds the frequency counts of the cases. This object, of class tablefreq, has two attributes:
  • freq: the weighting variable used to create the frequency table
  • colweights: name of the column with the weighting counts

Details

Based on the count function, it can also work with matrices or external databases, and the result may be updated.

It creates a frequency table of the data, or just of the columns specified in vars.
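As a rough illustration of what such a table contains, the base R sketch below counts the unique rows of two iris columns. It does not call tablefreq: the helper count_rows is hypothetical, and it uses aggregate rather than the dplyr machinery the package relies on.

```r
# Hypothetical helper (not part of freqweights): count the unique rows of the
# selected columns and store the counts in a final column named 'freq'.
count_rows <- function(data, vars = names(data)) {
  sub <- data[, vars, drop = FALSE]
  sub$freq <- 1L
  # aggregate() sums the 1s within each unique combination of 'vars'
  aggregate(sub["freq"], by = sub[vars], FUN = sum)
}

ft <- count_rows(iris, c("Sepal.Length", "Species"))
head(ft)

# The counts add up to the number of cases in the data
stopifnot(sum(ft$freq) == nrow(iris))
```

Each row of ft is one distinct combination of Sepal.Length and Species, which is the shape tablefreq(iris, c("Sepal.Length","Species")) is described as returning.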

If you provide a freq formula, the cases are weighted by the result of the formula. Any variables in the formula are removed from the data set. If the data set is a matrix, the freq formula is a classic R formula. Otherwise, the expression in freq is treated as a mathematical expression.
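The effect of a weighting variable can be sketched in base R on toy data. This is an illustration of the idea only, not the package's code: the data frame d and its columns g and n are made up for the example.

```r
# Toy data: 'n' is a frequency weight, i.e. each row stands for n cases.
d <- data.frame(g = c("a", "a", "b", "a"),
                n = c(2, 3, 1, 5))

# Sum the weights within each unique value of the remaining variable 'g'.
# The weighting variable 'n' is consumed and no longer appears as a column.
wt <- aggregate(data.frame(freq = d$n), by = list(g = d$g), FUN = sum)
wt
```

Here the three "a" rows collapse into a single row whose freq is 2 + 3 + 5 = 10, while "b" keeps its weight of 1.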

This function relies on dplyr to create frequency tables. Its main advantage is that it works with on-disk data stored in databases, whereas count only works with in-memory data sets.

In general, to use the functions of this package, the frequency table obtained by this function must fit in memory. Otherwise you must use the 'chunk' versions (clarachunk, biglmfreq).
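When the raw data are processed in pieces, per-chunk frequency tables can be combined by summing the counts of matching rows. The base R sketch below shows that idea under simple assumptions (all chunks have the same columns, no missing values); count_rows and merge_counts are hypothetical helpers, not the package's update method or its chunk functions.

```r
# Hypothetical helper: frequency table of the unique rows of a data frame.
count_rows <- function(data) {
  vars <- names(data)
  data$freq <- 1L
  aggregate(data["freq"], by = data[vars], FUN = sum)
}

# Hypothetical helper: merge two frequency tables by stacking them and
# summing 'freq' again over the unique rows.
merge_counts <- function(ft1, ft2) {
  both <- rbind(ft1, ft2)
  vars <- setdiff(names(both), "freq")
  aggregate(both["freq"], by = both[vars], FUN = sum)
}

x <- iris[, 1:2]
chunked <- merge_counts(count_rows(x[1:50, ]), count_rows(x[51:150, ]))
full <- count_rows(x)

# Same total count and same number of distinct rows as a single pass
stopifnot(sum(chunked$freq) == nrow(x), nrow(chunked) == nrow(full))
```

Only the frequency tables need to be held in memory at once, not the raw chunks, which is why the size of the table matters more than the size of the data.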

The code of this function is adapted from a wish-list item on the dplyr development page (see References). Prof. Wickham also provides a nice introduction to using dplyr with databases.

References

Hadley Wickham. Count function. https://github.com/hadley/dplyr/issues/358

Hadley Wickham. Databases. http://cran.rstudio.com/web/packages/dplyr/vignettes/databases.html

See Also

count, tbl

Examples

tablefreq(iris)
tablefreq(iris, c("Sepal.Length","Species"))
a <- tablefreq(iris,freq="Sepal.Length")
tablefreq(a, freq="Sepal.Width")

library(dplyr)
iris %>% tablefreq("Species")

tfq <- tablefreq(iris[,c(1:2)])

chunk1 <- iris[1:10,c(1:2)]
chunk2 <- iris[c(11:20),]
chunk3 <- iris[-c(1:20),]
a <- tablefreq(chunk1)
a <- update(a,chunk2)
a <- update(a,chunk3)
a

## External databases
library(dplyr)
if(require(RSQLite)){
  hflights_sqlite <- tbl(hflights_sqlite(), "hflights")
  hflights_sqlite
  tbl_vars(hflights_sqlite)
  tablefreq(hflights_sqlite,vars=c("Year","Month"),freq="DayofMonth")
}

##
## Graphs
##
if(require(ggplot2) && require(hflights)){
  library(dplyr)

  ## One variable
  ## Bar plot
  tt <- as.data.frame(tablefreq(hflights[,"ArrDelay"]))
  p <- ggplot() + geom_bar(aes(x=x, y=freq), data=tt, stat="identity")
  print(p)

  ## Histogram
  p <- ggplot() + geom_histogram(aes(x=x, weight= freq), data = tt)
  print(p)

  ## Density
  tt <- tt[complete.cases(tt),] ## remove missing values
  tt$w <- tt$freq / sum(tt$freq) ## weights must sum to 1
  p <- ggplot() + geom_density(aes(x=x, weight= w), data = tt)
  print(p)

  ##
  ## Two distributions
  ##
  ## A numeric and a factor variable
  td <- tablefreq(hflights[,c("TaxiIn","Origin")])
  td <- td[complete.cases(td),]

  ## Bar plot
  p <- ggplot() + geom_bar(aes(x=TaxiIn, weight= freq, colour = Origin),
                           data = td, position ="dodge")
  print(p)

  ## Density
  ## compute the relative frequencies for each group
  td <- td %>% group_by(Origin) %>%
               mutate( ngroup= sum(freq), wgroup= freq/ngroup)
  p <- ggplot() + geom_density(aes(x=TaxiIn, weight=wgroup, colour = Origin),
                               data = td)
  print(p)

  ## For each group, plot its values
  p <- ggplot() + geom_point(aes(x=Origin, y=TaxiIn, size=freq),
                             data = td, alpha= 0.6)
  print(p)

  ## Two numeric variables
  tc <- tablefreq(hflights[,c("TaxiIn","TaxiOut")])
  tc <- tc[complete.cases(tc),]
  p <- ggplot() + geom_point(aes(x=TaxiIn, y=TaxiOut, size=freq),
                             data = tc, color = "red", alpha=0.5)
  print(p)

  ## Two factors
  tf <- tablefreq(hflights[,c("UniqueCarrier","Origin")])
  tf <- tf[complete.cases(tf),]

  ## Bar plot
  p <- ggplot() + geom_bar(aes(x=Origin, fill=UniqueCarrier, weight= freq),
                           data = tf)
  print(p)
}
