
tablefreq(tbl, vars = NULL, freq = NULL)

# S3 method for tablefreq
update(object, ...)
tbl: the input data. It must contain all variables in vars and in freq.
object: a tablefreq object.
tablefreq returns a tbl object with label and freq columns. When possible, the last column is named freq and represents the frequency counts of the cases. This object, of class tablefreq, has two attributes.
Based on the count function, it can also work with matrices or external databases, and the result may be updated.
It creates a frequency table of the data, or just of the columns specified in vars.
If you provide a freq formula, the cases are weighted by the result of the formula. Any variables in the formula are removed from the data set. If the data set is a matrix, the freq formula is a classic R formula. Otherwise, the expression of freq is treated as a mathematical expression.
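To make the weighting rule concrete, here is a minimal base-R sketch of what an unweighted and a weighted frequency table look like. It is not the package's implementation: the toy data frame `df`, its columns `g` and `w`, and the helper column `one` are assumptions made up for illustration, and `aggregate` stands in for the dplyr machinery.

```r
# Toy data (hypothetical): g is the variable to tabulate, w is a weight
df <- data.frame(
  g = c("a", "a", "b", "b", "b"),
  w = c(2, 3, 1, 1, 4),
  stringsAsFactors = FALSE
)

# Unweighted table: every case counts once
df$one <- 1
unweighted <- aggregate(one ~ g, data = df, FUN = sum)
names(unweighted)[2] <- "freq"

# Weighted table: every case counts w times, and w itself
# no longer appears as a variable in the result
weighted <- aggregate(w ~ g, data = df, FUN = sum)
names(weighted)[2] <- "freq"

print(unweighted)
print(weighted)
```

In both results the last column is named freq, mirroring the convention described above.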
This function relies on dplyr to create frequency tables. Its main advantage is that it works with on-disk data stored in databases, whereas count only works with in-memory data sets.
In general, to use the functions of this package, the frequency table obtained by this function should fit in memory. Otherwise, you must use the 'chunk' versions (clarachunk, biglmfreq).
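The idea of updating a frequency table chunk by chunk, so that only the table (not the full data) has to stay in memory, can be sketched in base R. The helpers `freq_table` and `update_freq` below are hypothetical names introduced for illustration; they are not part of the package and only approximate what the update method might do.

```r
# Hypothetical helper: frequency table of all columns of a data frame
freq_table <- function(df) {
  df$freq <- 1
  aggregate(freq ~ ., data = df, FUN = sum)
}

# Hypothetical helper: merge a new chunk into an existing table by
# stacking both tables and summing the counts of identical cases
update_freq <- function(tab, chunk) {
  both <- rbind(tab, freq_table(chunk))
  aggregate(freq ~ ., data = both, FUN = sum)
}

# Process the data in three chunks
chunk1 <- iris[1:10, 1:2]
chunk2 <- iris[11:20, 1:2]
chunk3 <- iris[-(1:20), 1:2]

tab <- freq_table(chunk1)
tab <- update_freq(tab, chunk2)
tab <- update_freq(tab, chunk3)

# Same table computed in one pass, for comparison
full <- freq_table(iris[, 1:2])
print(isTRUE(all.equal(tab, full)))
```

Because counts are additive, the chunked result should match the one-pass table regardless of how the rows are split.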
The code of this function is adapted from a wish list on the development page of dplyr (see References). Prof. Wickham also provides a nice introduction on how to use it with databases.
See also: count, tbl
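# Frequency table of all columns
tablefreq(iris)
# Frequency table of two columns only
tablefreq(iris, c("Sepal.Length", "Species"))
# Weight the cases by Sepal.Length, then reweight the result by Sepal.Width
a <- tablefreq(iris, freq = "Sepal.Length")
tablefreq(a, freq = "Sepal.Width")
# Usage in a dplyr pipeline
library(dplyr)
iris %>% tablefreq("Species")
# Table of the first two columns, computed in one pass
tfq <- tablefreq(iris[, c(1:2)])
# The same table built incrementally from three chunks
chunk1 <- iris[1:10, c(1:2)]
chunk2 <- iris[c(11:20), ]
chunk3 <- iris[-c(1:20), ]
a <- tablefreq(chunk1)
a <- update(a, chunk2)
a <- update(a, chunk3)
a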
## Not run: ------------------------------------
#
# ## External databases
# library(dplyr)
# if(require(RSQLite)){
# hflights_sqlite <- tbl(hflights_sqlite(), "hflights")
# hflights_sqlite
# tbl_vars(hflights_sqlite)
# tablefreq(hflights_sqlite,vars=c("Year","Month"),freq="DayofMonth")
# }
#
# ##
# ## Graphs
# ##
# if(require(ggplot2) && require(hflights)){
# library(dplyr)
#
# ## One variable
# ## Bar plot
# tt <- as.data.frame(tablefreq(hflights[,"ArrDelay"]))
# p <- ggplot() + geom_bar(aes(x=x, y=freq), data=tt, stat="identity")
# print(p)
#
# ## Histogram
# p <- ggplot() + geom_histogram(aes(x=x, weight= freq), data = tt)
# print(p)
#
# ## Density
# tt <- tt[complete.cases(tt),] ## remove missing values
# tt$w <- tt$freq / sum(tt$freq) ## weights must sum 1
# p <- ggplot() + geom_density(aes(x=x, weight= w), data = tt)
# print(p)
#
# ##
# ## Two distributions
# ##
# ## A numeric and a factor variable
# td <- tablefreq(hflights[,c("TaxiIn","Origin")])
# td <- td[complete.cases(td),]
#
# ## Bar plot
# p <- ggplot() + geom_bar(aes(x=TaxiIn, weight= freq, colour = Origin),
# data = td, position ="dodge")
# print(p)
#
# ## Density
# ## compute the relative frequencies for each group
# td <- td %>% group_by(Origin) %>%
# mutate( ngroup= sum(freq), wgroup= freq/ngroup)
# p <- ggplot() + geom_density(aes(x=TaxiIn, weight=wgroup, colour = Origin),
# data = td)
# print(p)
#
# ## For each group, plot its values
# p <- ggplot() + geom_point(aes(x=Origin, y=TaxiIn, size=freq),
# data = td, alpha= 0.6)
# print(p)
#
# ## Two numeric variables
# tc <- tablefreq(hflights[,c("TaxiIn","TaxiOut")])
# tc <- tc[complete.cases(tc),]
# p <- ggplot() + geom_point(aes(x=TaxiIn, y=TaxiOut, size=freq),
# data = tc, color = "red", alpha=0.5)
# print(p)
#
# ## Two factors
# tf <- tablefreq(hflights[,c("UniqueCarrier","Origin")])
# tf <- tf[complete.cases(tf),]
#
# ## Bar plot
# p <- ggplot() + geom_bar(aes(x=Origin, fill=UniqueCarrier, weight= freq),
# data = tf)
# print(p)
# }
## ---------------------------------------------