count: Count the number of occurrences of every unique element in a data object

Description

The function count() does roughly the same as the function table(). The differences are that (a) count() returns its result as a data.frame, which may make the result more directly accessible; and (b) it is polymorphic, that is, three levels of complexity can be specified (see Arguments). count() may also match the expectation that, since base R provides functions like mean(some_numerical_vector) and sum(some_numerical_vector), a count(some_numerical_vector) would complement them. However, functions like mean() and sum() work on numerical vectors only and return a single numerical value (and hence are easier to use as sub-functions in e.g. apply()), whereas count() can deal with various data types and returns a data.frame.
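To illustrate the point about accessibility, the following sketch uses base R only and does not call count(). It shows that table() returns a 'table' object, and that converting it by hand gives a data.frame with one row per unique element. The column names 'element' and 'count' are set manually here to mirror the description of count(); the exact column types that count() itself returns may differ.

```r
# Base R only; count() is not called in this sketch.
x <- c(1, 1, 1, 4, 5, 5)

tab <- table(x)   # a 'table' object with the unique values as names
class(tab)

# Convert to a data.frame: one row per unique element.
# as.data.frame() on a table names the columns 'x' and 'Freq' by default;
# they are renamed by hand to mirror the columns described for count().
df <- as.data.frame(tab, stringsAsFactors = FALSE)
names(df) <- c("element", "count")
df
```

Note that in this sketch the 'element' column holds character strings, because table() stores the unique values as names.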

Usage

count(x, group.by = NULL, split.by = NULL, row.total = FALSE)

Arguments

x

The object to count elements from. Can be a vector, a matrix or a data.frame.

group.by

An optional vector of column names in x. It denotes groups, sub-groups, sub-sub-groups (etc., depending on the number of columns specified) by which counts need to be grouped. See examples.

split.by

An optional column name in x, by which to split the counts 'horizontally'. That is, whereas 'group.by' is returned as rows, 'split.by' is returned as columns, whereby every value in 'split.by' will become a column. In that sense, it acts as a pivot specifier.

row.total

Logical, specifying whether row totals are included as the right-most column. This is only relevant if split.by is provided. The column is called 'row.total'; if that column name already exists in x, underscores are appended to 'row.total' until the column name is unique.

Value

A data.frame. If 'x' is a vector, the first column, 'element', contains the elements of x that have been counted, and the second column, 'count', contains their counts. If 'x' is a data.frame, the first column(s) is/are the 'group.by' column(s), and a column 'count' contains the number of rows for each unique combination of values in the 'group.by' column(s). If 'split.by' is also specified, the returned data.frame consists of the columns specified by 'group.by' plus one column for each unique value in 'split.by'. See examples.
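The shapes described above can be approximated in base R, which may help when reading the output of count(). The sketch below does not call count(); the data.frame 'df' and its columns 'g' and 's' are hypothetical, and the layout produced by count() itself may differ in detail (for example, this sketch keeps the group keys as row names in the wide result rather than as a column).

```r
# Hypothetical example data: 'g' plays the role of a 'group.by' column,
# 's' the role of a 'split.by' column.
df <- data.frame(
  g = c("a", "a", "b", "b", "b"),
  s = c("u", "v", "u", "u", "v"),
  stringsAsFactors = FALSE
)

# Number of rows per unique value of 'g' (the 'group.by' idea).
grouped <- as.data.frame(table(g = df$g), responseName = "count")

# Number of rows per 'g', with one column per unique value of 's'
# (the 'split.by' idea). The values of 'g' end up as row names.
wide <- as.data.frame.matrix(table(df$g, df$s))

# Row totals as the right-most column (the 'row.total' idea).
wide$row.total <- rowSums(wide)

grouped
wide
```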

Examples


# NOT RUN {
my_num_vector = c(1,1,1,4,5,5)
mean(my_num_vector)
sum(my_num_vector)
table(my_num_vector)
count(my_num_vector)

my_str_vector = c('R', 'R', 'R', 'S', 'T', 'T')
table(my_str_vector)
count(my_str_vector)

my_DF <- data.frame(var1=rep(c('A','B','C'), 2), var2=c(1,1, 2,2, 3,3),
                    var3=rep(c('bbb','aaa','bbb'), 2))
count(my_DF, c('var1', 'var2'))
count(my_DF, c('var1', 'var3'))
count(my_DF, c('var2', 'var3'))
count(my_DF, 'var3')
# and compare with:
count(my_DF$var3)

my_DF = data.frame(var1=factor(c(rep('low', 4),rep('medium', 4),rep('high', 4)),
                               levels=c('low', 'medium', 'high')),
                   var2=c(1, 2,2, 3,3,3, 4,4,4,4, 3, 2),
                   var3=rep(c('bbb','aaa','bbb'), 4), stringsAsFactors=FALSE)
count(my_DF, c('var3', 'var2'), 'var1')
# The counts are grouped by unique combinations of 'var3' and 'var2', ...
# ...and split out by the unique content of 'var1'.
# Note that if levels are given (as in this case), then the columns for 'split.by'...
# ...are ordered according to the sequence of the levels; otherwise in alphanumerical order.

# Also non-factors can be used for 'split.by':
count(my_DF, c('var1', 'var3'), 'var2')

# For the 'group.by' variables, NAs are treated as a factor level of their own.
# When there are NAs in the 'split.by' column, then an extra NA column is returned, ...
# ...specifying the counts of the NAs:
my_DF_w_NA = my_DF # same as above, but now...
my_DF_w_NA$var1[1] <- NA
my_DF_w_NA$var2[c(6,10)] <- NA
my_DF_w_NA$var3[10] <- NA
count(my_DF_w_NA, c('var1', 'var3'), 'var2')

# To show the idea of row totals:
count(my_DF_w_NA, c('var2', 'var3'), 'var1', row.total=TRUE)
# }
