table.integer64: Cross Tabulation and Table Creation for integer64

Description

table.integer64 uses the cross-classifying integer64 vectors to build a contingency table of the counts at each combination of vector values.

Usage

table.integer64(...
, return = c("table","data.frame","list")
, order = c("values","counts")
, nunique = NULL
, method = NULL
, dnn = list.names(...), deparse.level = 1
)

Value

By default (with return="table") table.integer64 returns a contingency table, an object of class "table", which is an array of integer values. Note that, unlike S, the result is always an array (a 1D array if one factor is given). Note also that for multidimensional arrays this is a dense return structure, which can dramatically increase RAM requirements (for large arrays with high mutual information, i.e. many possible input combinations of which only a few occur), and that table is limited to 2^31 possible combinations (e.g. two input vectors with only 46340 unique values each). Finally, note that the tabulated values or value combinations are represented as dimnames, and that the implied conversion of values to strings can cause severe performance problems, since each string needs to be integrated into R's global string cache.
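The 46340 figure follows from simple arithmetic: a dense two-way table needs one cell per possible combination, so the number of cells is the product of the unique counts. A small sketch in base R (the helper name cells is only illustrative and not part of the package):

```r
# Cells needed by a dense two-way table when both inputs
# have n unique values (double arithmetic, exact at this size).
cells <- function(n) n^2

limit <- 2^31

cells(46340) < limit   # 46340 unique values per input still fit
cells(46341) < limit   # one more unique value per input does not
```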

You can use the other return= options to cope with these problems: the potential combination limit is increased from 2^31 to 2^63, RAM is only required for observed combinations, and string conversion is avoided.

With return="data.frame" you get a compact representation as a data.frame (like that resulting from as.data.frame(table(...))) where only observed combinations are listed (each as a data.frame row) with the corresponding frequency counts (the latter as component named by responseName). This is the inverse of xtabs.

With return="list" you also get a compact representation, as a simple list with components

values: an integer64 vector of the technically tabulated values; for 1D these are the tabulated values themselves, for kD these are the values representing the potential combinations of input values
counts: the frequency counts
dims: only for kD: a list with the vectors of the unique values of the input dimensions
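
As a rough illustration of the list return format for a single input vector, the following sketch assumes the bit64 package is installed and inspects the components described above. The variable names are illustrative only.

```r
library(bit64)

# One input vector with three distinct values
x <- as.integer64(c(3, 1, 3, 3, 1, 7))

res <- table.integer64(x, return = "list")

names(res)    # component names of the returned list
res$values    # integer64 vector of the tabulated values
res$counts    # frequency count for each entry of res$values
```

Because the values stay integer64 in this format, no conversion to character dimnames takes place.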

Arguments

...: one or more objects which can be interpreted as factors (including character strings), or a list (or data frame) whose components can be so interpreted. (For as.table and as.data.frame, arguments passed to specific methods.)
nunique: NULL or the number of unique values of table (including NA). Providing nunique can speed up matching when table has no cache. Note that a wrong nunique can cause undefined behaviour up to a crash.
order: the sort order of the result: "values" (the default) or "counts"
method: NULL for automatic method selection or a suitable low-level method, see ‘Details’
return: choose the return format, see ‘Value’
dnn: the names to be given to the dimensions in the result (the dimnames names).
deparse.level: controls how the default dnn is constructed. See ‘Details’.

Details

This function automatically chooses from several low-level functions considering the size of x and the availability of a cache. Suitable methods are hashmaptab (simultaneously creating and using a hashmap), hashtab (first creating a hashmap then using it), sortordertab (fast ordering) and ordertab (memory saving ordering).
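
A minimal sketch of overriding the automatic choice, assuming that method accepts the low-level method names listed above as character strings. The result should not depend on the method, only the time and memory used to compute it.

```r
library(bit64)

set.seed(1)
x <- as.integer64(sample(1:9, 100, TRUE))

t_auto <- table.integer64(x)                           # automatic selection
t_hash <- table.integer64(x, method = "hashmaptab")    # hash-based
t_sort <- table.integer64(x, method = "sortordertab")  # sort-based

# Compare the counts produced by the two explicit methods
all(as.vector(t_hash) == as.vector(t_sort))
```
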
If the argument dnn is not supplied, the internal function list.names is called to compute the ‘dimname names’. If the arguments in ... are named, those names are used. For the remaining arguments, deparse.level = 0 gives an empty name, deparse.level = 1 uses the supplied argument if it is a symbol, and deparse.level = 2 will deparse the argument.
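
The effect of dnn can be checked by looking at the names of the dimnames. This is a small sketch, assuming dnn accepts a character vector with one name per input; the names "rows" and "cols" are arbitrary.

```r
library(bit64)

x <- as.integer64(c(1, 1, 2))
y <- as.integer64(c(3, 4, 4))

# Default: dimension names are derived from the arguments (deparse.level = 1)
t_default <- table.integer64(x, y)
names(dimnames(t_default))

# Explicit dimension names
t_named <- table.integer64(x, y, dnn = c("rows", "cols"))
names(dimnames(t_named))
```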

The arguments exclude and useNA are not supported, i.e. NAs are always tabulated and, unlike in table, they are sorted first if order="values".
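
The difference in NA handling can be seen by tabulating the same data both ways. This sketch assumes bit64 is installed; the base table call uses useNA = "ifany" on the values converted to integer so that NA is counted there too.

```r
library(bit64)

x <- as.integer64(c(2, NA, 1, NA, 2))

# table.integer64: NA always gets its own cell
t64 <- table.integer64(x)
t64

# base table on the same values converted to integer, with NA counted explicitly
tb <- table(as.integer(x), useNA = "ifany")
tb
```

Comparing the printed order of the cells shows where each function places the NA entry.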

Examples

library(bit64)

message("pure integer64 examples")
x <- as.integer64(sample(c(rep(NA, 9), 1:9), 32, TRUE))
y <- as.integer64(sample(c(rep(NA, 9), 1:9), 32, TRUE))
z <- sample(c(rep(NA, 9), letters), 32, TRUE)
table.integer64(x)
table.integer64(x, order="counts")
table.integer64(x, y)
table.integer64(x, y, return="data.frame")

message("via as.integer64.factor we can use 'table.integer64' also for factors")
table.integer64(x, as.integer64(as.factor(z)))

message("via as.factor.integer64 we can also use 'table' for integer64")
table(x)
table(x, exclude=NULL)
table(x, z, exclude=NULL)

# \dontshow{
 stopifnot(identical(table.integer64(as.integer64(c(1,1,2))), table(c(1,1,2))))
 stopifnot(identical(table.integer64(as.integer64(c(1,1,2)),as.integer64(c(3,4,4))), table(c(1,1,2),c(3,4,4))))
 message("the following works with three warnings due to coercion")
 stopifnot(identical(table.integer64(c(1,1,2)), table(c(1,1,2))))
 stopifnot(identical(table.integer64(as.integer64(c(1,1,2)),c(3,4,4)), table(c(1,1,2),c(3,4,4))))
 stopifnot(identical(table.integer64(c(1,1,2),as.integer64(c(3,4,4))), table(c(1,1,2),c(3,4,4))))
 message("the following works because of as.factor.integer64")
 stopifnot(identical(table(as.integer64(c(1,1,2))), table(c(1,1,2))))  
 stopifnot(identical(table(as.integer64(c(1,1,2)),as.integer64(c(3,4,4))), table(c(1,1,2),c(3,4,4))))
 stopifnot(identical(table(as.integer64(c(1,1,2)),c(3,4,4)), table(c(1,1,2),c(3,4,4))))
 stopifnot(identical(table(c(1,1,2),as.integer64(c(3,4,4))), table(c(1,1,2),c(3,4,4))))
# }
