fast_table_num: fast_table_num

Description

Computes a two-way table of counts for two numerical vectors faster than table() by omitting the as.character and as.factor conversions.

Usage

fast_table_num(x, y, edges_x, edges_y, redefine = TRUE,
  byrow = FALSE, all.inside = FALSE, rightmost.closed = FALSE,
  sort = FALSE, na.rm = FALSE, names = FALSE, extendOutput = FALSE)

Value

If extendOutput==FALSE, a [1:\(k_2\),1:\(k_1\)] numerical matrix of counts.

If extendOutput==TRUE, a list of the following named elements:

count matrix: [1:\(k_2\),1:\(k_1\)] numerical matrix of counts

x_idx: [1:\(k_1\)] numerical vector of counts for x based on edges_x

y_idx: [1:\(k_2\)] numerical vector of counts for y based on edges_y

Arguments

x: [1:n] numerical vector
y: [1:n] numerical vector
edges_x: Optional, [1:(\(k_1\)+1)] numerical vector defining the specific borders in x; default is unique(x) for a categorical scale
edges_y: Optional, [1:(\(k_2\)+1)] numerical vector defining the specific borders in y; default is unique(y) for a categorical scale
redefine: Optional, boolean, if TRUE, the order of the counts in the y direction is reversed from 1:\(k_2\) to \(k_2\):1
byrow: Optional, boolean, if FALSE (the default), the count matrix is filled by columns, otherwise the matrix is filled by rows.
all.inside: Optional, boolean, if TRUE, the returned indices are coerced into 1,...,N-1, i.e., 0 is mapped to 1 and N to N-1
rightmost.closed: Optional, boolean, if TRUE, the rightmost interval, vec[N-1] .. vec[N] is treated as closed
sort: Optional, boolean, if TRUE, edges_x and edges_y are sorted non-decreasingly; NA/NaN are then ignored
na.rm: Optional, boolean, if TRUE, only complete observations are taken into account
names: Optional, boolean, if TRUE, output matrix is named by edges_x[1:\(k_1\)] and edges_y[1:\(k_2\)] (left-sided)
extendOutput: Optional, boolean, default FALSE, if TRUE, list is the output, otherwise numerical matrix, see below.
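
The all.inside and rightmost.closed arguments carry the same names and descriptions as the arguments of base R's findInterval. The sketch below uses only findInterval, not fast_table_num, to illustrate how those two switches change the bin index of values at or beyond the outer edges. That fast_table_num follows exactly these semantics is an assumption based on the matching argument names.

```r
# Base R only; fast_table_num itself is not called here.
edges <- c(0, 1, 2, 3)            # N = 4 edges delimit N - 1 = 3 bins
v     <- c(-0.5, 0, 1.5, 3, 3.5)  # values below, inside, on, and above the edges

# Default: index 0 means "left of the first edge", index N means
# "at or right of the last edge".
idx_default <- findInterval(v, edges)

# rightmost.closed = TRUE: a value equal to the last edge falls into
# the last bin (index N - 1) instead of index N.
idx_closed <- findInterval(v, edges, rightmost.closed = TRUE)

# all.inside = TRUE: index 0 is mapped to 1 and index N to N - 1,
# so every value is forced into one of the N - 1 bins.
idx_inside <- findInterval(v, edges, all.inside = TRUE)

print(rbind(idx_default, idx_closed, idx_inside))
```

With the defaults, the values -0.5, 3 and 3.5 receive indices outside 1..3, which corresponds to data outside the edges being ignored.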

Author

Michael Thrun

Details

edges_x and edges_y must be sorted non-decreasingly; if they are not, set sort=TRUE. Beware that kernels are centers of bins, whereas edges_x and edges_y are borders of bins. If edges are given, edges_x and edges_y can contain Inf and -Inf as borders. A vector of n edges always defines n-1 bins lying within the edges. Data outside the first or last edge are ignored.
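
The relation between edges, bins and ignored data can be sketched in base R. The helper count_2d below is hypothetical and is not the implementation of fast_table_num. It assumes findInterval-style binning with rightmost.closed = TRUE and returns the counts with y in rows and x in columns.

```r
# Hypothetical helper for illustration; not the code of fast_table_num.
count_2d <- function(x, y, edges_x, edges_y) {
  k1 <- length(edges_x) - 1  # number of bins in x
  k2 <- length(edges_y) - 1  # number of bins in y
  ix <- findInterval(x, edges_x, rightmost.closed = TRUE)
  iy <- findInterval(y, edges_y, rightmost.closed = TRUE)
  # Keep only points that fall inside the edges in both directions.
  keep <- ix >= 1 & ix <= k1 & iy >= 1 & iy <= k2
  # Column-major linear index into a k2 x k1 matrix (rows = y, columns = x).
  lin <- (ix[keep] - 1) * k2 + iy[keep]
  matrix(tabulate(lin, nbins = k1 * k2), nrow = k2, ncol = k1)
}

x <- c(0.1, 0.9, 1.5, 5)
y <- c(0.2, 0.2, 1.8, 0.5)

# Three finite edges per direction define two bins; the point with x = 5
# lies outside the last edge and is not counted.
m_finite <- count_2d(x, y, edges_x = c(0, 1, 2), edges_y = c(0, 1, 2))

# With -Inf and Inf as outer borders, every point falls into some bin.
m_open <- count_2d(x, y, edges_x = c(-Inf, 1, Inf), edges_y = c(-Inf, 1, Inf))

print(m_finite)
print(m_open)
```

In this sketch the finite edges count three of the four points, while the infinite borders count all four.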

If edges are not given, set sort=TRUE. In this case, the edges default to the unique values of x and y, and the number of unique values sets the number of bins internally.

Beware that in matrix notation, the count matrix would be expected to be ordered [1:\(k_1\),1:\(k_2\)] instead of [1:\(k_2\),1:\(k_1\)]. Here we use the ordering that is intuitively given in plot(x,y), i.e., x values are columns and y values are rows.
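
The orientation can be illustrated on a small hand-made count matrix in base R. The matrix below is invented for illustration, and the row reversal is only an assumption about what redefine = TRUE amounts to, based on its description (order 1:\(k_2\) becomes \(k_2\):1).

```r
# Invented counts with k2 = 2 bins in y (rows) and k1 = 3 bins in x (columns).
counts <- matrix(c(1, 0,
                   2, 3,
                   0, 4), nrow = 2)

x_marginal <- colSums(counts)  # one total per x bin
y_marginal <- rowSums(counts)  # one total per y bin

# Reversing the row order puts the largest y bin in the first row,
# which matches the top of a plot(x, y) display.
counts_flipped <- counts[nrow(counts):1, , drop = FALSE]

print(counts)
print(counts_flipped)
```

Reversing the rows leaves the x marginal unchanged and only reverses the order of the y marginal.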

Examples


if (requireNamespace("FCPS")) {
  data(Hepta, package = "FCPS")
  Cls = Hepta$Cls
  Cls1 = Cls + 1
  # k unique points define k bins; compare the result with table()
  fast_table_num(Cls, Cls1,
    redefine = FALSE, names = TRUE) == as.matrix(table(Cls, Cls1))
}
# k unique points define k bins
tab = fast_table_num(rnorm(100), rnorm(100), redefine = FALSE, sort = TRUE)

# set k+1 edges to get k bins
x = rnorm(100)
y = rnorm(100)
binsxy = 5
edgesx = seq(from = min(x), to = max(x), length.out = binsxy + 1)
edgesy = seq(from = min(y), to = max(y), length.out = binsxy + 1)
fast_table_num(x, y, edgesx, edgesy,
  redefine = FALSE, names = TRUE, rightmost.closed = TRUE)

# definition of counts analogous to plotting
x = c(rnorm(1000, mean=-5), rnorm(1000, mean=5))
y = rnorm(2000)
edgesx = seq(min(x), max(x), length.out=512+1)
edgesy = seq(min(y), max(y), length.out=256+1)
joint_table = fast_table_num(x, y, edgesx, edgesy)
# \donttest{
plot(x,y)
plot(colSums(joint_table), xlab = "x marginal",
  ylab = "sum of counts", main = "x-values are stored in columns")

plot(rowSums(joint_table), xlab = "y marginal",
  ylab = "sum of counts", main = "y-values are stored in rows")
# }
