zonohedra (version 0.3-0)

grpDuplicated: Grouping by duplicated elements

Description

grpDuplicated() is a generic function that takes an indexed set of "elements" and returns an integer vector of the same length. The "elements" can be the components of a vector, or the row vectors or column vectors of a matrix. In the output vector, a component is 0 if and only if the corresponding element is unique; such an element forms a singleton group. Output components have equal positive integer values if and only if the corresponding elements are identical to each other. These elements form a non-singleton group, and the positive integer is called the group number.

The number of singleton groups is equal to #(zeros), which equals #(elements) - #(duplicated elements), where a duplicated element is one that belongs to a non-singleton group.
The number of non-singleton groups is equal to max(output vector).
The number of all groups is equal to #(zeros) + max(output vector).
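
The counting rules above can be checked without the package. The following is a minimal base-R sketch of the documented semantics for a plain vector; grp_dup_sketch() is a hypothetical helper, not part of zonohedra, and it numbers groups in order of first appearance, which may differ from the numbering grpDuplicated() itself produces.

```r
# Hypothetical helper (not part of zonohedra): imitates the documented
# semantics for a plain vector using base R only.
grp_dup_sketch <- function(x) {
  first  <- match(x, x)                         # position of each element's first occurrence
  counts <- tabulate(first, nbins = length(x))  # group size, stored at the first occurrence
  heads  <- which(counts > 1L)                  # first occurrences of non-singleton groups
  match(first, heads, nomatch = 0L)             # group number, or 0 for a unique element
}

x <- c("a", "b", "a", "c", "b", "d")
g <- grp_dup_sketch(x)
g
##  [1] 1 2 1 0 2 0

n_single <- sum(g == 0L)         # singleton groups     = #(zeros)
n_multi  <- max(0L, g)           # non-singleton groups = max(output vector)
n_total  <- n_single + n_multi   # all groups; here equal to length(unique(x))
```

The max(0L, g) form is used so that the count is still 0 when every element is unique or the input is empty.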

Usage

## Default S3 method:
grpDuplicated( x, ... )

## S3 method for class 'matrix'
grpDuplicated( x, MARGIN=1, ... )

Value

The return value is an integer vector whose components range from 0 to K, where K is the number of non-singleton groups.

For vector x the elements are the vector components, and the output is the same length as the input.

For a matrix x with MARGIN=1, the elements are the rows of the matrix and the output has length nrow(x).

For a matrix x with MARGIN=2, the elements are the columns of the matrix and the output has length ncol(x).

The 'ngroups' attribute of the returned vector is set to an integer 3-vector. The 1st component is the total number of groups, the 2nd component is the number of singleton groups, and the 3rd component is the number of non-singleton groups K.
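
As an illustration of how the attribute relates to the vector, the sketch below rebuilds by hand the first result shown in the Examples section (values copied from there, not recomputed) and checks the three counts against it. The variable names are illustrative only.

```r
# Output vector and 'ngroups' attribute copied from the Examples section
g <- structure(
  c(7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L),
  ngroups = c(12L, 5L, 7L)
)

ng       <- attr(g, "ngroups")
n_total  <- ng[1]   # all groups
n_single <- ng[2]   # singleton groups
n_multi  <- ng[3]   # non-singleton groups, K

# The attribute should agree with what can be read off the vector itself
consistent <- n_single == sum(g == 0L) &&
              n_multi  == max(g) &&
              n_total  == n_single + n_multi

# Positions of the members of each non-singleton group, keyed by group number
members <- split(which(g > 0L), g[g > 0L])
members[["1"]]
##  [1]  7 13
```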

Arguments

x

a vector or matrix of atomic mode "numeric", "integer", "logical", "complex", "character" or "raw".

MARGIN

an integer scalar, the matrix margin to be held fixed, as in apply. MARGIN=1 means that it looks for duplicated rows, and MARGIN=2 means that it looks for duplicated columns. Other values are invalid.

...

arguments for particular methods.
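
To make the meaning of MARGIN concrete, here is a small base-R cross-check that does not call the package. It relies on base duplicated(), which accepts the same MARGIN convention for matrices; by the documented semantics, the nonzero positions of grpDuplicated( M, MARGIN=2 ) would be expected to coincide with in_group_cols below, though that correspondence is an assumption of this sketch and is only illustrated on an integer matrix.

```r
# Columns 1 and 3 are identical; the two rows differ
M <- cbind( c(1L, 2L), c(3L, 4L), c(1L, 2L) )

# A column (or row) belongs to a non-singleton group if it repeats
# an earlier one or a later one.
in_group_cols <- duplicated( M, MARGIN=2 ) | duplicated( M, MARGIN=2, fromLast=TRUE )
in_group_rows <- duplicated( M, MARGIN=1 ) | duplicated( M, MARGIN=1, fromLast=TRUE )

in_group_cols   # one flag per column, length ncol(M)
in_group_rows   # one flag per row,    length nrow(M)
```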

Author

Long Qu and Glenn Davis

Details

The implementation is based on std::unordered_map from C++11, which uses a hash table.
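
The general idea of hash-based grouping can be sketched in R, using an environment as a stand-in for the hash table. This is not the package's C++ code: grp_dup_hash_sketch() is a hypothetical name, the two-pass structure is an assumption made for illustration, and elements are keyed by as.character(), so the sketch only suits inputs whose string forms are distinct, non-empty and non-NA.

```r
# Hypothetical illustration (not the package's implementation):
# two passes over the input with an environment used as a hash table.
grp_dup_hash_sketch <- function(x) {
  keys   <- as.character(x)   # assumption: the string form identifies an element
  counts <- new.env(hash = TRUE, parent = emptyenv())

  # Pass 1: count the occurrences of each key
  for (k in keys) {
    n <- counts[[k]]
    counts[[k]] <- if (is.null(n)) 1L else n + 1L
  }

  # Pass 2: give each repeated key a group number on its first appearance
  ids     <- new.env(hash = TRUE, parent = emptyenv())
  out     <- integer(length(keys))   # zeros by default, i.e. unique
  next_id <- 0L
  for (i in seq_along(keys)) {
    k <- keys[i]
    if (counts[[k]] > 1L) {
      if (is.null(ids[[k]])) {
        next_id  <- next_id + 1L
        ids[[k]] <- next_id
      }
      out[i] <- ids[[k]]
    }
  }
  out
}

h <- grp_dup_hash_sketch( c(3, 5, 3, 8, 5) )
h
##  [1] 1 2 1 0 2
```

Each lookup and update in the table takes constant time on average, so the whole pass is roughly linear in the number of elements.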

Examples

set.seed(0)

#   test a numeric vector
x = rnorm(7)
y = rnorm(5)
grpDuplicated( c(x,y,rev(x)) )
##  [1] 7 6 5 4 3 2 1 0 0 0 0 0 1 2 3 4 5 6 7
##  attr(,"ngroups")
##  [1] 12  5  7

# test a numeric matrix, both rows and columns
A = matrix( rnorm(3*7), 3, 7 )
B = matrix( rnorm(3*5), 3, 5 )

#   the columns of cbind(A,B,A) have the duplicates one would expect
grpDuplicated( cbind(A,B,A), MARGIN=2 )
##  [1] 1 2 3 4 5 6 7 0 0 0 0 0 1 2 3 4 5 6 7
##  attr(,"ngroups")
##  [1] 12  5  7

# but the rows of cbind(A,B,A) are unique
grpDuplicated( cbind(A,B,A), MARGIN=1 )
##  [1] 0 0 0
##  attr(,"ngroups")
##  [1] 3 3 0
