Learn R Programming

RcppAlgos (version 0.2.5)

comboGeneral: Generate all Combinations/Permutations of a Vector with/without Constraints

Description

Quickly generate all combinations or permutations of a vector, chosen \(m\) at a time, with or without constraints using Rcpp.

Usage

comboGeneral(v, m = NULL, repetition = FALSE, constraintFun = NULL,
                comparisonFun = NULL, limitConstraints = NULL,
                rowCap = NULL, keepResults = FALSE, freqs = NULL)

permuteGeneral(v, m = NULL, repetition = FALSE, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, rowCap = NULL, keepResults = FALSE, freqs = NULL)

Arguments

v

Source vector. If v is an integer, it will be converted to the sequence 1:v.

m

Number of elements to choose. If repetition = TRUE, m can exceed the length of v. The default is NULL. This is to allow for combinations/permutations of multisets utilizing the freqs argument (See below).

repetition

Logical value indicating whether combinations should be with or without repetition. The default is FALSE.

constraintFun

Function to be applied to the elements of v that should be passed as a string (E.g. constraintFun = "sum"). The possible constraint functions are: "sum", "prod", "mean", "max", & "min". The default is NULL, meaning no function is applied.

comparisonFun

Comparison operator that will be used to compare limitConstraints with the result of constraintFun applied to v. It should be passed as a string (E.g. comparisonFun = "<="). The possible comparison operators are: "<", ">", "<=", ">=", "==". The default is NULL.

limitConstraints

This is the value that will be used for comparison. The default is NULL.

rowCap

The maximal number of expected results when a constraint is applied. Can also be used if you only need a specific number of combinations/permutations. This is useful when the total number of combinations/permutations without constraint is large and you expect/need a small number of combinations/permutations. Using rowCap can drastically improve run time and avoid unnecessary crashes due to lack of memory. See examples below.

keepResults

A logical flag indicating if the result of constraintFun applied to v should be displayed; if TRUE, an additional column of results will be added to the resulting matrix. The default is FALSE.

freqs

A vector of frequencies used for producing all permutations/combinations of a multiset of v. Each element of freqs represents how many times each element of the source vector, v, is repeated. It is analogous to the times argument in rep. The default value is NULL.

Value

In general, a matrix is returned with each row containing a vector of length \(m\) or \(m + 1\) depending on the value of keepResults. If m isn't supplied and freqs is given, a matrix is returned with each row containing a vector of length sum(freqs).

Details

Finding all combinations/permutations with constraints is optimized by organizing them in such a way that when constraintFun is applied, a monotonic sequence is produced. Combinations/permutations are added successively, until a particular combination exceeds the given constraint value for a given constraint/comparison function combo. After this point, we can safely skip several combinations knowing that they will exceed the given constraint value.

When there are any negative values in v and constraintFun = "prod", producing a monotonic set is non-trivial for the general case. As a result, performance will suffer as all combinations/permutations must be tested against the constraint criteria. Additionally, rowCap will not have its normal effectiveness (i.e. it will only limit the number of rows after producing all combinations/permutations).

References

Examples

Run this code
# NOT RUN {
system.time(comboGeneral(17,8))
system.time(permuteGeneral(13,6))

system.time(comboGeneral(13,10,repetition = TRUE))
system.time(permuteGeneral(factor(letters[1:9]),6,TRUE))

## permutations of the multiset (with or w/o setting m) :
## c(1,1,1,1,2,2,2,3,3,4,4,4,4,4)
system.time(permuteGeneral(4, freqs = c(4,3,2,5)))

permuteGeneral(4, m = 2, freqs = c(4,3,2,5))

## or combinations
comboGeneral(4, m = 2, freqs = c(4,3,2,5))

## Generate some random data
set.seed(1009)
mySamp <- rnorm(75, 997, 23)

## How to use rowCap example:
## Researcher only needs 1000 7-tuples of mySamp
## such that the sum is greater than 7200.
system.time(comboGeneral(mySamp,7,FALSE,"sum",">",7200,1000))

## If you leave rowCap NULL in examples like the above,
## it can take much longer when the total number of
## possible combinations is large (still fast enough
## most of the time). In some cases, it can crash your
## computer as the underlying code allocates enough
## space to account for every combination (e.g. In our
## example, there are choose(75, 7) = 1984829850 rows,
## 7 columns, with each cell occupying 8 bytes. This
## gives a total over 100 GB (i.e. choose(75,7)*7*8/(2^30))

## You can use rowCap with any combinatorial function
## regardless of the existence of constraints. E.g.
comboGeneral(1000, 20, rowCap = 10)

## class of the source vector is preserved
class(comboGeneral(5,3)[1,]) == class(1:5)
class(comboGeneral(c(1,2:5),3)[1,]) == class(c(1,2:5))

# }
# NOT RUN {
## Using keepResults will add a column of results
permuteGeneral(-3,6,TRUE,"prod",keepResults = TRUE)
comboGeneral(-3,6,TRUE,"sum","==",-8,keepResults = TRUE)

## Find 13-tuple combinations of 1:25 such
## that the mean is less than 10
system.time(myComb <- comboGeneral(25,13,FALSE,"mean","<",10))

## Alternatively, you must generate all combinations and subsequently
## subset to obtain the combinations that meet the criteria
system.time(myComb2 <- combn(25,13))
system.time(myCols <- which(colMeans(myComb2) < 10))
system.time(myComb2 <- myComb2[,myCols])

## Any variation is much slower
system.time(myComb2 <- combn(25,13)[,combn(25,13,mean) < 10])

## Test equality with myComb above
all.equal(myComb,t(myComb2))

## Fun example... see stackoverflow:
## https://stackoverflow.com/q/22218640/4408538
system.time(permuteGeneral(seq(0L,100L,10L),8,TRUE,"sum","==",100,10^5))
# }

Run the code above in your browser using DataLab