Introduction to listcompr

library(listcompr)

The listcompr package is a light-weight collection of functions for list comprehension. It is intended as "syntactic sugar" for R and is inspired by the list comprehension capabilities of Python. Besides lists, similar structures like vectors (of numeric or character type), data frames, or named lists can be easily composed. The package may be used for the simple generation of small data sets for “textbook examples”, for unit tests of your R code, or for tiny mathematical tasks.

The package is not intended for creating large data sets. The evaluation is done row-wise (and not vector-wise), which makes it relatively slow. On the other hand, it offers more flexibility to formulate "mathematical" statements. It should be easy to use, especially for users who are not accustomed to vector-wise evaluation.

Introductory examples

In this section we present the basic features of listcompr in some simple examples.

A simple example generating a vector

Assume we want to create a vector of all numbers in 1:10 which can be divided by 3 or by 4. We use the function gen.vector, which takes the expression i as its first argument. The range for i and the given condition are passed to the function in the (arbitrarily many) following arguments:

gen.vector(i, i = 1:10, i %% 3 == 0 || i %% 4 == 0)

Note that it doesn't matter whether we use || or | as operator. All conditions are evaluated row by row (for one value of i at a time) and not vector-wise.

Additional conditions are implicitly and-connected, i.e., if we want to explicitly exclude the number 8 we can state:

gen.vector(i, i = 1:10, i %% 3 == 0 || i %% 4 == 0, i != 8)

Of course, we could also combine these into one condition via (...) && i != 8.
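For comparison, the same selection can be written in vectorized base R. This is only a sketch for cross-checking the result and is not part of listcompr; the names x and selected are illustrative. Because the whole vector is processed at once here, the element-wise operators | and & are required instead of || and &&.

```r
# Base R cross-check (not listcompr): vectorized filtering of 1:10
x <- 1:10
# element-wise | and & are needed because x is a whole vector
selected <- x[(x %% 3 == 0 | x %% 4 == 0) & x != 8]
print(selected)
```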

Simple examples of lists and data frames with two variables

Assume we want to create a list of all tuples (i, j) where i and j range over 1:3 and i <= j must hold. To this end we use the gen.list function, taking the base expression c(i, j) as first argument and putting the ranges and conditions for i and j in the following arguments. This list of vectors can be expressed as:

gen.list(c(i, j), i = 1:3, j = 1:3, i <= j)

It is also allowed to use the current value of i within the range of j. We can omit the condition i <= j by changing the range for j to j = i:3. Moreover, we now use the similar function gen.data.frame, which puts each generated vector in a row of a data frame:

df <- gen.data.frame(c(i, j), i = 1:3, j = i:3)
knitr::kable(df)

The column names of the data frame are by default 'V1' and 'V2'. We can specify names by using a named vector as base expression:

df <- gen.data.frame(c(a = i, b = j), i = 1:3, j = i:3)
knitr::kable(df)
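As a rough cross-check, the same set of pairs can be built in base R by filtering a full grid. This sketch does not use listcompr; grid and pairs are illustrative names, and the row order is chosen here by letting b vary fastest.

```r
# Base R cross-check (not listcompr): all pairs (a, b) from 1:3 with a <= b
grid <- expand.grid(b = 1:3, a = 1:3)          # first factor (b) varies fastest
pairs <- grid[grid$a <= grid$b, c("a", "b")]   # keep rows satisfying a <= b
rownames(pairs) <- NULL
print(pairs)
```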

A real world example {#ex1}

Assume you have an arbitrary number of square tiles and want to arrange them on your terrace under a certain condition: the number of tiles at the border should be the same as the number of tiles in the inner (i.e., all tiles without the border). For instance, a terrace with $6 \times 4$ tiles has $(6-2)\cdot(4-2) = 8$ tiles in the inner and $(6\cdot4)-8 = 16$ tiles at the border, i.e., it does not fulfill the condition. The maximum size of your terrace is $20 \times 20$ tiles (e.g., the size of your garden). We get all possible sizes of a terrace with that condition via:

df <- gen.data.frame(c(x = x, y = y), x = 1:20, y = 1:20, (x-2)*(y-2) == x*y/2)
knitr::kable(df)

The outer size of the terrace with that condition is either $6 \times 8$ or $5 \times 12$ tiles (and the symmetric solutions). Interestingly, it's easy to prove that these are the only solutions even if the size of your garden and the number of tiles are unlimited.
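One way to see why no further solutions exist: multiplying out $(x-2)(y-2) = xy/2$ and rearranging gives $(x-4)(y-4) = 8$, and 8 has only the positive factorizations $1 \cdot 8$ and $2 \cdot 4$ (plus their mirrored versions). The following base R sketch, which does not use listcompr and whose variable names are illustrative, checks that both forms of the condition select the same grid cells.

```r
# Base R cross-check (not listcompr): terrace sizes with border == inner
grid <- expand.grid(y = 1:20, x = 1:20)
# original condition and its rearranged form (x - 4) * (y - 4) == 8
cond_orig <- (grid$x - 2) * (grid$y - 2) == grid$x * grid$y / 2
cond_fact <- (grid$x - 4) * (grid$y - 4) == 8
terrace <- grid[cond_orig, c("x", "y")]
rownames(terrace) <- NULL
print(terrace)
```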

Advanced features

Now we present some advanced features, which allow us to create slightly more complex data sets.

Wildcard ranges

For any variable with the name pattern {varname}_{num} (underscore and a number), e.g. a_2, the range for all these indexed variables can be defined by {varname}_.

Assume we want to generate a data frame with all tuples $(a_1, a_2, a_3)$ with $a_1 < a_2 < a_3$ where $a_i \in \{1, ..., 4\}$. We simply specify the range for the a_{i} variables by a_ = 1:4:

df <- gen.data.frame(c(a_1, a_2, a_3), a_ = 1:4, a_1 < a_2, a_2 < a_3)
knitr::kable(df)
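Strictly increasing triples drawn from 1:4 are exactly the 3-element combinations of 1:4, so the result can be cross-checked with combn from base R (package utils). This is a sketch outside of listcompr; triples is an illustrative name.

```r
# Base R cross-check (not listcompr): strictly increasing triples from 1:4
# combn() returns one combination per column, so we transpose to get rows
triples <- as.data.frame(t(combn(4, 3)))
print(triples)
```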

Expanded expressions

Now assume we want to get all tuples $(a_1, ..., a_5) \in \mathbb{N}^5$ with the condition $\sum_{i} a_i = 6$. We make use of the <operator> ... <operator> notation of listcompr, which expands expressions in an intuitive way:

df <- gen.data.frame(c(a_1, ..., a_5), a_ = 1:5, a_1 + ... + a_5 == 6)
knitr::kable(df)

Expression expansion also works for arguments of functions, i.e., we could also replace the condition by sum(a_1, ..., a_5) == 6 (cf. next example).

Generated conditions

Let's assume a similar example to the one above. We want to generate all tuples $(a_1, ..., a_5) \in \{1, ..., 5\}^5$ with $\sum_{i} a_i = 10$ and $a_i \leq a_{i+1}$. To avoid writing a_1 <= a_2, a_2 <= a_3, ... there is the function gen.logical.and, which is a list comprehension helper to generate conditions. We can express this data frame by:

df <- gen.data.frame(c(a_1, ..., a_5), a_ = 1:5, sum(a_1, ..., a_5) == 10, gen.logical.and(a_i <= a_(i+1), i = 1:4))
knitr::kable(df)
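To make the generated condition concrete, the following base R sketch spells out the four adjacent comparisons $a_i \leq a_{i+1}$ for $i = 1, ..., 4$ and combines them with the sum condition on a full grid. It does not use listcompr; grid, m, nondecreasing and res are illustrative names.

```r
# Base R cross-check (not listcompr): non-decreasing 5-tuples from 1:5 with sum 10
grid <- expand.grid(rep(list(1:5), 5))   # all 5^5 candidate tuples
m <- as.matrix(grid)
# columns 1..4 compared with columns 2..5: a_i <= a_(i+1) for i = 1..4,
# "and"-connected by requiring all four comparisons to be TRUE
nondecreasing <- rowSums(m[, 1:4] <= m[, 2:5]) == 4
res <- grid[rowSums(m) == 10 & nondecreasing, ]
print(res)
```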

Example: calculate all permutations

We can get all permutations of $(1, 2, 3, 4)$ (of which there are $4! = 24$) by using the following property of permutations: $\text{perm}(1, ..., n) = \{(a_1, ..., a_n) \mid a_i \in \{1, ..., n\}, a_i \neq a_j \; \text{for} \; i \neq j\}$. This can be expressed via (we show only the first 8 permutations):

df <- gen.data.frame(c(a_1, ..., a_4), a_ = 1:4, gen.logical.and(a_i != a_j, i = 1:4, j = (i+1):4))
knitr::kable(df[1:8,])

Note: Don't use such an approach for creating large data sets of permutations, e.g., for all the 5040 permutations of 1:7. This approach is very slow! We recommend the permn function from the combinat package.
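The defining property (all entries pairwise distinct) can also be checked directly in base R on the full grid of $4^4$ candidate tuples. This sketch does not use listcompr or combinat, the names grid and perms are illustrative, and like the approach above it is only suitable for small n.

```r
# Base R cross-check (not listcompr): permutations of 1:4 as pairwise-distinct tuples
grid <- expand.grid(rep(list(1:4), 4))   # all 4^4 candidate tuples
# anyDuplicated() returns 0 when a row contains no repeated value
perms <- grid[apply(grid, 1, function(r) anyDuplicated(r) == 0), ]
print(nrow(perms))
```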

Nested list and vector comprehensions {#ex2}

The list comprehension functions of listcompr can also be nested. In the following example we use gen.data.frame as the outer function and gen.vector (within a sum) to generate a data frame of the sum of all divisors of a whole number (the number itself excluded):

df <- gen.data.frame(c(a = a, sumdiv = sum(gen.vector(x, x = 1:(a-1), a %% x == 0))), a = 2:10)
knitr::kable(df)

Very similarly, we can generate a vector of so-called perfect numbers. A perfect number is equal to the sum of its divisors (the number itself excluded). To this end we use the calculation of the sum of divisors within the condition:

gen.vector(a, a = 2:100, a == sum(gen.vector(x, x = 1:(a-1), a %% x == 0)))

This means the only perfect numbers between 1 and 100 are 6 and 28.
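The nested comprehension corresponds to a small helper function plus a filter in base R. The following sketch is not part of listcompr; sum_proper_divisors and perfect are hypothetical names introduced here for illustration.

```r
# Base R cross-check (not listcompr): perfect numbers in 2:100
# hypothetical helper: sum of all divisors of a, excluding a itself
sum_proper_divisors <- function(a) {
  x <- seq_len(a - 1)
  sum(x[a %% x == 0])
}
# keep the numbers that equal the sum of their proper divisors
perfect <- Filter(function(a) a == sum_proper_divisors(a), 2:100)
print(perfect)
```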

Character compositions

The package also offers some functions to compose characters. They can be used to generate lists or vectors of characters or (row-)names of the results.

An example for a vector of characters

Consider our example from above where we search for the number of tiles such that the border and the inner have the same number of tiles. Assume we want a textual output instead of a data frame. We use gen.vector.char, which takes a character string as base expression; all expressions in {}-brackets are evaluated according to the list comprehension result:

gen.vector.char('size: {x}x{y} tiles, where {x*y/2} tiles are at the border/inner', x = 1:20, y = 1:20, (x-2)*(y-2) == x*y/2)

A named list

Let's revisit the example with the divisors. Now we don't want the sum of the divisors, but all divisors of a number as a vector. Each vector should be stored in a list entry named divisors_of_{a}. With the following we get the divisors for all numbers in 5:10:

gen.named.list('divisors_of_{a}', gen.vector(x, x = 1:(a-1), a %% x == 0), a = 5:10)
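A comparable named list can be assembled in base R with lapply and setNames. This is a sketch for cross-checking only and does not use listcompr; divisors, nums and div_list are hypothetical names introduced for illustration.

```r
# Base R cross-check (not listcompr): named list of divisors for 5:10
# hypothetical helper: all divisors of a, excluding a itself
divisors <- function(a) {
  x <- seq_len(a - 1)
  x[a %% x == 0]
}
nums <- 5:10
# one list entry per number, named divisors_of_<number>
div_list <- setNames(lapply(nums, divisors), paste0("divisors_of_", nums))
str(div_list)
```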