euler: Area-Proportional Euler Diagrams

Description

Fit Euler diagrams (a generalization of Venn diagrams) using numerical optimization to find exact or approximate solutions to a specification of set relationships.

Usage

euler(combinations, ...)
# S3 method for default
euler(combinations, input = c("disjoint", "union"), ...)
# S3 method for data.frame
euler(combinations, weights = NULL, by = NULL, ...)
# S3 method for matrix
euler(combinations, ...)
# S3 method for table
euler(combinations, ...)
# S3 method for list
euler(combinations, ...)

Arguments

combinations

Set relationships as a named numeric vector, matrix, data.frame, table, or list. (See the methods (by class) section for details.)

...

Arguments passed down to other methods.

input

The type of input: disjoint class combinations (disjoint) or unions (union).

weights

A numeric vector of weights of the same length as by and the number of rows in combinations.

by

A factor or character matrix to be used in base::by() to split the data.frame or matrix of set combinations.
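
The difference between the two values of input can be illustrated with plain arithmetic. The sketch below uses only base R and a hypothetical two-set specification. It assumes that with input = "union" each single-set entry is the full size of that set, overlap included, so the disjoint regions are obtained by subtracting the overlap.

```r
# Hypothetical two-set specification given as unions:
# "A" and "B" are full set sizes, "A&B" is the overlap.
union_spec <- c("A" = 10, "B" = 6, "A&B" = 4)

# Subtract the overlap from each set to get the exclusive regions
# (the reading of input = "disjoint" assumed in this sketch).
disjoint_spec <- c(
  "A"   = union_spec[["A"]] - union_spec[["A&B"]],
  "B"   = union_spec[["B"]] - union_spec[["A&B"]],
  "A&B" = union_spec[["A&B"]]
)

disjoint_spec
```

Under this reading, passing union_spec with input = "union" and disjoint_spec with input = "disjoint" would describe the same diagram.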

Value

A list object of class 'euler' with the following components.

coefficients

A matrix of x and y coordinates for the centers of the circles and their radii.

original.values

Set relationships provided by the user.

fitted.values

Set relationships in the solution.

residuals

Residuals: the differences between the original and fitted values.

diag_error

The largest absolute residual in percentage points between the original and fitted areas.

stress

The stress of the solution, computed as the sum of squared residuals over the total sum of squares.

Methods (by class)

default: A named numeric vector, with combinations separated by an ampersand, for instance A&B = 10. Missing combinations are treated as being 0.
data.frame: A data.frame of logicals or two-level factors (see examples).
matrix: A matrix that can be converted to a data.frame of logicals (as in the description above) via base::as.data.frame.matrix().
table: A table with max(dim(x)) < 3.
list: A list of vectors, each vector giving the contents of that set. Vectors in the list do not need to be named.
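
To make the link between the default and data.frame inputs concrete, the following base R sketch tallies a small, made-up data.frame of logicals into a named vector of disjoint combinations. It is only an illustration of the idea, not the package's internal code. Dropping rows that belong to no set is an assumption of this sketch.

```r
# Hypothetical membership data: one row per observation, one column per set.
dat <- data.frame(
  A = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  B = c(FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Label each row by the sets it belongs to, e.g. "A&B".
membership <- as.matrix(dat)
labels <- apply(membership, 1, function(row) {
  paste(colnames(membership)[row], collapse = "&")
})

# Drop rows that belong to no set (assumption), then count each combination.
counts <- table(labels[labels != ""])
combos <- setNames(as.numeric(counts), names(counts))

combos
```

The resulting combos vector has the same shape as the named numeric vector accepted by the default method.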

Details

If the input is a matrix or data frame and the argument by is specified, the function returns a list of Euler diagrams, one for each group.

The function minimizes the sum of squared errors between the disjoint areas in the Euler diagram and the user's input, namely

$$\sum_{i=1}^{n} (y_i - \hat{y}_i) ^ 2,$$

where $\hat{y}$ are estimates of $y$ that are currently being explored.

The stress statistic from venneuler is returned to give an indication of the goodness of the fit:

$$ \frac{ \sum_{i=1}^{n} (y_i - \hat{y}_i) ^ 2}{\sum_{i=1}^{n} y_i ^ 2}, $$

where $\hat{y}$ are ordinary least squares estimates from the regression of the fitted areas (those currently being explored) on the original areas.
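
One possible reading of the stress formula is sketched below in base R. It assumes that the regression is fitted through the origin, that $y$ denotes the fitted areas, and that $\hat{y}$ are the values predicted from the original areas. The function name stress_sketch and the input values are hypothetical, and the package may compute the statistic differently.

```r
# Sketch of the stress statistic under the assumptions stated above.
stress_sketch <- function(original, fit_areas) {
  # Least-squares slope for a regression of fit_areas on original
  # through the origin (assumption: no intercept).
  slope <- sum(original * fit_areas) / sum(original^2)
  predicted <- slope * original

  # Residual sum of squares over the total (uncentered) sum of squares.
  sum((fit_areas - predicted)^2) / sum(fit_areas^2)
}

# Hypothetical original and fitted disjoint areas.
original  <- c("A" = 0.8, "B" = 0.2, "A&B" = 0.2)
fit_areas <- c("A" = 0.9, "B" = 0.1, "A&B" = 0.2)

stress_sketch(original, fit_areas)
```

Under these assumptions the statistic is zero whenever the fitted areas are an exact multiple of the original areas, so it measures proportionality rather than absolute agreement.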

euler() also returns diag_error and region_error from eulerAPE. region_error is computed as

$$ \left| \frac{y_i}{\sum y_i} - \frac{\hat{y}_i}{\sum \hat{y}_i}\right|. $$

diag_error is simply the maximum of region_error.
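
The two error measures can be reproduced by hand from the formula above. The base R sketch below uses hypothetical original and fitted areas. The variable names are illustrative and are not taken from the package.

```r
# Hypothetical original and fitted disjoint areas.
original  <- c("A" = 3,   "B" = 1,   "A&B" = 1)
fit_areas <- c("A" = 2.8, "B" = 1.2, "A&B" = 1)

# Share of the total area taken up by each region, before and after fitting.
original_share <- original / sum(original)
fitted_share   <- fit_areas / sum(fit_areas)

# region_error: absolute difference in shares, one value per region.
region_error <- abs(original_share - fitted_share)

# diag_error: the largest region_error.
diag_error <- max(region_error)

region_error
diag_error
```

In this made-up case the regions A and B are each off by about 0.04, or 4 percentage points of the total area, which is therefore also the value of diag_error.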

References

Wilkinson L. Exact and Approximate Area-Proportional Circular Venn and Euler Diagrams. IEEE Transactions on Visualization and Computer Graphics (Internet). 2012 Feb (cited 2016 Apr 9);18(2):321–31. Available from: http://doi.org/10.1109/TVCG.2011.56

Micallef L, Rodgers P. eulerAPE: Drawing Area-Proportional 3-Venn Diagrams Using Ellipses. PLOS ONE (Internet). 2014 Jul (cited 2016 Dec 10);9(7):e101717. Available from: http://dx.doi.org/10.1371/journal.pone.0101717

Examples


# NOT RUN {
# First fit the euler specification
fit <- euler(c("A" = 1, "B" = 0.4, "C" = 3, "A&B" = 0.2))

# Then plot it
plot(fit)

# Same result as above
euler(c("A" = 1, "B" = 0.4, "C" = 3,
        "A&B" = 0.2, "A&C" = 0, "B&C" = 0,
        "A&B&C" = 0))

# An Euler diagram from a list of sample spaces (the list method)
euler(list(A = c("a", "ab", "ac", "abc"),
           B = c("b", "ab", "bc", "abc"),
           C = c("c", "ac", "bc", "abc")))

# Using the matrix method
mat <- cbind(A = sample(c(TRUE, TRUE, FALSE), size = 50, replace = TRUE),
             B = sample(c(TRUE, FALSE), size = 50, replace = TRUE))
euler(mat)

# Using grouping via the 'by' argument
dat <- data.frame(
  A = sample(c(TRUE, FALSE), size = 100, replace = TRUE),
  B = sample(c(TRUE, TRUE, FALSE), size = 100, replace = TRUE),
  gender = sample(c("Men", "Women"), size = 100, replace = TRUE),
  nation = sample(c("Sweden", "Denmark"), size = 100, replace = TRUE)
)

euler(dat[, 1:2], by = dat[, 3:4])

# A set with no perfect solution
euler(c("a" = 3491, "b" = 3409, "c" = 3503,
        "a&b" = 120, "a&c" = 114, "b&c" = 132,
        "a&b&c" = 50))

# The table method
plot(euler(as.table(apply(Titanic, 2:4, sum))))
# }
