
eulerr (version 1.1.0)

euler: Area-Proportional Euler Diagrams

Description

Fit Euler diagrams (a generalization of Venn diagrams) using numerical optimization to find exact or approximate solutions to a specification of set relationships.

Usage

euler(combinations, ...)

# S3 method for default
euler(combinations, input = c("disjoint", "union"), ...)

# S3 method for matrix
euler(combinations, by = NULL, ...)

# S3 method for data.frame
euler(combinations, by = NULL, ...)

Arguments

combinations

Set relationships as a named numeric vector, matrix, or data.frame. (See the methods (by class) section for details.)

...

Currently ignored.

input

The type of input: disjoint class combinations (disjoint) or unions (union).

by

A factor or character matrix passed to base::by() to split the data.frame or matrix of set combinations into groups.
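
The difference between the two input types can be illustrated with a small base R sketch. It assumes that a "union" value counts everything inside the named set or intersection, whereas a "disjoint" value counts only the region belonging to exactly that combination. The numbers and variable names are hypothetical, and this is not the package's internal conversion.

```r
# Hypothetical union-style input for two sets: each value counts everything
# inside the named set or intersection (an assumption about input = "union").
union_input <- c("A" = 10, "B" = 5, "A&B" = 2)

# Disjoint regions: subtract the overlap from each single set, keep the overlap.
disjoint_input <- c(
  "A"   = union_input[["A"]] - union_input[["A&B"]],
  "B"   = union_input[["B"]] - union_input[["A&B"]],
  "A&B" = union_input[["A&B"]]
)

print(disjoint_input)
```

Under that assumption, passing union_input with input = "union" and disjoint_input with input = "disjoint" would describe the same diagram.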

Value

A list object of class 'euler' with the following components.

coefficients

A matrix of x and y coordinates for the centers of the circles and their radii.

original.values

Set relationships provided by the user.

fitted.values

Set relationships in the solution.

residuals

Differences between the original and fitted values.

diag_error

The largest absolute residual in percentage points between the original and fitted areas.

stress

The stress of the solution, computed as the sum of squared residuals over the total sum of squares.

Methods (by class)

  • default: A named numeric vector, with interactions separated by an ampersand, for instance `A&B` = 10. Missing interactions are treated as being 0.

  • matrix: A logical matrix with columns representing sets and rows representing each observation's set relationships (see examples).

  • data.frame: A data.frame that can be converted to a matrix of logicals (as in the description above) via as.matrix.
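
To see how a logical matrix relates to the named-vector form, the following base R sketch tabulates each row into a disjoint combination label such as `A&B`. It only illustrates the correspondence and is not the package's own conversion code. The data and the names `labels`, `counts`, and `combos` are made up for the example.

```r
# Each row is one observation; each column is a set.
mat <- cbind(
  A = c(TRUE, TRUE,  FALSE, TRUE, FALSE),
  B = c(TRUE, FALSE, TRUE,  TRUE, FALSE)
)

# Build a label per row from the sets the observation belongs to.
labels <- apply(mat, 1, function(row) {
  paste(colnames(mat)[row], collapse = "&")
})

# Drop observations that belong to no set, then count each combination.
counts <- table(labels[labels != ""])

# Named numeric vector in the same shape the default method accepts.
combos <- setNames(as.numeric(counts), names(counts))
print(combos)
```

Here two observations fall in both sets, one in A only, and one in B only; the observation in neither set is dropped.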

Details

If by is specified, euler returns a list of diagrams separated by the categorical variables in by.

The fit minimizes the sum of squared residuals between the areas in the Euler diagram and the user's initial specification (as disjoint class combinations),

$$\sum_{i=1}^{n} (y_i - \hat{y}_i) ^ 2 $$

where \(\hat{y}\) are the estimates of \(y\) that are explored during optimization.

Diagnostics are provided by the fit as the stress statistic from venneuler:

$$ \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i) ^ 2}{\sum_{i=1}^{n} y_i ^ 2} $$

where \(\hat{y}\) are OLS estimates from the regression of the fitted areas on the original areas that are currently being explored during optimization.

We also return diag_error and region_error from eulerAPE. region_error is computed as

$$ \left| \frac{y_i}{\sum y_i} - \frac{\hat{y}_i}{\sum \hat{y}_i}\right| $$

whereas diag_error is the maximum of region_error.
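
The formulas above can be worked through by hand in base R. The sketch below uses made-up original and fitted areas and computes the sum of squared residuals, a plain version of the stress ratio, and the two error measures. It assumes the fitted areas are used directly as \(\hat{y}\) and skips the OLS rescaling step described for the stress statistic, so the stress value is only an approximation of what the package would report.

```r
# Hypothetical disjoint areas: what the user asked for and what a fit produced.
original     <- c("A" = 6, "B" = 3, "A&B" = 1)
fitted_areas <- c("A" = 5, "B" = 3, "A&B" = 2)

# Loss minimized during fitting: sum of squared residuals.
ssr <- sum((original - fitted_areas)^2)

# Plain stress ratio (assumption: no OLS rescaling of the fitted areas).
stress_plain <- ssr / sum(original^2)

# Per-region error: absolute difference between area proportions.
region_error <- abs(original / sum(original) - fitted_areas / sum(fitted_areas))

# Largest per-region error.
diag_error <- max(region_error)

print(list(ssr = ssr, stress = stress_plain,
           region_error = region_error, diag_error = diag_error))
```

Because both measures compare proportions, region_error and diag_error are unaffected by the overall scale of the diagram.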

References

Wilkinson L. Exact and Approximate Area-Proportional Circular Venn and Euler Diagrams. IEEE Transactions on Visualization and Computer Graphics [Internet]. 2012 Feb [cited 2016 Apr 9];18(2):321–31. Available from: http://doi.org/10.1109/TVCG.2011.56

Micallef L, Rodgers P. eulerAPE: Drawing Area-Proportional 3-Venn Diagrams Using Ellipses. PLOS ONE [Internet]. 2014 Jul [cited 2016 Dec 10];9(7):e101717. Available from: http://dx.doi.org/10.1371/journal.pone.0101717

See Also

plot.euler, print.euler

Examples

library(eulerr)

# Combinations that are left out are treated as 0
fit1 <- euler(c("A" = 1, "B" = 0.4, "C" = 3, "A&B" = 0.2))

# Same result as above
fit2 <- euler(c("A" = 1, "B" = 0.4, "C" = 3,
                "A&B" = 0.2, "A&C" = 0, "B&C" = 0,
                "A&B&C" = 0))

# Using the matrix method
mat <- cbind(A = sample(c(TRUE, TRUE, FALSE), size = 50, replace = TRUE),
             B = sample(c(TRUE, FALSE), size = 50, replace = TRUE))
fit3 <- euler(mat)

# Using grouping via the 'by' argument
dat <- data.frame(
  A = sample(c(TRUE, FALSE), size = 100, replace = TRUE),
  B = sample(c(TRUE, TRUE, FALSE), size = 100, replace = TRUE),
  gender = sample(c("Men", "Women"), size = 100, replace = TRUE),
  nation = sample(c("Sweden", "Denmark"), size = 100, replace = TRUE)
)

fit4 <- euler(dat[, 1:2], by = dat[, 3:4])

# A set with no perfect solution
euler(c("a" = 3491, "b" = 3409, "c" = 3503,
        "a&b" = 120, "a&c" = 114, "b&c" = 132,
        "a&b&c" = 50))
