metamer (version 0.2.0)

metamerize: Create metamers

Description

Produces very dissimilar datasets with the same statistical properties.

Usage

metamerize(data, preserve, minimize = NULL, change = colnames(data),
  signif = 2, N = 100, trim = N, annealing = TRUE,
  perturbation = 0.08, name = NULL, verbose = interactive())

Arguments

data

A data.frame with the starting data or a metamer_list object returned by a previous call to the function.

preserve

A function whose result must be kept the same, up to signif significant digits. It must take the data as its argument and return a numeric vector.

minimize

An optional function to minimize in the process. It must take the data as its argument and return a single numeric value.

change

A character vector with the names of the columns that need to be changed.

signif

The number of significant digits to which the result of preserve must stay unchanged.

N

Number of iterations.

trim

Maximum number of metamers to return.

annealing

Logical indicating whether to perform annealing.

perturbation

A numeric vector with the magnitude of the random perturbations. It can be of length 1 (the same value for every column) or of length length(change) (one value per column).

name

A character string used to name the metamers.

verbose

Logical indicating whether to show a progress bar.

Value

A metamer_list object (a list of data.frames).

Details

It follows the method of Matejka & Fitzmaurice (2017) for constructing metamers. Starting from an initial dataset, it iteratively adds a small perturbation, checks whether preserve returns the same value (up to signif significant digits) and whether minimize has decreased, and, if both hold, accepts the perturbed data as the starting point for the next round. If annealing is TRUE, it also accepts solutions with a larger value of minimize, with an ever-decreasing probability, to help the algorithm avoid local minima.
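The loop described above can be sketched in base R as follows. This is a hypothetical illustration, not metamer's implementation: the function name sketch_metamerize() is made up, the argument digits plays the role of signif, and three details are assumptions rather than documented behaviour: each changed column receives Gaussian noise with standard deviation perturbation * sd(column), the temperature falls linearly from 1 to 0, and worse solutions are accepted with the Metropolis rule from simulated annealing, which metamerize() may not use.

```r
# Hypothetical sketch of the metamer search loop (NOT metamer's own code).
# Assumptions: Gaussian noise scaled by each column's sd, linear cooling
# from 1 to 0, Metropolis acceptance of worse solutions when annealing.
sketch_metamerize <- function(data, preserve, minimize = NULL,
                              change = colnames(data), digits = 2, N = 100,
                              perturbation = 0.08, annealing = TRUE) {
  if (is.null(minimize)) minimize <- function(d) 0  # nothing to minimize
  target <- signif(preserve(data), digits)           # statistics to keep fixed
  perturbation <- rep_len(perturbation, length(change))  # one value per column
  current <- data
  current_score <- minimize(current)
  metamers <- list(current)

  for (i in seq_len(N)) {
    # 1. Add a small random perturbation to every column in `change`.
    candidate <- current
    for (j in seq_along(change)) {
      x <- candidate[[change[j]]]
      candidate[[change[j]]] <- x + rnorm(length(x), sd = perturbation[j] * sd(x))
    }
    # 2. Discard the candidate unless `preserve` matches to `digits` digits.
    if (!all(signif(preserve(candidate), digits) == target)) next

    # 3. Accept if `minimize` did not increase; with annealing, sometimes
    #    accept a worse candidate with a probability that shrinks over time.
    score <- minimize(candidate)
    temperature <- 1 - i / N
    accept <- score < current_score ||
      (annealing && temperature > 0 &&
         runif(1) < exp((current_score - score) / temperature))
    if (accept) {
      current <- candidate
      current_score <- score
      metamers[[length(metamers) + 1]] <- current
    }
  }
  metamers
}

# Usage: keep both means and the correlation of `cars` to 2 significant digits.
means_and_cor <- function(d) c(mean(d$speed), mean(d$dist), cor(d$speed, d$dist))
set.seed(42)
res <- sketch_metamerize(cars, means_and_cor, N = 500)
last <- res[[length(res)]]
```

Because only candidates that pass the preserve check are stored, every element of the returned list shares the rounded statistics of the starting data, mirroring the guarantee described for metamerize().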

If data is a metamer_list, the function starts the algorithm from the last metamer of the list. Furthermore, if preserve and/or minimize are missing, the functions from the previous call are carried over.

minimize can also be a vector of functions. In that case, the process minimizes the product of the values returned by each function applied to the data.
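The product rule above can be illustrated with a small base R helper. combine_product(), spread_speed() and range_dist() are hypothetical names used only for this sketch; it does not show how metamer combines the functions internally, only the arithmetic described above. In R, a "vector of functions" is a list, such as the one returned by c(f1, f2).

```r
# Hypothetical helper: turn a list of objective functions into a single
# function that returns the product of their values for a given dataset.
combine_product <- function(fns) {
  function(data) prod(vapply(fns, function(f) f(data), numeric(1)))
}

# Two illustrative objectives on `cars`.
spread_speed <- function(d) sd(d$speed)
range_dist <- function(d) diff(range(d$dist))

objective <- combine_product(c(spread_speed, range_dist))
objective(cars)  # equals sd(cars$speed) * diff(range(cars$dist))
```

Since the values are multiplied, this combination behaves as intended mainly when every function returns positive values; a negative factor would flip the direction of the minimization.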

References

Matejka, J., & Fitzmaurice, G. (2017). Same Stats, Different Graphs. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems - CHI ’17, 1290–1294. https://doi.org/10.1145/3025453.3025912

See Also

delayed_with() for a convenient way to build functions suitable for preserve; mean_dist_to() for a convenient way to minimize the distance to a known target in minimize; mean_self_proximity() for maximizing the "self distance" to prevent data clumping.

Examples

# NOT RUN {
library(metamer)
data(cars)
# Metamers of `cars` with the same mean speed and dist, and correlation
# between the two.
means_and_cor <- delayed_with(mean_speed = mean(speed),
                              mean_dist = mean(dist),
                              cor = cor(speed, dist))
set.seed(42)  # for reproducibility.
metamers <- metamerize(cars,
                       preserve = means_and_cor,
                       signif = 3,
                       N = 1000)
print(metamers)

last <- metamers[[length(metamers)]]

# Confirm that the statistics are the same
cbind(original = means_and_cor(cars),
      metamer = means_and_cor(last))

# Visualize
plot(last)
points(cars, col = "red")

# }
