savingby2d: Assess advantage of 2-D view over 1-D view for identifying extrapolation

Description

A simple algorithm to evaluate the advantage of taking a bivariate marginal view of two variables, rather than two univariate marginal views, when trying to avoid extrapolation.

Usage

savingby2d(x, y = NULL, method = "default")

Arguments

x

A numeric or factor vector. If y is NULL, x can instead be a data frame containing both variables.

y

A numeric or factor vector.

method

Character; criterion used to quantify bivariate relationships. One of "default", the name of a scagnostics measure, or "DECR" to use a density estimate confidence region.

Value

A number between 0 and 1. Values near 1 imply no benefit to using a 2-D view, whereas values near 0 imply that a 2-D view reveals structure hidden in the 1-D views.

Details

If given two continuous variables, both are scaled to mean 0 and variance 1. The returned value is then the ratio of the area of the convex hull of the data to the product of the ranges of the two variables, i.e. the area of the bounding rectangle.
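A minimal sketch of the continuous case in R, for illustration only: `hull_ratio_sketch` is a hypothetical name and this is not the condvis source. It assumes `grDevices::chull` for the hull vertices and computes the hull area with the shoelace formula.

```r
# Hypothetical helper sketching the continuous-continuous case;
# not the condvis implementation.
hull_ratio_sketch <- function(x, y) {
  # Standardise both variables to mean 0, variance 1
  x <- as.numeric(scale(x))
  y <- as.numeric(scale(y))
  # Indices of the convex hull vertices, returned in order around the hull
  h <- chull(x, y)
  hx <- x[h]
  hy <- y[h]
  # Shoelace formula: index of the next vertex, wrapping back to the first
  nxt <- c(seq_along(h)[-1], 1)
  hull_area <- abs(sum(hx * hy[nxt] - hx[nxt] * hy)) / 2
  # Bounding rectangle: product of the two ranges
  rect_area <- diff(range(x)) * diff(range(y))
  hull_area / rect_area
}

# Three corners of a square: the hull covers half the bounding rectangle
hull_ratio_sketch(c(0, 1, 0), c(0, 0, 1))
```

Because the scaling is applied to each axis separately, it multiplies the hull area and the rectangle area by the same factor, so the ratio itself does not depend on it.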

If given two categorical variables, all combinations are tabulated. The returned value is the number of non-zero table entries divided by the total number of table entries.
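The categorical case can be sketched in a couple of lines of R; `table_ratio_sketch` is a hypothetical name, and this is an illustration of the rule above rather than the condvis code. Note that `table()` tabulates every combination of factor levels, so empty combinations appear as zero cells.

```r
# Hypothetical helper sketching the factor-factor case
table_ratio_sketch <- function(x, y) {
  tab <- table(x, y)            # cross-tabulate all level combinations
  sum(tab > 0) / length(tab)    # share of non-empty cells
}

# Combinations (a,u), (a,v), (b,u) occur; (b,v) does not
table_ratio_sketch(c("a", "a", "b"), c("u", "v", "u"))
```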

If given one categorical and one continuous variable, the returned value is the weighted mean, over the levels of the categorical variable, of the range of the continuous variable within each level, divided by the overall range of the continuous variable. The weights are the numbers of observations in each level.
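A minimal sketch of the mixed case in R; `mixed_ratio_sketch` and its argument names are hypothetical, and it assumes that levels with no observations are dropped before averaging.

```r
# Hypothetical helper sketching the factor-continuous case
mixed_ratio_sketch <- function(cont, fac) {
  # Drop unused levels so every level has at least one observation
  fac <- droplevels(as.factor(fac))
  # Range of the continuous variable within each level
  within <- as.vector(tapply(cont, fac, function(v) diff(range(v))))
  # Weights: number of observations in each level
  w <- as.vector(table(fac))
  weighted.mean(within, w) / diff(range(cont))
}

# Within-level ranges 1 and 8, equal weights, overall range 10
mixed_ratio_sketch(c(0, 1, 2, 10), c("a", "a", "b", "b"))
```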

Requires package scagnostics if a scagnostics measure is specified in method, and package hdrcde if "DECR" (density estimate confidence region) is specified. These options only apply when x and y are both numeric.

References

O'Connell M, Hurley CB and Domijan K (2017). "Conditional Visualization for Statistical Models: An Introduction to the condvis Package in R." Journal of Statistical Software, 81(5), pp. 1-20. <URL:http://dx.doi.org/10.18637/jss.v081.i05>.

Examples

library(condvis)
set.seed(1)

x <- runif(1000)
y <- runif(1000)
plot(x, y)
savingby2d(x, y)
## value near 1, no real benefit from bivariate view

x1 <- runif(1000)
y1 <- x1 + rnorm(n = 1000, sd = 0.3)
plot(x1, y1)
savingby2d(x1, y1)
## smaller value indicates that the bivariate view reveals some structure