WpProj: Linear p-Wasserstein Projections

The goal of WpProj is to perform Wasserstein projections from the predictive distributions of any model into the space of predictive distributions of linear models. This package employs the methods described in Eric Dunipace and Lorenzo Trippa (2020) <arXiv:2012.09999>.

The Wasserstein distance is a measure of distance between two probability distributions. It is defined as:
$$W_p(\mu,\nu) = \left(\inf_{\pi \in \Pi(\mu,\nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} |x-y|^p d\pi(x,y)\right)^{1/p}$$
where $\Pi(\mu,\nu)$ is the set of all joint distributions with marginals $\mu$ and $\nu$.
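
For intuition, the definition simplifies in one dimension: for two empirical distributions with the same number of equally weighted points, the optimal coupling pairs the sorted values. The base R sketch below covers only that special case. The helper name `wasserstein_1d` is hypothetical and is not a WpProj function.

```r
# p-Wasserstein distance between two equally weighted 1-D samples of equal size.
# In one dimension the optimal coupling pairs the order statistics.
# `wasserstein_1d` is an illustrative helper, not part of WpProj.
wasserstein_1d <- function(x, y, p = 2) {
  stopifnot(length(x) == length(y), p >= 1)
  mean(abs(sort(x) - sort(y))^p)^(1 / p)
}

x <- c(0, 1, 3)
y <- c(5, 3, 2)               # the values of x shifted by 2, in a different order
wasserstein_1d(x, y, p = 1)   # 2
wasserstein_1d(x, y, p = 2)   # 2
```

Because every sorted pair differs by exactly 2, the distance is 2 for any choice of $p$.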

In this package, if $\mu$ is the predictive distribution of the original model, such as a Bayesian linear regression or a neural network, then we seek a new predictive distribution $\nu$ that minimizes the Wasserstein distance between the two:
$$\mathop{\text{argmin}} _ {\nu} W _ {p}(\mu,\nu) ^ {p},$$
subject to the constraint that $\nu$ is a linear model.

To reduce the number of parameters, we add an $L_1$ penalty on the coefficients of the linear model:
$$\mathop{\text{argmin}} _ {\nu} W _ {p}(\mu,\nu) ^ {p} + P_{\lambda}(\nu),$$
where $P_\lambda(\nu)$ is a penalty on the complexity of the model, such as the $L_1$ penalty on the coefficients of the linear model.
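
As a rough illustration of the penalized objective for $p = 2$, the base R sketch below evaluates a transport cost plus an $L_1$ penalty for a candidate coefficient matrix. It rests on two assumptions that do not come from the package: each draw of the original model stays paired with the same draw of the linear model (a fixed coupling, which gives an upper bound on $W_2^2$ rather than the infimum over all couplings), and the penalty is a plain sum of absolute coefficients. The function name `penalized_objective` is hypothetical, and the exact form of $P_\lambda$ used by WpProj may differ.

```r
# Evaluate a penalized projection objective for candidate coefficients.
# Assumption: draw s of the original model is paired with draw s of the linear
# model, so the transport cost is an upper bound on W_2^2, not the infimum.
# `penalized_objective` is an illustrative helper, not part of WpProj.
penalized_objective <- function(X, eta, coefs, lambda) {
  stopifnot(nrow(X) == nrow(eta),
            ncol(X) == nrow(coefs),
            ncol(eta) == ncol(coefs))
  nu <- X %*% coefs                               # predictions of the linear model
  transport_cost <- mean(colSums((eta - nu)^2))   # mean squared distance per draw
  l1_penalty <- lambda * sum(abs(coefs))          # plain L1 penalty (assumed form)
  transport_cost + l1_penalty
}

set.seed(1)
X <- matrix(rnorm(20 * 3), nrow = 20)
true_coefs <- matrix(c(1, 0, -2), nrow = 3, ncol = 5)   # 3 covariates, 5 draws
eta <- X %*% true_coefs
penalized_objective(X, eta, true_coefs, lambda = 0)     # 0: exact fit, no penalty
penalized_objective(X, eta, true_coefs, lambda = 0.1)   # 1.5: penalty term only
```

Larger values of `lambda` make nonzero coefficients more expensive, which is what pushes a minimizer toward sparser linear models.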

Installation

You can install the development version of WpProj from GitHub with:

# install.packages("devtools")
devtools::install_github("ericdunipace/WpProj")

Example

This is a basic example running the WpProj function on a simulated dataset. Note we create a pseudo posterior from a simple dataset for illustration purposes:

library(WpProj)
set.seed(23048)
# note we don't generate believable data with real posteriors
# these examples are just to show how to use the function
n <- 32
p <- 10
s <- 21

# covariates and coefficients
x <- matrix( stats::rnorm( p * n ), nrow = n, ncol = p )
beta <- (1:10)/10

#outcome
y <- x %*% beta + stats::rnorm(n)

# fake posterior
post_beta <- matrix(beta, nrow=p, ncol=s) + stats::rnorm(p*s, 0, 0.1)
post_mu <- x %*% post_beta #posterior predictive distributions

# fit models
## L1 model
fit.p2     <-  WpProj(X=x, eta=post_mu, power = 2.0,
                   method = "L1", #default
                   solver = "lasso" #default
)

## approximate binary program
fit.p2.bp <-  WpProj(X=x, eta=post_mu, theta = post_beta, power = 2.0,
                   method = "binary program",
                   solver = "lasso" #default because approximate algorithm is faster
)

We can compare the performance of the models using the distCompare function, which measures the distance between the reduced models and the original model, and then generate a plot:

dc <- distCompare(models = list("L1" = fit.p2, "Binary Program" = fit.p2.bp),
                  target = list(parameters = post_beta,
                                  predictions = post_mu))
# use a new name so the covariate count `p` defined earlier is not overwritten
plots <- plot(dc, ylabs = c("2-Wasserstein Distance", "2-Wasserstein Distance"))
plots$parameters + ggplot2::ggtitle("Parameters")
plots$predictions + ggplot2::ggtitle("Predictions")

We can also compare performance by measuring the relative distance between a null model and the predictions of interest as a pseudo $R^2$:

r2.null  <- WPR2(projected_model = dc) # should be between 0 and 1
plot(r2.null)
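
To make the idea of a distance-based pseudo $R^2$ concrete, here is a minimal base R sketch for one-dimensional samples. It assumes the statistic has the familiar "one minus model error over null error" form, with a point mass at the mean as the null model. That form is an assumption for illustration and is not taken from the WPR2 source code. The helper names `wp_pow` and `pseudo_r2` are hypothetical.

```r
# Pseudo R^2 built from Wasserstein distances, for 1-D samples of equal size.
# Assumption: the statistic is 1 - (model error) / (null error); this is an
# illustration and not the WPR2 implementation.
wp_pow <- function(x, y, p = 2) {
  mean(abs(sort(x) - sort(y))^p)   # W_p^p between equally weighted 1-D samples
}

pseudo_r2 <- function(target, projected, p = 2) {
  null_model <- rep(mean(target), length(target))   # null model: point mass at the mean
  1 - wp_pow(projected, target, p) / wp_pow(null_model, target, p)
}

set.seed(42)
target    <- rnorm(200, mean = 1, sd = 2)
projected <- target + rnorm(200, sd = 0.5)   # a reduced model close to the target
pseudo_r2(target, target)      # 1: a perfect reproduction of the target
pseudo_r2(target, projected)   # below 1; closer to 1 means a better approximation
```

Under this form, a value near 0 means the reduced model is no closer to the target than the null model is.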

We can also examine how the predictions change in the models as more covariates are added for individual observations.

ridgePlot(fit.p2, index = 21, minCoef = 0, maxCoef = 10)

References

Eric Dunipace and Lorenzo Trippa (2020). arXiv:2012.09999.

Install

install.packages('WpProj')

Version

0.2.1

License

GPL (== 3.0)

Maintainer

Eric Dunipace

Last Published

February 2nd, 2024

Functions in WpProj (0.2.1)

combine.WPR2: A Function to Combine $W_p R^2$ Objects

binary_program_method_options: Options For Use With the Binary Program Method

distCompare: Compares Optimal Transport Distances Between WpProj and Original Models

WPR2: $W_p R^2$ Function to Evaluate Performance

WPSW: p-Wasserstein Distance Linear Projections Using a Stepwise Method

combine.distcompare: Combine distance calculations from the distCompare function

WpProj-package: WpProj: Linear p-Wasserstein Projections

WpProj: p-Wasserstein Linear Projections

WPVI: p-Wasserstein Variable Importance

stepwise_method_options: Options For Use With the Stepwise Selection Method

plot,WPR2-method: Plot Function for $W_p R^2$ Objects

ridgePlot: Ridge Plots for a Range of Coefficients

plot,combine_distcompare-method: Plot 'combine.distcompare' Objects

simulated_annealing_method_options: Options For Use With the Simulated Annealing Selection Method

plot,distcompare-method: Plot distcompare Objects

WPSA: p-Wasserstein distance projections using simulated annealing

rank_distcompare: Ranks distcompare Objects

transport_options: Available Wasserstein Distance Methods

wasserstein: Calculate Wasserstein distances

plot_ranks: Plot the Rankings of the 'combine.distcompare' Objects

W2L1: 2-Wasserstein distance linear projections with an $L_1$ penalty

WInfL1: Infinity-Wasserstein Linear Projections With an L1 Penalty

L1_penalty_options: Recognized L1 Penalties

L1_method_options: Options For Use With the L1 Method

WPL1: p-Wasserstein Linear Projections With an $L_1$ Penalty

WPL0: p-Wasserstein projections with an L0 penalty

HC: Run the Hahn-Carvalho Method

L0_method_options: Options For Use With the L0 Method

W2IP: 2-Wasserstein distance selection by Integer Programming

W1L1: 1-Wasserstein projection