SeBR: Semiparametric Bayesian Regression

Overview. Data transformations are a useful companion for parametric regression models. A well-chosen or learned transformation can greatly enhance the applicability of a given model, especially for data with irregular marginal features (e.g., multimodality, skewness) or various data domains (e.g., real-valued, positive, or compactly-supported data).

Given paired data $(x_i,y_i)$ for $i=1,\ldots,n$, SeBR implements efficient and fully Bayesian inference for semiparametric regression models that incorporate (1) an unknown data transformation

$$ g(y_i) = z_i $$

and (2) a useful parametric regression model

$$ z_i = f_\theta(x_i) + \sigma \epsilon_i $$

with unknown parameters $\theta$ and independent errors $\epsilon_i$.
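To make the two-part structure concrete, the following minimal sketch simulates data from the model in base R. The specific choices are illustrative assumptions and not part of SeBR: a linear $f_\theta$, Gaussian errors, and $g = \log$, so that $y = g^{-1}(z) = \exp(z)$ is positive and right-skewed.

```r
# Simulate from g(y) = z, z = x'theta + sigma * eps (illustrative choices only)
set.seed(123)
n <- 500; p <- 3
X <- matrix(rnorm(n * p), nrow = n, ncol = p)    # covariates x_i
theta <- c(1, -0.5, 0.25)                        # assumed "true" coefficients
sigma <- 0.5                                     # assumed error scale

z <- as.numeric(X %*% theta + sigma * rnorm(n))  # latent parametric regression
g_inv <- function(z) exp(z)                      # assumed inverse transformation (g = log)
y <- g_inv(z)                                    # observed data on the original scale

# Compare ordinary linear fits on the transformed and raw scales
fit_z <- lm(log(y) ~ X)
fit_y <- lm(y ~ X)
c(r2_transformed = summary(fit_z)$r.squared,
  r2_raw = summary(fit_y)$r.squared)
```

In this sketch the transformation is known; in SeBR, $g$ is unknown and is inferred jointly with $\theta$.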

Examples. We focus on the following important special cases:

  1. The linear model is a natural starting point:

$$ z_i = x_i'\theta + \sigma\epsilon_i, \quad \epsilon_i \stackrel{iid}{\sim} N(0, 1) $$

The transformation $g$ broadens the applicability of this useful class of models, including for positive or compactly-supported data.

  2. The quantile regression model replaces the Gaussian assumption in the linear model with an asymmetric Laplace distribution (ALD)

$$ z_i = x_i'\theta + \sigma\epsilon_i, \quad \epsilon_i \stackrel{iid}{\sim} ALD(\tau) $$

to target the $\tau$th quantile of $z$ at $x$, or equivalently (for a monotone increasing $g$), the $\tau$th quantile of $y$ at $x$, which equals $g^{-1}$ applied to the $\tau$th quantile of $z$. The ALD is often a poor model for real data, especially when $\tau$ is near zero or one. The transformation $g$ offers a pathway to significantly improve the model adequacy, while still targeting the desired quantile of the data.

  3. The Gaussian process (GP) model generalizes the linear model to include a nonparametric regression function,

$$ z_i = f_\theta(x_i) + \sigma \epsilon_i, \quad \epsilon_i \stackrel{iid}{\sim} N(0, 1) $$

where $f_\theta$ is a GP and $\theta$ parameterizes the mean and covariance functions. Although GPs offer substantial flexibility for the regression function $f_\theta$, this model may be inadequate when $y$ has irregular marginal features or a restricted domain (e.g., positive or compact).
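For the quantile regression case above, the transformed model can keep targeting a quantile of the data because quantiles commute with monotone increasing maps. The base R check below illustrates this numerically. The exponential inverse transformation and the simulated draws are assumptions chosen for illustration only.

```r
# Quantiles are preserved under a monotone increasing map
set.seed(1)
tau <- 0.9
z <- rnorm(1000, mean = 2, sd = 1)   # stand-in for latent-scale values at a fixed x
g_inv <- function(z) exp(z)          # assumed monotone increasing inverse transformation
y <- g_inv(z)

# type = 1 selects an order statistic (no interpolation),
# so the two sides pick the same observation
q_z <- quantile(z, probs = tau, type = 1, names = FALSE)
q_y <- quantile(y, probs = tau, type = 1, names = FALSE)
all.equal(q_y, g_inv(q_z))
```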

Challenges: The goal is to provide fully Bayesian posterior inference for the unknowns $(g, \theta)$ and posterior predictive inference for future/unobserved data $\tilde y(x)$. We prefer a model and algorithm that offer both (i) flexible modeling of $g$ and (ii) efficient posterior and predictive computations.

Innovations: Our approach (https://doi.org/10.1080/01621459.2024.2395586) specifies a nonparametric model for $g$, yet also provides Monte Carlo (not MCMC) sampling for the posterior and predictive distributions. As a result, we control the approximation accuracy via the number of simulations, but do not require the lengthy runs, burn-in periods, convergence diagnostics, or inefficiency factors that accompany MCMC. The Monte Carlo sampling is typically quite fast.
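As a rough illustration of how the number of simulations controls accuracy, the sketch below uses independent normal draws as a stand-in for Monte Carlo posterior draws (an assumption for illustration; it does not call SeBR). With independent draws, the Monte Carlo standard error of a posterior mean estimate shrinks at the rate $1/\sqrt{S}$, with no burn-in or autocorrelation adjustment needed.

```r
# Monte Carlo standard error as a function of the number of independent draws S
set.seed(42)
mc_se <- function(S) {
  draws <- rnorm(S, mean = 1, sd = 2)  # stand-in for independent posterior draws
  sd(draws) / sqrt(S)                  # standard error of the sample mean
}

S_grid <- c(100, 1000, 10000)
se <- sapply(S_grid, mc_se)
data.frame(S = S_grid, mc_se = se)
```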

Using SeBR

The package SeBR is installed and loaded as follows:

# CRAN version:
# install.packages("SeBR")

# Development version: 
# devtools::install_github("drkowal/SeBR")
library(SeBR) 

The main functions in SeBR are:

  • sblm(): Monte Carlo sampling for posterior and predictive inference with the semiparametric Bayesian linear model;

  • sbsm(): Monte Carlo sampling for posterior and predictive inference with the semiparametric Bayesian spline model, which replaces the linear model with a spline for nonlinear modeling of $x \in \mathbb{R}$;

  • sbqr(): blocked Gibbs sampling for posterior and predictive inference with the semiparametric Bayesian quantile regression; and

  • sbgp(): Monte Carlo sampling for predictive inference with the semiparametric Bayesian Gaussian process model.

Each function returns a point estimate of $\theta$ (coefficients), point predictions at some specified testing points (fitted.values), posterior samples of the transformation $g$ (post_g), and posterior predictive samples of $\tilde y(x)$ at the testing points (post_ypred), as well as other function-specific quantities (e.g., posterior draws of $\theta$, post_theta). The calls coef() and fitted() extract the point estimates and point predictions, respectively.
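A typical workflow might look like the sketch below. The output names (post_ypred, coef(), fitted()) are those described above. The argument names for simulate_tlm() and sblm(), the elements of the simulated data list, and the layout of post_ypred (draws in rows, testing points in columns) are assumptions based on the package documentation and should be checked against the help pages.

```r
library(SeBR)

set.seed(123)
# Simulate training and testing data from a transformed linear model
# (argument and element names assumed; see ?simulate_tlm)
dat <- simulate_tlm(n = 200, p = 5)
y <- dat$y; X <- dat$X
y_test <- dat$y_test; X_test <- dat$X_test

# Fit the semiparametric Bayesian linear model via Monte Carlo sampling
fit <- sblm(y = y, X = X, X_test = X_test)

coef(fit)            # point estimate of theta
head(fitted(fit))    # point predictions at the testing points
dim(fit$post_ypred)  # posterior predictive draws at the testing points

# 90% posterior predictive intervals, one row per testing point
# (assumes columns of post_ypred index the testing points)
pi_y <- t(apply(fit$post_ypred, 2, quantile, probs = c(0.05, 0.95)))
mean(pi_y[, 1] <= y_test & y_test <= pi_y[, 2])  # empirical coverage
```

The same pattern should carry over to sbsm(), sbqr(), and sbgp(), subject to their function-specific arguments.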

Note: The package also includes Box-Cox variants of these functions, i.e., restricting $g$ to the (signed) Box-Cox parametric family $g(t; \lambda) = \{\mbox{sign}(t) \vert t \vert^\lambda - 1\}/\lambda$ with known or unknown $\lambda$. The parametric transformation is less flexible, especially for irregular marginals or restricted domains, and requires MCMC sampling. These functions (e.g., blm_bc()) are primarily for benchmarking.
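The signed Box-Cox family and its inverse can be sketched in a few lines of base R. The helper names below are hypothetical and are not the package's own g_bc() and g_inv_bc(); the $\lambda = 0$ branch uses the usual logarithmic limit and is assumed to apply to positive inputs only.

```r
# Signed Box-Cox transformation and its inverse (illustrative helpers only)
signed_box_cox <- function(t, lambda) {
  if (lambda == 0) return(log(t))   # limiting case, for t > 0 only
  (sign(t) * abs(t)^lambda - 1) / lambda
}

signed_box_cox_inv <- function(z, lambda) {
  if (lambda == 0) return(exp(z))
  s <- lambda * z + 1               # recovers sign(t) * |t|^lambda
  sign(s) * abs(s)^(1 / lambda)
}

# Round trip on a grid that includes negative values
t_grid <- c(-4, -1, 0.5, 2, 9)
z <- signed_box_cox(t_grid, lambda = 0.5)
all.equal(signed_box_cox_inv(z, lambda = 0.5), t_grid)
```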

Detailed documentation and examples are available at https://drkowal.github.io/SeBR/.

References

Kowal, D. and Wu, B. (2024). Monte Carlo inference for semiparametric Bayesian regression. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2024.2395586

Monthly Downloads: 146

Version: 1.1.0

License: MIT + file LICENSE

Maintainer: Dan Kowal

Last Published: June 16th, 2025

Functions in SeBR (1.1.0)

  • bgp_bc: Bayesian Gaussian processes with a Box-Cox transformation
  • concen_hbb: Posterior sampling algorithm for the HBB concentration hyperparameters
  • rank_approx: Rank-based estimation of the linear regression coefficients
  • g_fun: Compute the transformation
  • sampleFastGaussian: Sample a Gaussian vector using Bhattacharya et al. (2016)
  • g_bc: Box-Cox transformation
  • contract_grid: Grid contraction
  • plot_pptest: Plot point and interval predictions on testing data
  • hbb: Hierarchical Bayesian bootstrap posterior sampler
  • g_inv_approx: Approximate inverse transformation
  • g_inv_bc: Inverse Box-Cox transformation
  • sbsm: Semiparametric Bayesian spline model
  • square_stabilize: Numerically stabilize the squared elements
  • sir_adjust: Post-processing with importance sampling
  • simulate_tlm: Simulate a transformed linear model
  • sbgp: Semiparametric Bayesian Gaussian processes
  • sblm_ssvs: Semiparametric Bayesian linear model with stochastic search variable selection
  • sbqr: Semiparametric Bayesian quantile regression
  • uni.slice: Univariate Slice Sampler from Neal (2008)
  • sblm_modelsel: Model selection for semiparametric Bayesian linear regression
  • sblm_hs: Semiparametric Bayesian linear model with horseshoe priors for high-dimensional data
  • sblm: Semiparametric Bayesian linear model
  • all_subsets: Compute all subsets of a set
  • bsm_bc: Bayesian spline model with a Box-Cox transformation
  • computeTimeRemaining: Estimate the remaining time in the algorithm
  • bqr: Bayesian quantile regression
  • Fz_fun: Compute the latent data CDF
  • bb: Bayesian bootstrap posterior sampler for the CDF
  • blm_bc: Bayesian linear model with a Box-Cox transformation
  • SSR_gprior: Compute the sum-squared-residuals term under Zellner's g-prior
  • blm_bc_hs: Bayesian linear model with a Box-Cox transformation and a horseshoe prior