OptHoldoutSize package - RDocumentation

OptHoldoutSize: an R package for estimating the optimal holdout set size for a predictive risk score to be deployed in a population.

This R package implements procedures for estimating an 'optimal holdout size' for a predictive score in order for it to be safely updated. Procedures are detailed in the manuscript 'Optimal sizing of a holdout set for safe predictive model updating' by Sami Haidar-Wehbe, Samuel R. Emerson, Louis J. M. Aslett, and James Liley.

When a predictive risk score for a binary outcome $Y$ given covariates $X$ is deployed in a population, it may be used to guide interventions intended to avoid $Y$. This makes it difficult to update the predictive score safely, since $X$ can influence the incidence of $Y$ in two ways: through the system being modelled, or through the predictive score itself.

A simple way to safely update a predictive score is to withhold calculation of the risk score for a proportion of the population, which is maintained as a 'holdout' set. The predictive score can then be updated using data $X$, $Y$ from this holdout set. A question naturally arises over how large this holdout set should be: too small, and a new predictive score cannot be trained sufficiently accurately; too large, and too many members of the population miss out on the potential benefits of the risk score.
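The trade-off above can be illustrated with a minimal base R sketch. It does not call any function from this package. It assumes a total cost of the form k1 * n + k2(n) * (N - n), where n is the holdout size, N the population size, k1 the average cost per person without a risk score, and k2(n) the average cost per person under a score trained on n samples, taken here to follow a power law. All numeric values and the names `k2`, `total_cost`, `n_star` and `min_cost` are hypothetical and chosen only for illustration.

```r
# Illustrative values only; none of these come from the package or the manuscript.
N  <- 1e5    # total population size
k1 <- 0.4    # assumed cost per person when no risk score is used (holdout set)
a  <- 1e4    # assumed power-law scale
b  <- 1.2    # assumed power-law rate of improvement
c0 <- 0.2    # assumed asymptotic cost per person for a well-trained score

# Assumed cost per person under a score trained on n holdout samples:
# decreases towards c0 as n grows.
k2 <- function(n) a * n^(-b) + c0

# Total cost: n people in the holdout set at cost k1 each,
# plus the remaining N - n people at cost k2(n) each.
total_cost <- function(n) k1 * n + k2(n) * (N - n)

# One-dimensional numerical minimisation over admissible holdout sizes.
fit <- optimize(total_cost, interval = c(1, N - 1))
n_star   <- fit$minimum     # estimated cost-minimising holdout size
min_cost <- fit$objective   # total cost at that size

print(c(n_star = n_star, min_cost = min_cost))
```

Under these assumed values the minimum lies strictly inside the interval: a very small holdout set leaves the score poorly trained, while a very large one withholds the score from most of the population.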

To download and install this package, use

install.packages("OptHoldoutSize")
library(OptHoldoutSize)

For examples demonstrating use of this package, see vignettes simulated_example and ASPRE_example. For a comparison of the two major algorithms implemented in this package, see vignette comparison_of_algorithms.

Monthly Downloads: 215

Version: 0.1.0.1

License: GPL (>= 3)

Maintainer: James Liley

Last Published: April 9th, 2025

Functions in OptHoldoutSize (0.1.0.1)

Expected improvement

Coefficients for imperfect risk score

Generate response

data_nextpoint_em

Data for 'next point' demonstration vignette on algorithm comparison using emulation algorithm

Power law function

grad_mincost_powerlaw

Gradient of minimum cost (power law)

Data for vignette on algorithm comparison

plot.optholdoutsize

Plot estimated cost function

Data for vignette on algorithm comparison

plot.optholdoutsize_emul

Plot estimated cost function using emulation (semiparametric)

data_example_simulation

Data for vignette showing general example

Covariance function for Gaussian process

Generate responses

grad_nstar_powerlaw

Gradient of optimal holdout size (power law)

Fit power law curve

error_ohs_emulation

Measure of error for emulation-based OHS estimation

sim_random_aspre

Simulate random dataset similar to ASPRE training data

optimal_holdout_size

Estimate optimal holdout size under parametric assumptions

optimal_holdout_size_emulation

Estimate optimal holdout size under semi-parametric assumptions

Make predictions

Updating function for mean.

Train model (wrapper)

Updating function for variance.

Parameters of reported ASPRE dataset

Finds best value of n to sample next

powersolve_general

General solver for power law curve

Standard error matrix for learning curve parameters (power law)

Sensitivity at threshold quantile 10%

Confidence interval for minimum total cost, when estimated using parametric method

Cost estimating function in ASPRE simulation

aspre_parametric

Parametric-based OHS estimation for ASPRE

aspre_emulation

Emulation-based OHS estimation for ASPRE

Confidence interval for optimal holdout size, when estimated using parametric method

Computes ASPRE score

Data for example on empirical confidence interval for OHS.

ci_cover_cost_a_yn

Data for example on asymptotic confidence interval for min cost.

ci_cover_cost_e_yn

Data for example on empirical confidence interval for min cost.

add_aspre_interactions

Add interaction terms corresponding to ASPRE model

Generate matrix of random observations

Data for example on asymptotic confidence interval for OHS.

data_nextpoint_par

Data for 'next point' demonstration vignette on algorithm comparison using parametric algorithm