Note: a newer version (0.9.8.4) of this package is available.

Contextual: Multi-Armed Bandits in R

Overview

R package facilitating the simulation and evaluation of context-free and contextual Multi-Armed Bandit policies.

The package has been developed to:

  • Ease the implementation, evaluation and dissemination of both existing and new contextual Multi-Armed Bandit policies.
  • Introduce a wider audience to contextual bandit policies' advanced sequential decision strategies.

Installation

To install contextual from CRAN:

install.packages('contextual')

To install the development version (requires the devtools package):

install.packages("devtools")
devtools::install_github('Nth-iteration-labs/contextual')

When working on or extending the package, clone its GitHub repository, then do:

install.packages("devtools")
devtools::install_deps(dependencies = TRUE)
devtools::build()
devtools::reload()

clean and rebuild...

Overview of core classes

Contextual consists of six core classes. Of these, the Bandit and Policy classes are subclassed and extended when implementing custom (synthetic or offline) bandits and policies. The other four classes (Agent, Simulator, History, and Plot) are the workhorses of the package, and generally need not be adapted or subclassed.
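The way these classes fit together can be sketched with a short simulation. The snippet below is a minimal sketch, not a definitive recipe: it assumes the contextual package is installed, the parameter values (epsilon, weights, horizon, simulations) are illustrative choices, and the constructor arguments should be checked against the package's own help pages.

```r
# Minimal sketch: one epsilon-greedy agent on a three-armed Bernoulli bandit.
# All numeric settings below are illustrative assumptions.
if (requireNamespace("contextual", quietly = TRUE)) {
  library(contextual)

  # Policy and Bandit are the two classes you would normally subclass.
  policy <- EpsilonGreedyPolicy$new(epsilon = 0.1)
  bandit <- BasicBernoulliBandit$new(weights = c(0.6, 0.1, 0.1))

  # An Agent pairs one policy with one bandit.
  agent <- Agent$new(policy, bandit)

  # The Simulator runs the agent repeatedly and collects a History.
  simulator <- Simulator$new(agents = agent, horizon = 100, simulations = 100)
  history <- simulator$run()

  # Plot and summarise the logged results.
  plot(history, type = "cumulative", regret = TRUE)
  summary(history)
}
```

To compare several policies, one would typically pass a list of agents to the Simulator instead of a single agent.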

Documentation

See the demo directory for practical examples and replications of both synthetic and offline (contextual) bandit policy evaluations.

When extending contextual, it may help to review "Extending Contextual: Frequently Asked Questions" before diving into the source code.

How to replicate figures from two introductory context-free Multi-Armed Bandits texts:

Basic, context-free multi-armed bandit examples:

Examples of both synthetic and offline contextual multi-armed bandit evaluations:

An example of how to use the optional theta log to create interactive context-free bandit animations:

Some more extensive vignettes to get you started with the package:

Paper offering a general overview of the package's structure & API:

Policies and Bandits

Overview of contextual's growing library of contextual and context-free bandit policies:

General: Random, Oracle, Fixed
Context-free: Epsilon-Greedy, Epsilon-First, UCB1, UCB2, Thompson Sampling, BootstrapTS, Softmax, Gradient, Gittins
Contextual: CMAB Naive Epsilon-Greedy, Epoch-Greedy, LinUCB (General, Disjoint, Hybrid), Linear Thompson Sampling, ProbitTS, LogitBTS, GLMUCB
Other: Lock-in Feedback (LiF)
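
To make concrete what a context-free policy does, here is a base-R sketch of epsilon-greedy on a Bernoulli bandit. It is not the package's implementation: the variable names, arm weights, epsilon, and horizon are all assumptions chosen for illustration.

```r
# Base-R sketch of epsilon-greedy (hypothetical settings, not package code).
set.seed(1)
weights <- c(0.6, 0.1, 0.1)   # assumed true reward probability per arm
epsilon <- 0.1                # assumed exploration rate
horizon <- 1000               # assumed number of time steps

n_arms    <- length(weights)
pulls     <- integer(n_arms)  # how often each arm was played
estimates <- numeric(n_arms)  # running mean reward per arm
chosen    <- integer(horizon) # arm played at each step

for (t in seq_len(horizon)) {
  if (runif(1) < epsilon) {
    # Explore: pick any arm uniformly at random.
    arm <- sample.int(n_arms, 1)
  } else {
    # Exploit: pick the best estimate, breaking ties at random.
    # Indexing via sample.int avoids sample()'s special case for length-1 input.
    best <- which(estimates == max(estimates))
    arm  <- best[sample.int(length(best), 1)]
  }
  reward <- rbinom(1, 1, weights[arm])
  pulls[arm] <- pulls[arm] + 1L
  # Incremental update of the running mean.
  estimates[arm] <- estimates[arm] + (reward - estimates[arm]) / pulls[arm]
  chosen[t] <- arm
}

# Expected regret relative to always playing the best arm.
cumulative_regret <- cumsum(max(weights) - weights[chosen])
```

Plotting cumulative_regret against the time step gives the kind of regret curve the package's plots report.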

Overview of contextual's bandit library:

Basic Synthetic: Basic Bernoulli Bandit, Basic Gaussian Bandit
Contextual Synthetic: Contextual Bernoulli, Contextual Logit, Contextual Hybrid, Contextual Linear, Contextual Wheel
Offline: Replay Evaluator, Bootstrap Replay, Propensity Weighting, Direct Method, Doubly Robust
Continuum: Continuum
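
The idea behind offline replay evaluation can be sketched in base R: a policy is stepped through logged data, and only the rounds where its choice matches the logged arm count towards its reward estimate. This is a simplified illustration under stated assumptions (uniformly random logging policy, synthetic data, hypothetical names), not the code behind the package's offline bandits.

```r
# Base-R sketch of replay evaluation on synthetic logged data.
set.seed(42)
true_p   <- c(0.6, 0.1, 0.1)  # assumed reward probabilities (unknown in practice)
n_arms   <- length(true_p)
n_logged <- 5000              # assumed size of the log

# Log produced by a uniformly random logging policy (an assumption).
logged_arm    <- sample.int(n_arms, n_logged, replace = TRUE)
logged_reward <- rbinom(n_logged, 1, true_p[logged_arm])

# Policy under evaluation: epsilon-greedy with random tie-breaking.
epsilon   <- 0.1
pulls     <- integer(n_arms)
estimates <- numeric(n_arms)
matched_rewards <- numeric(0)

for (t in seq_len(n_logged)) {
  if (runif(1) < epsilon) {
    choice <- sample.int(n_arms, 1)
  } else {
    best   <- which(estimates == max(estimates))
    choice <- best[sample.int(length(best), 1)]
  }
  # Keep the round only if the policy agrees with the logged action.
  if (choice == logged_arm[t]) {
    reward <- logged_reward[t]
    pulls[choice] <- pulls[choice] + 1L
    estimates[choice] <- estimates[choice] + (reward - estimates[choice]) / pulls[choice]
    matched_rewards <- c(matched_rewards, reward)
  }
}

# Average reward over the matched rounds estimates the policy's performance.
replay_estimate <- mean(matched_rewards)
```

Because unmatched rounds are discarded, only roughly one in n_arms logged rows is used here, which is why replay needs comparatively large logs.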

Alternative parallel backends

By default, contextual uses R's built-in parallel package to evaluate multiple agents in parallel over repeated simulations. See the demo/alternative_parallel_backends directory for several alternative parallel backends:
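
As a rough illustration of what distributing repeated simulations with the parallel package looks like, the base-R sketch below runs a toy simulation on a two-worker cluster. The function run_one_simulation and all of its settings are hypothetical stand-ins; the package's Simulator handles this internally.

```r
# Base-R sketch: spreading repeated simulations over a local cluster.
library(parallel)

# Hypothetical toy simulation: a random policy on a Bernoulli bandit.
# Seeding inside the function keeps each repetition reproducible,
# regardless of which worker runs it.
run_one_simulation <- function(seed, horizon = 100, weights = c(0.6, 0.1, 0.1)) {
  set.seed(seed)
  arms <- sample.int(length(weights), horizon, replace = TRUE)
  as.numeric(sum(rbinom(horizon, 1, weights[arms])))
}

seeds <- 1:20

# Run the repetitions on two worker processes.
cluster <- makeCluster(2)
parallel_rewards <- unlist(parLapply(cluster, seeds, run_one_simulation))
stopCluster(cluster)

# The same repetitions run sequentially, for comparison.
sequential_rewards <- vapply(seeds, run_one_simulation, numeric(1))
```

Since every repetition sets its own seed, the parallel and sequential runs should give the same totals.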

Maintainers

Robin van Emden: author, maintainer*
Maurits Kaptein: supervisor*

* Tilburg University / Jheronimus Academy of Data Science.

If you encounter a clear bug, please file a minimal reproducible example on GitHub.

Monthly Downloads: 14
Version: 0.9.8.3
License: GPL-3
Maintainer: Robin van Emden
Last Published: March 4th, 2020

Functions in contextual (0.9.8.3)

BasicBernoulliBandit - Bandit: BasicBernoulliBandit
ContextualEpsilonGreedyPolicy - Policy: ContextualEpsilonGreedyPolicy with unique linear models
ContextualBernoulliBandit - Bandit: Naive Contextual Bernoulli Bandit
ContextualEpochGreedyPolicy - Policy: A Time and Space Efficient Algorithm for Contextual Linear Bandits
ContextualHybridBandit - Bandit: ContextualHybridBandit
Bandit - Bandit: Superclass
Agent - Agent
BootstrapTSPolicy - Policy: Thompson sampling with the online bootstrap
ContextualBinaryBandit - Bandit: ContextualBinaryBandit
BasicGaussianBandit - Bandit: BasicGaussianBandit
ContextualLinTSPolicy - Policy: Linear Thompson Sampling with unique linear models
ContextualLogitBandit - Bandit: ContextualLogitBandit
ContextualLogitBTSPolicy - Policy: ContextualLogitBTSPolicy
ContextualLinearBandit - Bandit: ContextualLinearBandit
ContextualTSProbitPolicy - Policy: ContextualTSProbitPolicy
LinUCBHybridOptimizedPolicy - Policy: LinUCB with hybrid linear models
LinUCBGeneralPolicy - Policy: LinUCB with unique linear models
ContextualPrecachingBandit - Bandit: ContextualPrecachingBandit
EpsilonGreedyPolicy - Policy: Epsilon Greedy
EpsilonFirstPolicy - Policy: Epsilon First
Exp3Policy - Policy: Exp3
LinUCBDisjointOptimizedPolicy - Policy: LinUCB with unique linear models
LinUCBDisjointPolicy - Policy: LinUCB with unique linear models
FixedPolicy - Policy: Fixed Arm
History - History
ContextualWheelBandit - Bandit: ContextualWheelBandit
GradientPolicy - Policy: Gradient
ContinuumBandit - Bandit: ContinuumBandit
Plot - Plot
GittinsBrezziLaiPolicy - Policy: Gittins Approximation algorithm for choosing arms in a MAB problem.
OfflineDoublyRobustBandit - Bandit: Offline Doubly Robust
SoftmaxPolicy - Policy: Softmax
OfflineDirectMethodBandit - Bandit: Offline Direct Methods
LifPolicy - Policy: Continuum Bandit Policy with Lock-in Feedback
Policy - Policy: Superclass
OfflineReplayEvaluatorBandit - Bandit: Offline Replay
OfflineLookupReplayEvaluatorBandit - Bandit: Offline Replay with lookup tables
OraclePolicy - Policy: Oracle
UCB2Policy - Policy: UCB2
UCB1Policy - Policy: UCB1
LinUCBHybridPolicy - Policy: LinUCB with hybrid linear models
invgamma - The Inverse Gamma Distribution
ThompsonSamplingPolicy - Policy: Thompson Sampling
OfflineBootstrappedReplayBandit - Bandit: Offline Bootstrapped Replay
OfflinePropensityWeightingBandit - Bandit: Offline Propensity Weighted Replay
invlogit - Inverse Logit Function
clipr - Clip vectors
one_hot - One Hot Encoding of data.table columns
formatted_difftime - Format difftime objects
get_arm_context - Return context vector of an arm
ind - On-the-fly indicator function for use in formulae
dec<- - Decrement
sample_one_of - Sample one element from vector or list
inv - Inverse from Choleski (or QR) Decomposition.
which_max_list - Get maximum value in list
prob_winner - Binomial Win Probability
which_max_tied - Get maximum value randomly breaking ties
RandomPolicy - Policy: Random
ones_in_zeroes - A vector of zeroes and ones
get_full_context - Get full context matrix over all arms
sim_post - Binomial Posterior Simulator
sherman_morrisson - Sherman-Morrison inverse
Simulator - Simulator
summary.history - Summary Method for Contextual History
sum_of - Sum of list
inc<- - Increment
get_global_seed - Lookup .Random.seed in global environment
set_global_seed - Set .Random.seed to a pre-saved value
set_external - Change Default Graphing Device from RStudio
data_table_factors_to_numeric - Convert all factor columns in data.table to numeric
is_rstudio - Check if in RStudio
mvrnorm - Simulate from a Multivariate Normal Distribution
plot.history - Plot Method for Contextual History
value_remaining - Potential Value Remaining
print.history - Print Method for Contextual History
var_welford - Welford's variance
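
Two of the helpers listed above, sherman_morrisson and var_welford, are named after standard numerical techniques. The base-R sketch below shows those techniques in general form. The function names sherman_morrison_update and welford_variance are hypothetical, and the package's own functions may differ in signature and behaviour.

```r
# Base-R sketches of two standard techniques (hypothetical function names).

# Sherman-Morrison rank-one update of an inverse:
# (A + u v^T)^-1 = A^-1 - (A^-1 u v^T A^-1) / (1 + v^T A^-1 u)
# Useful when a matrix gains one outer product per step, as in LinUCB-style
# updates, because it avoids recomputing the full inverse.
sherman_morrison_update <- function(A_inv, u, v = u) {
  Au    <- A_inv %*% u              # column vector A^-1 u
  vA    <- t(v) %*% A_inv           # row vector v^T A^-1
  denom <- 1 + as.numeric(t(v) %*% Au)
  A_inv - (Au %*% vA) / denom
}

# Welford's single-pass algorithm for the sample variance.
welford_variance <- function(x) {
  n <- 0
  running_mean <- 0
  m2 <- 0                           # sum of squared deviations so far
  for (value in x) {
    n <- n + 1
    delta <- value - running_mean
    running_mean <- running_mean + delta / n
    m2 <- m2 + delta * (value - running_mean)
  }
  if (n < 2) NA_real_ else m2 / (n - 1)
}

# Check both against base R on random data.
set.seed(7)
A <- diag(3)
x <- rnorm(3)
updated_inverse <- sherman_morrison_update(solve(A), x)
direct_inverse  <- solve(A + x %*% t(x))

samples <- rnorm(100)
streaming_variance <- welford_variance(samples)
```

Here updated_inverse should agree with direct_inverse, and streaming_variance with var(samples), up to floating-point tolerance.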