Batch Contextual Epsilon-Greedy Policy
cramR::NA -> BatchContextualEpsilonGreedyPolicy
epsilon: Probability of selecting a random arm (exploration rate).
batch_size: Number of rounds per batch before updating model parameters.
A_cc: List of Gram matrices (one per arm), used to accumulate sufficient statistics across batches.
b_cc: List of reward-weighted context sums (one per arm), updated batch-wise.
class_name: Internal class name identifier.
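The role of the Gram matrix and the reward-weighted context sum can be illustrated for a single arm in base R. This is a minimal sketch, not the package's code: it assumes a ridge-style setup in which the Gram matrix starts at the identity, and the data and the names `A`, `b`, `X`, `r`, and `theta_hat` are hypothetical.

```r
# Sketch only: sufficient statistics for ONE arm with d = 2 features.
# Assumption: the Gram matrix starts at the identity (ridge-style).
d <- 2
A <- diag(1, d)   # Gram matrix: identity plus the sum of x x^T
b <- rep(0, d)    # reward-weighted context sum: sum of r * x

# Three hypothetical (context, reward) observations for this arm.
X <- rbind(c(1, 0), c(0, 1), c(1, 1))
r <- c(1, 0, 1)

for (i in seq_len(nrow(X))) {
  x <- X[i, ]
  A <- A + outer(x, x)   # accumulate x x^T
  b <- b + r[i] * x      # accumulate r * x
}

# Coefficient estimate for the arm: solves A %*% theta = b.
theta_hat <- solve(A, b)
```

Under these assumptions, the predicted reward of the arm for a new context `x` would be `sum(x * theta_hat)`.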
new(): Constructor for the Batch Epsilon-Greedy policy.
Usage: BatchContextualEpsilonGreedyPolicy$new(epsilon = 0.1, batch_size = 1)
Arguments:
epsilon: Numeric between 0 and 1. Probability of random arm selection.
batch_size: Integer. Number of observations between parameter updates.
set_parameters(): Initializes the parameter structures for each arm.
Usage: BatchContextualEpsilonGreedyPolicy$set_parameters(context_params)
Arguments:
context_params: A list with at least `d` (number of features) and `k` (number of arms).
get_action(): Chooses an arm based on epsilon-greedy logic and the current estimates.
Usage: BatchContextualEpsilonGreedyPolicy$get_action(t, context)
Arguments:
t: Integer time step.
context: A list with contextual features and arm count.
Returns: A list with the selected action.
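The epsilon-greedy choice can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's implementation: `choose_arm` is a hypothetical helper, each arm is assumed to have a linear reward model estimated from a Gram matrix `A[[a]]` and a reward-weighted sum `b[[a]]`, and ties are broken at random.

```r
# Hypothetical helper: pick an arm for context vector x.
choose_arm <- function(x, A, b, epsilon) {
  k <- length(A)
  if (runif(1) < epsilon) {
    return(sample.int(k, 1))   # explore: uniform random arm
  }
  # Exploit: predicted reward x' A^{-1} b for every arm.
  expected <- vapply(seq_len(k), function(a) {
    sum(x * solve(A[[a]], b[[a]]))
  }, numeric(1))
  best <- which(expected == max(expected))
  if (length(best) > 1) sample(best, 1) else best   # random tie-break
}

# Two arms, two features; arm 1 rewards feature 1, arm 2 rewards feature 2.
A <- list(diag(1, 2), diag(1, 2))
b <- list(c(1, 0), c(0, 1))
greedy_pick <- choose_arm(c(1, 0), A, b, epsilon = 0)   # never explores
random_pick <- choose_arm(c(1, 0), A, b, epsilon = 1)   # always explores
```

With `epsilon = 0` the helper always exploits, so the first call selects the arm with the highest predicted reward; with `epsilon = 1` every call is a uniform random draw.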
set_reward(): Updates model statistics based on observed reward. Updates occur once per batch.
Usage: BatchContextualEpsilonGreedyPolicy$set_reward(t, context, action, reward)
Arguments:
t: Integer time step.
context: List of contextual features used for the action.
action: A list with the chosen arm.
reward: A list with the observed reward.
Returns: Updated parameter estimates.
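One way to picture a once-per-batch update is to keep running statistics that change every round and a separate copy that is refreshed only when a batch completes. The base R sketch below shows this for a single arm. It is an assumption about how such a scheme could work, not a description of the package internals; `make_batched_arm` and its members are hypothetical names, and the identity starting matrix is likewise assumed.

```r
# Hypothetical sketch: batched statistics for one arm with d features.
make_batched_arm <- function(d, batch_size) {
  A_cc <- diag(1, d)   # running Gram matrix, updated every round
  b_cc <- rep(0, d)    # running reward-weighted sum, updated every round
  A <- A_cc            # copy used for decisions, refreshed per batch
  b <- b_cc
  list(
    update = function(t, x, reward) {
      A_cc <<- A_cc + outer(x, x)
      b_cc <<- b_cc + reward * x
      if (t %% batch_size == 0) {   # batch complete: publish statistics
        A <<- A_cc
        b <<- b_cc
      }
      invisible(NULL)
    },
    theta = function() solve(A, b)  # estimate from the published copy
  )
}

arm <- make_batched_arm(d = 2, batch_size = 2)
arm$update(1, c(1, 0), 1)
theta_mid <- arm$theta()   # batch not yet complete: estimate unchanged
arm$update(2, c(1, 0), 1)
theta_end <- arm$theta()   # batch complete: both observations included
```

In this sketch, decisions made in the middle of a batch rely on the estimate from the last completed batch, which is the behavior that `batch_size` is described as controlling.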
clone(): The objects of this class are cloneable with this method.
Usage: BatchContextualEpsilonGreedyPolicy$clone(deep = FALSE)
Arguments:
deep: Whether to make a deep clone.
Implements an epsilon-greedy exploration strategy for contextual bandits with batched updates.