Batch Contextual Thompson Sampling Policy

Implements Thompson Sampling for linear contextual bandits with batch updates.
- `initialize(v = 0.2, batch_size = 1)`: Constructor; sets the posterior variance scale and the batch size.
- `set_parameters(context_params)`: Initializes arm-level matrices.
- `get_action(t, context)`: Samples from the posterior and selects an action.
- `set_reward(t, context, action, reward)`: Updates posterior statistics using observed feedback.
Class hierarchy: cramR::NA -> BatchContextualLinTSPolicy
Public fields

- `sigma`: Numeric. Posterior variance scale parameter.
- `batch_size`: Integer. Size of mini-batches before parameter updates.
- `A_cc`: List of accumulated Gram matrices, one per arm.
- `b_cc`: List of reward-weighted context sums, one per arm.
- `class_name`: Internal name of the class.
Method `new()`

Constructor for the batch-based Thompson Sampling policy.

Usage: `BatchContextualLinTSPolicy$new(v = 0.2, batch_size = 1)`

Arguments:

- `v`: Numeric. Standard deviation scaling parameter for posterior sampling.
- `batch_size`: Integer. Number of rounds before parameters are updated.
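For example, the following minimal sketch (assuming `cramR` is attached and exports the class) constructs a policy with a wider posterior spread that refreshes its estimates every 5 rounds:

```r
library(cramR)

# Wider posterior sampling (v = 0.5), parameter refresh every 5 rounds.
policy <- BatchContextualLinTSPolicy$new(v = 0.5, batch_size = 5)
```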
Method `set_parameters()`

Initializes per-arm sufficient statistics.

Usage: `BatchContextualLinTSPolicy$set_parameters(context_params)`

Arguments:

- `context_params`: List with entries `unique` (feature vector) and `k` (number of arms).
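A hedged sketch of the expected call, for a hypothetical 3-armed problem with a 3-dimensional context (the exact content of `unique` is an assumption based on the description above):

```r
# Set up per-arm statistics for k = 3 arms; `unique` is documented as the
# feature vector, so a 3-dimensional problem is assumed here.
policy$set_parameters(list(unique = c(1, 2, 3), k = 3))
```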
Method `get_action()`

Samples from the posterior distribution of expected rewards and selects an action.

Usage: `BatchContextualLinTSPolicy$get_action(t, context)`

Arguments:

- `t`: Integer. Time step.
- `context`: List containing the current context and arm information.

Returns: A list with the chosen arm (`choice`).
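Given the fields above, this presumably follows the standard linear Thompson Sampling rule: for each arm k, draw a coefficient sample from N(A_k^-1 b_k, v^2 A_k^-1) and pick the arm whose sampled expected reward is largest. A hedged call sketch (the `X` field name of the context list is an assumption):

```r
# Hypothetical context structure: a feature vector `X` plus the arm count `k`.
context <- list(X = c(0.2, -0.1, 0.5), k = 3)
action  <- policy$get_action(t = 1, context = context)
action$choice   # index of the selected arm
```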
Method `set_reward()`

Updates the Gram matrix and response vector for the chosen arm. Parameters are refreshed every `batch_size` rounds.

Usage: `BatchContextualLinTSPolicy$set_reward(t, context, action, reward)`

Arguments:

- `t`: Integer. Time step.
- `context`: Context object containing feature information.
- `action`: Chosen action (arm index).
- `reward`: Observed reward for the action.

Returns: Updated internal parameters.
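The Gram-matrix/response-vector update described above has a standard rank-one form; the self-contained sketch below illustrates it (an assumption about the internals, not a call into `cramR` -- the class applies this for you):

```r
# Standalone illustration of the usual LinTS sufficient-statistic update.
x <- c(0.2, -0.1, 0.5)    # context features for the chosen arm
r <- 1                    # observed reward
A <- diag(3)              # plays the role of A_cc[[k]]
b <- rep(0, 3)            # plays the role of b_cc[[k]]

A <- A + tcrossprod(x)    # A_k <- A_k + x x'
b <- b + r * x            # b_k <- b_k + r x

theta_hat <- solve(A, b)  # posterior mean, refreshed every `batch_size` rounds
```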
Method `clone()`

The objects of this class are cloneable with this method.

Usage: `BatchContextualLinTSPolicy$clone(deep = FALSE)`

Arguments:

- `deep`: Whether to make a deep clone.
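Examples

An end-to-end sketch under stated assumptions: 3 arms, 3-dimensional Gaussian contexts, and Bernoulli rewards. The `unique` entry, the context's `X` field, and the list format of `reward` follow the argument descriptions above but are otherwise assumptions, not confirmed API details.

```r
library(cramR)

set.seed(42)
k <- 3; d <- 3
policy <- BatchContextualLinTSPolicy$new(v = 0.2, batch_size = 5)
policy$set_parameters(list(unique = seq_len(d), k = k))

for (t in 1:100) {
  context <- list(X = rnorm(d), k = k)          # fresh context each round
  action  <- policy$get_action(t, context)
  reward  <- list(reward = rbinom(1, 1, 0.5))   # placeholder reward signal
  policy$set_reward(t, context, action, reward)
}
```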