Batch Disjoint LinUCB Policy with Epsilon-Greedy
- `initialize(alpha = 1.0, epsilon = 0.1, batch_size = 1)`: Constructor.
- `set_parameters(context_params)`: Initializes sufficient statistics for each arm.
- `get_action(t, context)`: Selects an arm using UCB scores and an epsilon-greedy rule.
- `set_reward(t, context, action, reward)`: Updates statistics and refreshes the model at batch intervals.
cramR::NA -> BatchLinUCBDisjointPolicyEpsilon
- `alpha`: Numeric, UCB exploration strength parameter.
- `epsilon`: Numeric, probability of taking a random exploratory action.
- `batch_size`: Integer, number of rounds per batch update.
- `A_cc`: List of Gram matrices per arm, accumulated across the batch.
- `b_cc`: List of reward-weighted context vectors per arm.
- `class_name`: Internal class name identifier.
`new()`: Constructor for batched LinUCB with epsilon-greedy exploration.
`BatchLinUCBDisjointPolicyEpsilon$new(alpha = 1, epsilon = 0.1, batch_size = 1)`
- `alpha`: Numeric. UCB width parameter (exploration strength).
- `epsilon`: Numeric. Probability of selecting a random arm.
- `batch_size`: Integer. Number of rounds before updating parameters.
`set_parameters()`: Initializes arm-specific parameter containers.
`BatchLinUCBDisjointPolicyEpsilon$set_parameters(context_params)`
- `context_params`: List containing at least `unique` (feature size) and `k` (number of arms).
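To make the initialization step concrete, here is a minimal base-R sketch of per-arm sufficient statistics for disjoint LinUCB. It assumes the common choice of an identity Gram matrix and a zero vector per arm; the helper name `init_arm_stats`, the arguments `d` and `k`, and the exact layout of the returned list are illustrative assumptions, not the package's internals.

```r
# Hypothetical helper (not part of cramR): builds per-arm statistics
# for d features and k arms.
init_arm_stats <- function(d, k) {
  list(
    # Statistics read when scoring arms (assumed identity / zero start).
    A = replicate(k, diag(1, d, d), simplify = FALSE),
    b = replicate(k, rep(0, d), simplify = FALSE),
    # Running copies that absorb every observation within a batch.
    A_cc = replicate(k, diag(1, d, d), simplify = FALSE),
    b_cc = replicate(k, rep(0, d), simplify = FALSE)
  )
}

stats <- init_arm_stats(d = 3, k = 2)
length(stats$A_cc)     # one Gram matrix per arm
dim(stats$A_cc[[1]])   # d x d
```

Starting each Gram matrix at the identity keeps it invertible before any data arrive, which is why this choice is common in LinUCB implementations.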
`get_action()`: Chooses an arm based on UCB scores and epsilon-greedy sampling.
`BatchLinUCBDisjointPolicyEpsilon$get_action(t, context)`
- `t`: Integer timestep.
- `context`: List containing the context for the decision.
Returns: A list with the selected action.
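The selection rule can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: it uses the standard disjoint LinUCB score (ridge estimate plus `alpha` times a confidence width), explores uniformly at random with probability `epsilon`, and breaks ties by taking the first maximal arm. The function `select_arm` and its arguments are hypothetical names.

```r
# Hypothetical helper (not part of cramR): picks an arm index from
# per-arm statistics (A_list, b_list) and a context vector x.
select_arm <- function(A_list, b_list, x, alpha = 1, epsilon = 0.1) {
  k <- length(A_list)
  if (runif(1) < epsilon) {
    return(sample.int(k, 1))                 # explore: uniform random arm
  }
  scores <- vapply(seq_len(k), function(arm) {
    A_inv <- solve(A_list[[arm]])
    theta_hat <- A_inv %*% b_list[[arm]]     # ridge estimate for this arm
    mu <- sum(x * theta_hat)                 # predicted reward x' theta
    width <- sqrt(sum(x * (A_inv %*% x)))    # sqrt(x' A^-1 x)
    mu + alpha * width                       # upper confidence bound
  }, numeric(1))
  which.max(scores)                          # exploit: first arm with the top score
}

A_list <- list(diag(2), diag(2))
b_list <- list(c(1, 0), c(0, 0))
select_arm(A_list, b_list, x = c(1, 0), alpha = 1, epsilon = 0)
```

With `epsilon = 0` the random branch is never taken, so the call above is deterministic: arm 1 scores 2 and arm 2 scores 1.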
`set_reward()`: Updates arm-specific sufficient statistics based on the observed reward. Parameter updates occur only at the end of a batch.
`BatchLinUCBDisjointPolicyEpsilon$set_reward(t, context, action, reward)`
- `t`: Integer timestep.
- `context`: Context object used for decision-making.
- `action`: List containing the chosen action.
- `reward`: List containing the observed reward.
Returns: Updated internal model parameters.
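The batched update can be illustrated with a small base-R sketch. It assumes one plausible reading of the description above: running statistics absorb every observation, and the statistics used for action selection are overwritten with them only when `t` is a multiple of `batch_size`. The names `new_state`, `update_state`, `run_*`, and `act_*` are hypothetical and chosen for this example only.

```r
# Hypothetical state (not part of cramR): run_* change every round,
# act_* are what action selection would read.
new_state <- function(d, k) {
  list(
    run_A = replicate(k, diag(1, d, d), simplify = FALSE),
    run_b = replicate(k, rep(0, d), simplify = FALSE),
    act_A = replicate(k, diag(1, d, d), simplify = FALSE),
    act_b = replicate(k, rep(0, d), simplify = FALSE)
  )
}

update_state <- function(state, t, x, arm, reward, batch_size = 1) {
  state$run_A[[arm]] <- state$run_A[[arm]] + outer(x, x)  # A <- A + x x'
  state$run_b[[arm]] <- state$run_b[[arm]] + reward * x   # b <- b + r x
  if (t %% batch_size == 0) {                             # batch boundary
    state$act_A <- state$run_A                            # publish running stats
    state$act_b <- state$run_b
  }
  state
}

s <- new_state(d = 2, k = 2)
s <- update_state(s, t = 1, x = c(1, 0), arm = 1, reward = 1, batch_size = 2)
s$act_b[[1]]   # unchanged: the batch of size 2 is not finished yet
s <- update_state(s, t = 2, x = c(0, 1), arm = 1, reward = 1, batch_size = 2)
s$act_b[[1]]   # now reflects both observations
```

With `batch_size = 1` the two sets of statistics coincide after every round, which reduces to the usual per-round LinUCB update.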
`clone()`: The objects of this class are cloneable with this method.
`BatchLinUCBDisjointPolicyEpsilon$clone(deep = FALSE)`
- `deep`: Whether to make a deep clone.
Implements the disjoint LinUCB algorithm with upper confidence bounds and epsilon-greedy exploration, using batched updates.