LinUCB Disjoint Policy with Epsilon-Greedy Exploration
- `initialize(alpha = 1.0, epsilon = 0.1)`: Create a new LinUCBDisjointPolicyEpsilon object.
- `set_parameters(context_params)`: Initialize arm-level parameters.
- `get_action(t, context)`: Select an arm using epsilon-greedy UCB.
- `set_reward(t, context, action, reward)`: Update internal statistics based on the observed reward.
Public fields:
- `alpha`: Numeric, exploration parameter controlling the width of the confidence bound.
- `epsilon`: Numeric, probability of selecting a random action (exploration).
- `class_name`: Internal class name.
Method `new()`: Initializes the policy with UCB parameter alpha and exploration rate epsilon.

Usage: `LinUCBDisjointPolicyEpsilon$new(alpha = 1.0, epsilon = 0.1)`

Arguments:
- `alpha`: Numeric. Controls the width of the UCB bonus.
- `epsilon`: Numeric between 0 and 1. Probability of random action selection.
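A minimal construction sketch (the argument values shown are illustrative, not requirements):

```r
# Create the policy; alpha widens the confidence bound, epsilon sets the
# probability of a purely random arm pull.
library(cramR)
policy <- LinUCBDisjointPolicyEpsilon$new(alpha = 2.0, epsilon = 0.05)
```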
Method `set_parameters()`: Set arm-specific parameter structures.

Usage: `LinUCBDisjointPolicyEpsilon$set_parameters(context_params)`

Arguments:
- `context_params`: A list with context information, typically including the number of unique features.
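A sketch of supplying arm-level parameters. The field names inside `context_params` used below (`d`, `k`, `unique`) follow common contextual-bandit conventions and are assumptions; the documented requirement is only that the list carries context information such as the number of unique features:

```r
# Hypothetical context_params: 5 features, 3 arms (field names are assumed).
context_params <- list(d = 5, k = 3, unique = 1:5)
policy$set_parameters(context_params)
```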
Method `get_action()`: Selects an arm using epsilon-greedy Upper Confidence Bound (UCB).

Usage: `LinUCBDisjointPolicyEpsilon$get_action(t, context)`

Arguments:
- `t`: Integer time step.
- `context`: A list with contextual features and number of arms.

Returns: A list containing the selected action.
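A hedged usage sketch; the `context` fields (`X`, `k`, `d`) and the `choice` element of the returned action follow the usual contextual-bandit layout and are assumptions, not confirmed cramR field names:

```r
# Ask the policy for an arm at time step t = 1.
context <- list(X = c(0.2, -1.0, 0.5, 0.0, 1.3), k = 3, d = 5)  # assumed layout
action  <- policy$get_action(t = 1, context = context)
action$choice  # index of the selected arm (field name assumed)
```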
Method `set_reward()`: Updates internal statistics using the observed reward for the selected arm.

Usage: `LinUCBDisjointPolicyEpsilon$set_reward(t, context, action, reward)`

Arguments:
- `t`: Integer time step.
- `context`: Contextual features for all arms at time t.
- `action`: A list containing the chosen arm.
- `reward`: A list containing the observed reward for the selected arm.

Returns: Updated internal parameters.
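A matching sketch of the feedback step, again assuming the conventional `reward` field name inside the reward list:

```r
# Report the observed reward so the chosen arm's statistics are updated.
reward <- list(reward = 1)  # field name assumed
policy$set_reward(t = 1, context = context, action = action, reward = reward)
```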
Method `clone()`: The objects of this class are cloneable with this method.

Usage: `LinUCBDisjointPolicyEpsilon$clone(deep = FALSE)`

Arguments:
- `deep`: Whether to make a deep clone.
Implements the disjoint LinUCB algorithm with upper confidence bounds and epsilon-greedy exploration.
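For reference, a self-contained sketch of the per-step computation that a disjoint LinUCB policy with epsilon-greedy exploration performs. This is independent of the cramR internals; all names below are illustrative:

```r
# One decision step of disjoint LinUCB with epsilon-greedy exploration.
# Each arm a keeps A_a = I + sum(x x^T) and b_a = sum(r * x) over the
# rounds in which it was played.
choose_arm <- function(x, A_list, b_list, alpha, epsilon) {
  k <- length(A_list)
  if (runif(1) < epsilon) {
    return(sample.int(k, 1))                   # explore: uniform random arm
  }
  ucb <- vapply(seq_len(k), function(a) {
    A_inv  <- solve(A_list[[a]])
    theta  <- A_inv %*% b_list[[a]]            # ridge-regression estimate
    mean_r <- as.numeric(crossprod(theta, x))  # predicted reward for arm a
    bonus  <- alpha * sqrt(as.numeric(t(x) %*% A_inv %*% x))  # confidence width
    mean_r + bonus
  }, numeric(1))
  which.max(ucb)                               # exploit: arm with highest UCB
}

# After observing reward r for the chosen arm a with context x:
#   A_list[[a]] <- A_list[[a]] + x %*% t(x)
#   b_list[[a]] <- b_list[[a]] + r * x
```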