BatchLinUCBDisjointPolicyEpsilon: Batch Disjoint LinUCB Policy with Epsilon-Greedy

Description

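A contextual bandit policy that combines disjoint LinUCB (one linear model per arm, scored by upper confidence bounds) with epsilon-greedy exploration, and refreshes its model only at batch intervals.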

Methods

- `initialize(alpha = 1.0, epsilon = 0.1, batch_size = 1)`: Constructor.
- `set_parameters(context_params)`: Initializes sufficient statistics for each arm.
- `get_action(t, context)`: Selects an arm using UCB scores and the epsilon-greedy rule.
- `set_reward(t, context, action, reward)`: Updates statistics and refreshes the model at batch intervals.

Super class

cramR::NA -> BatchLinUCBDisjointPolicyEpsilon

Public fields

alpha: Numeric, UCB exploration strength parameter.
epsilon: Numeric, probability of taking a random exploratory action.
batch_size: Integer, number of rounds per batch update.
A_cc: List of Gram matrices per arm, accumulated across batch.
b_cc: List of reward-weighted context vectors per arm.
class_name: Internal class name identifier.

Methods

Public methods

Method `new()`

Constructor for batched LinUCB with epsilon-greedy exploration.

Usage

BatchLinUCBDisjointPolicyEpsilon$new(alpha = 1, epsilon = 0.1, batch_size = 1)

Arguments

alpha: Numeric. UCB width parameter (exploration strength).

epsilon: Numeric. Probability of selecting a random arm.

batch_size: Integer. Number of rounds before updating parameters.

Method `set_parameters()`

Initialize arm-specific parameter containers.

Usage

BatchLinUCBDisjointPolicyEpsilon$set_parameters(context_params)

Arguments

context_params: List containing at least `unique` (feature size) and `k` (number of arms).
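
As a rough illustration of what "initializing sufficient statistics for each arm" typically means in disjoint LinUCB, the sketch below builds one identity Gram matrix and one zero vector per arm. The helper `init_arm_stats()` and its arguments `d` and `k` are hypothetical names used only for this example; the package's internal layout may differ.

```r
# Hypothetical helper (not part of cramR): build per-arm LinUCB statistics.
# d = number of context features, k = number of arms (both assumed inputs).
init_arm_stats <- function(d, k) {
  list(
    A = replicate(k, diag(1, d), simplify = FALSE),  # d x d identity per arm
    b = replicate(k, rep(0, d), simplify = FALSE)    # length-d zero vector per arm
  )
}

stats <- init_arm_stats(d = 3, k = 2)
length(stats$A)     # one matrix per arm
dim(stats$A[[1]])   # each matrix is d x d
```

Starting from the identity keeps every Gram matrix invertible before any reward has been observed.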

Method `get_action()`

Selects an arm with the epsilon-greedy rule: with probability `epsilon` a random arm is drawn; otherwise the arm with the highest UCB score is chosen.

Usage

BatchLinUCBDisjointPolicyEpsilon$get_action(t, context)

Arguments

t: Integer timestep.

context: List containing the context for the decision.

Returns

A list with the selected action.
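
The selection rule can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `choose_arm()` is a hypothetical function, the context `x` is assumed to be a single numeric vector shared by all arms, and the returned element names `choice` and `explored` are chosen for the example.

```r
# Hypothetical sketch (not part of cramR): epsilon-greedy choice over LinUCB scores.
# x: numeric context vector shared by all arms (assumption)
# A, b: lists of per-arm Gram matrices and reward-weighted context sums
choose_arm <- function(x, A, b, alpha = 1, epsilon = 0.1) {
  k <- length(A)
  # Explore: with probability epsilon, pick an arm uniformly at random.
  if (runif(1) < epsilon) {
    return(list(choice = sample.int(k, 1), explored = TRUE))
  }
  # Exploit: score each arm by estimated reward plus confidence width.
  scores <- vapply(seq_len(k), function(a) {
    A_inv <- solve(A[[a]])
    theta <- A_inv %*% b[[a]]
    mean_reward <- sum(x * theta)
    width <- alpha * sqrt(drop(t(x) %*% A_inv %*% x))
    mean_reward + width
  }, numeric(1))
  list(choice = which.max(scores), explored = FALSE)
}

A <- list(diag(2), diag(2))
b <- list(c(0, 0), c(1, 0))
res <- choose_arm(c(1, 0), A, b, alpha = 1, epsilon = 0)
res$choice   # arm 2 has the higher score (2 versus 1)
```

With `epsilon = 0` the rule reduces to plain LinUCB; with `epsilon = 1` every round is a uniform random draw.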

Method `set_reward()`

Updates arm-specific sufficient statistics based on observed reward. Parameter updates occur only at the end of a batch.

Usage

BatchLinUCBDisjointPolicyEpsilon$set_reward(t, context, action, reward)

Arguments

t: Integer timestep.

context: Context object used for decision-making.

action: List containing the chosen action.

reward: List containing the observed reward.

Returns

Updated internal model parameters.
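
The batched update described above can be sketched as follows. The function `update_batch()` and the `state` list are hypothetical and only mirror the documented `A_cc` and `b_cc` fields; the assumption here is that the accumulators are updated every round and copied into the statistics used for scoring when `t` is a multiple of `batch_size`.

```r
# Hypothetical sketch (not part of cramR) of a batched LinUCB update.
# state$A_cc / state$b_cc accumulate every round; state$A / state$b are the
# statistics used for scoring and are refreshed only at batch boundaries.
update_batch <- function(state, t, x, arm, reward, batch_size = 1) {
  state$A_cc[[arm]] <- state$A_cc[[arm]] + tcrossprod(x)   # adds x %*% t(x)
  state$b_cc[[arm]] <- state$b_cc[[arm]] + reward * x
  if (t %% batch_size == 0) {
    state$A <- state$A_cc
    state$b <- state$b_cc
  }
  state
}

state <- list(
  A = list(diag(2)), b = list(c(0, 0)),
  A_cc = list(diag(2)), b_cc = list(c(0, 0))
)
state <- update_batch(state, t = 1, x = c(1, 0), arm = 1, reward = 1, batch_size = 2)
state$b[[1]]   # still c(0, 0): the batch is not complete
state <- update_batch(state, t = 2, x = c(0, 1), arm = 1, reward = 0.5, batch_size = 2)
state$b[[1]]   # now c(1, 0.5): refreshed at the batch boundary
```

With `batch_size = 1` this sketch reduces to the usual round-by-round LinUCB update.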

Method `clone()`

The objects of this class are cloneable with this method.

Usage

BatchLinUCBDisjointPolicyEpsilon$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Details

Implements the disjoint LinUCB algorithm with upper confidence bounds and epsilon-greedy exploration, using batched updates.
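
For reference, the standard disjoint LinUCB score is usually written as below. The notation is an assumption for illustration ($x_t$ is the context at round $t$, $A_a$ and $b_a$ are the Gram matrix and reward-weighted context sum for arm $a$, and $\alpha$ is the exploration strength); the package's exact computation may differ in detail.

```latex
\hat{\theta}_a = A_a^{-1} b_a,
\qquad
p_{t,a} = x_t^{\top} \hat{\theta}_a + \alpha \sqrt{x_t^{\top} A_a^{-1} x_t}
```

Under the epsilon-greedy rule, the arm maximizing $p_{t,a}$ is played with probability $1 - \epsilon$, and a uniformly random arm is played otherwise.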
