BatchContextualEpsilonGreedyPolicy: Batch Contextual Epsilon-Greedy Policy

Description

Batch Contextual Epsilon-Greedy Policy

Super class

cramR::NA -> BatchContextualEpsilonGreedyPolicy

Public fields

epsilon: Probability of selecting a random arm (exploration rate).
batch_size: Number of rounds per batch before updating model parameters.
A_cc: List of Gram matrices (one per arm), used to accumulate sufficient statistics across batches.
b_cc: List of reward-weighted context sums (one per arm), updated batch-wise.
class_name: Internal class name identifier.

Methods

Public methods

Method `new()`

Constructor for the Batch Contextual Epsilon-Greedy policy.

Usage

BatchContextualEpsilonGreedyPolicy$new(epsilon = 0.1, batch_size = 1)

Arguments

epsilon: Numeric between 0 and 1. Probability of random arm selection. Defaults to 0.1.

batch_size: Integer. Number of observations between parameter updates. Defaults to 1.
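
A minimal construction example, based only on the usage line above. It assumes the class is exported from the cramR namespace and that the constructor stores its arguments in the public fields `epsilon` and `batch_size`. The guard skips the example when cramR is not installed.

```r
# Construct a policy that explores 5% of the time and refreshes its
# parameter estimates every 20 rounds.
# Assumption: the class is reachable as cramR::BatchContextualEpsilonGreedyPolicy.
if (requireNamespace("cramR", quietly = TRUE)) {
  policy <- cramR::BatchContextualEpsilonGreedyPolicy$new(
    epsilon = 0.05,
    batch_size = 20
  )
  print(policy$epsilon)     # exploration rate passed to the constructor
  print(policy$batch_size)  # rounds per batch passed to the constructor
}
```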

Method `set_parameters()`

Initializes the parameter structures for each arm.

Usage

BatchContextualEpsilonGreedyPolicy$set_parameters(context_params)

Arguments

context_params: A list with at least `d` (number of features) and `k` (number of arms).
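
To illustrate what per-arm parameter structures of this kind might look like, the sketch below builds one Gram matrix and one reward-weighted context sum per arm from `d` and `k`. The helper name `init_arm_params` and the identity-matrix starting value (a common ridge-style choice) are assumptions for illustration, not taken from the package source.

```r
# Hypothetical helper: builds per-arm sufficient statistics from context_params.
# Assumption: each Gram matrix starts as the d x d identity and each
# reward-weighted context sum starts as a zero vector of length d.
init_arm_params <- function(context_params) {
  d <- context_params$d  # number of features
  k <- context_params$k  # number of arms
  list(
    A_cc = replicate(k, diag(1, d), simplify = FALSE),
    b_cc = replicate(k, rep(0, d), simplify = FALSE)
  )
}

params <- init_arm_params(list(d = 3, k = 2))
length(params$A_cc)    # one Gram matrix per arm
dim(params$A_cc[[1]])  # each matrix is d x d
```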

Method `get_action()`

Chooses an arm based on epsilon-greedy logic and the current estimates.

Usage

BatchContextualEpsilonGreedyPolicy$get_action(t, context)

Arguments

t: Integer time step.

context: A list with contextual features and arm count.

Returns

A list with the selected action.
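
The epsilon-greedy choice can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the function name `choose_arm`, the linear reward model with estimate `solve(A, b)`, and a single feature vector `x` shared by all arms are all hypothetical, and the sketch returns a bare arm index rather than the list described above.

```r
# Hypothetical sketch of epsilon-greedy arm selection with a linear reward model.
# Assumptions: `A` and `b` are per-arm lists (Gram matrix, reward-weighted sum),
# `x` is a numeric feature vector shared by all arms, and ties go to the
# first maximal arm.
choose_arm <- function(x, A, b, epsilon) {
  k <- length(A)
  if (runif(1) < epsilon) {
    return(sample.int(k, 1))                # explore: pick a random arm
  }
  expected <- vapply(seq_len(k), function(arm) {
    theta_hat <- solve(A[[arm]], b[[arm]])  # per-arm estimate A^{-1} b
    sum(x * theta_hat)                      # predicted reward for this arm
  }, numeric(1))
  which.max(expected)                       # exploit: best predicted arm
}

A <- list(diag(2), diag(2))
b <- list(c(0, 0), c(1, 0))
choose_arm(c(1, 0), A, b, epsilon = 0)  # never explores; picks arm 2
```

With `epsilon = 0` the function always exploits, so the call is deterministic. With `epsilon = 1` every call returns a random arm regardless of the estimates.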

Method `set_reward()`

Records the observed reward in the model statistics. Parameter estimates are refreshed once per batch, that is, every `batch_size` rounds.

Usage

BatchContextualEpsilonGreedyPolicy$set_reward(t, context, action, reward)

Arguments

t: Integer time step.

context: List of contextual features used for the action.

action: A list with the chosen arm.

reward: A list with the observed reward.

Returns

Updated parameter estimates.
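
One way batched updating could work is sketched below. It assumes that statistics accumulate every round in `A_cc` and `b_cc`, and that the copies used for decisions (here called `A` and `b`) are refreshed only when `t` is a multiple of `batch_size`. The helper names `make_batch_state` and `record_reward`, the plain-list state, and the scalar `reward` argument are hypothetical simplifications.

```r
# Hypothetical sketch of batched sufficient-statistic updates.
# Assumptions: A_cc / b_cc accumulate on every round; A / b (used for
# decisions) are overwritten only at the end of each batch.
make_batch_state <- function(d, k, batch_size) {
  zeros <- function() replicate(k, rep(0, d), simplify = FALSE)
  eyes  <- function() replicate(k, diag(1, d), simplify = FALSE)
  list(A = eyes(), b = zeros(), A_cc = eyes(), b_cc = zeros(),
       batch_size = batch_size)
}

record_reward <- function(state, t, x, arm, reward) {
  # Accumulate the Gram matrix and reward-weighted context for the chosen arm.
  state$A_cc[[arm]] <- state$A_cc[[arm]] + outer(x, x)
  state$b_cc[[arm]] <- state$b_cc[[arm]] + reward * x
  # Publish the accumulated statistics once the batch is complete.
  if (t %% state$batch_size == 0) {
    state$A <- state$A_cc
    state$b <- state$b_cc
  }
  state
}

state <- make_batch_state(d = 2, k = 2, batch_size = 2)
state <- record_reward(state, t = 1, x = c(1, 0), arm = 1, reward = 1)
state$b[[1]]  # still c(0, 0): the batch is not complete
state <- record_reward(state, t = 2, x = c(1, 0), arm = 1, reward = 1)
state$b[[1]]  # now c(2, 0): both rounds are applied at once
```

With `batch_size = 1` the condition holds on every round, so the sketch reduces to an ordinary per-observation update.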

Method `clone()`

The objects of this class are cloneable with this method.

Usage

BatchContextualEpsilonGreedyPolicy$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Details

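Implements an epsilon-greedy exploration strategy for contextual bandits with batched updates. With probability `epsilon` the policy selects a random arm; otherwise it selects the arm with the highest estimated reward. Parameter estimates are refreshed once every `batch_size` rounds rather than after every observation.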
