# Contextual Linear Bandit Environment

An R6 class, provided by the `cramR` package, for simulating a contextual linear bandit environment with normally distributed rewards.
## Method summary

- `initialize(k, d, list_betas, sigma = 0.1, binary_rewards = FALSE)`: Constructor.
- `post_initialization()`: Loads the correct coefficients based on `sim_id`.
- `get_context(t)`: Returns the context and sets the internal reward vector.
- `get_reward(t, context_common, action)`: Returns the observed reward for an action.
## Public fields

- `rewards`: A vector of rewards for each arm in the current round.
- `betas`: Coefficient matrix of the linear reward model (one column per arm).
- `sigma`: Standard deviation of the Gaussian noise added to rewards.
- `binary`: Logical, indicating whether to convert rewards into binary outcomes.
- `weights`: The latent reward scores before noise and/or binarization.
- `list_betas`: A list of coefficient matrices, one per simulation.
- `sim_id`: Index for selecting which simulation's coefficients to use.
- `class_name`: Name of the class for internal tracking.
## Methods

### `new()`

Usage: `ContextualLinearBandit$new(k, d, list_betas, sigma = 0.1, binary_rewards = FALSE)`

Arguments:

- `k`: Number of arms.
- `d`: Number of features.
- `list_betas`: A list of true beta matrices, one per simulation.
- `sigma`: Standard deviation of the Gaussian noise.
- `binary_rewards`: Logical; whether to use binary rewards.
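For concreteness, here is a minimal construction sketch. It assumes `cramR` is installed and `ContextualLinearBandit` is accessible from its namespace; the dimensions and coefficient values are purely illustrative.

```r
library(cramR)

k <- 3  # number of arms (illustrative)
d <- 5  # number of context features (illustrative)

# One d x k coefficient matrix per simulation (two simulations here);
# each column holds the coefficients for one arm, as described above.
list_betas <- lapply(1:2, function(i) matrix(rnorm(d * k), nrow = d, ncol = k))

bandit <- ContextualLinearBandit$new(
  k = k,
  d = d,
  list_betas = list_betas,
  sigma = 0.1,
  binary_rewards = FALSE
)
```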
### `post_initialization()`

Sets the simulation-specific coefficients for the current simulation.

Usage: `ContextualLinearBandit$post_initialization()`

Returns: No return value; modifies the internal state of the object.
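Since `sim_id` is a public field (see the field list above), selecting a given simulation's coefficients presumably works as in the sketch below; the direct field assignment is an assumption based on the field list, not a documented API.

```r
# Hypothetical usage: select the second simulation's coefficients.
bandit$sim_id <- 2
bandit$post_initialization()  # presumably copies list_betas[[2]] into bandit$betas
```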
### `get_context()`

Usage: `ContextualLinearBandit$get_context(t)`

Arguments:

- `t`: Current time step.

Returns: A list containing the context vector `X` and the arm count `k`.
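A short usage sketch, assuming the returned list elements are named `X` and `k` as described above:

```r
ctx <- bandit$get_context(t = 1)
str(ctx)       # list with context vector X and arm count k
length(ctx$X)  # should equal d, the number of features
```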
### `get_reward()`

Usage: `ContextualLinearBandit$get_reward(t, context_common, action)`

Arguments:

- `t`: Current time step.
- `context_common`: Context shared across arms.
- `action`: Action taken by the policy.

Returns: A list with the observed reward and optimal arm/reward information.
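Putting the two calls together, one interaction round under a uniformly random policy might look like the sketch below. The element name `reward` in the returned list is an assumption; the documentation above only describes the list's contents, not its names.

```r
t <- 1
ctx <- bandit$get_context(t)  # also sets the internal reward vector for this round
action <- sample(ctx$k, 1)    # pick an arm uniformly at random
res <- bandit$get_reward(t, context_common = ctx$X, action = action)
res$reward  # observed reward for the chosen arm (element name assumed)
# res also carries the optimal arm/reward info, e.g. for regret computation
```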
### `clone()`

The objects of this class are cloneable with this method.

Usage: `ContextualLinearBandit$clone(deep = FALSE)`

Arguments:

- `deep`: Whether to make a deep clone.
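Cloning follows the standard R6 pattern:

```r
bandit_copy <- bandit$clone(deep = TRUE)  # independent deep copy of the environment
```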