A flexible, general-purpose toolbox for implementing Rescorla-Wagner models in multi-armed bandit tasks. As the successor and functional extension of the 'binaryRL' package, 'multiRL' modularizes the Markov Decision Process (MDP) into six core components. This framework enables users to construct custom models via intuitive if-else syntax and to define latent learning rules for agents. For parameter estimation, it provides both likelihood-based inference (MLE and MAP) and simulation-based inference (ABC and RNN), with full support for parallel processing across subjects. The workflow is highly standardized, featuring four main functions that strictly follow the four-step protocol (and ten rules) proposed by Wilson & Collins (2019) <doi:10.7554/eLife.49547>. Beyond the three built-in models (TD, RSTD, and Utility), users can easily derive new variants by declaring which variables are treated as free parameters.
run_m:
Step 1: Building a reinforcement learning model
rcv_d:
Step 2: Generating synthetic data for parameter and model recovery
fit_p:
Step 3: Optimizing parameters to fit real data
rpl_e:
Step 4: Replaying the experiment with optimal parameters
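A minimal end-to-end sketch of this four-step workflow. The argument names below (data, model, estimate, params, n_subjects) are assumptions for illustration, not the exact signatures; consult each function's help page.

## Hypothetical workflow; argument names are assumptions, not the exact API
library(multiRL)
model    <- run_m(data = TAB, model = "TD")                 # Step 1: build and run a model
recovery <- rcv_d(model = "TD", n_subjects = 30)            # Step 2: recovery on synthetic data
fit      <- fit_p(data = TAB, model = "TD", estimate = "MLE") # Step 3: fit free parameters
replay   <- rpl_e(data = TAB, params = fit)                 # Step 4: replay with fitted parameters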
data:
The data structure the package accepts.
colnames:
How to format your column names correctly (illustrated below).
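An illustrative long-format layout for a two-armed bandit data set. The column names here are hypothetical stand-ins; see the colnames topic for the names the package actually requires.

df <- data.frame(
  Subject  = 1L,                # subject identifier (hypothetical name)
  Block    = 1L,                # block number
  Trial    = 1:3,               # trial index within block
  L_choice = c("A", "A", "B"),  # option presented on the left
  R_choice = c("B", "B", "A"),  # option presented on the right
  L_reward = c(10, 0, 10),      # reward delivered if left is chosen
  R_reward = c(0, 10, 0),       # reward delivered if right is chosen
  Choose   = c("A", "B", "A")   # option the human actually chose
)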
behrule:
How to define your latent learning rules.
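A hypothetical latent learning rule written in the plain if-else style the package description advertises; the exact interface expected by the behrule topic may differ.

rule <- function(value, reward, alpha_pos, alpha_neg) {
  pe <- reward - value          # prediction error
  if (pe >= 0) {
    value + alpha_pos * pe      # learn from gains at one rate
  } else {
    value + alpha_neg * pe      # learn from losses at another
  }
}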
funcs:
These functions are the building blocks of your model.
params:
A breakdown of every parameter used in the functions.
priors:
Define the prior distributions for each free parameter.
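A hedged sketch of one plausible priors format: a log-density per free parameter. The parameter names and list structure are assumptions, not fixed by the package.

priors <- list(
  alpha = function(x) dbeta(x, 2, 2, log = TRUE),  # learning rate in (0, 1)
  beta  = function(x) dgamma(x, 2, 1, log = TRUE)  # inverse temperature > 0
)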
settings:
The general configuration and settings for your models.
policy:
Decide whether the agent chooses for itself (on-policy) or
replays recorded human choices (off-policy).
estimate:
Pick an estimation method (MLE, MAP, ABC, or RNN).
algorithm:
The optimization algorithms used for likelihood-based inference.
control:
Fine-tune how the estimation methods and algorithms behave.
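A combined sketch of how the configuration topics above might come together in a fitting call; every argument name and value below is an assumption for illustration, not the exact API.

fit <- fit_p(
  data      = TAB,
  model     = "RSTD",
  policy    = "off",             # off-policy: replay human choices
  estimate  = "MAP",             # MLE, MAP, ABC, or RNN
  algorithm = "L-BFGS-B",        # optimizer for likelihood-based inference
  control   = list(maxit = 500)  # fine-tuning of the chosen method
)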
TD: Temporal Difference model
RSTD: Risk-Sensitive Temporal Difference model
Utility: Utility model
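One common textbook formulation of the update rules behind these three models, shown as a sketch (the package's exact parameterization may differ):

v_td   <- function(v, r, alpha) v + alpha * (r - v)  # TD: single learning rate
v_rstd <- function(v, r, a_pos, a_neg) {             # RSTD: risk-sensitive rates
  pe <- r - v
  v + if (pe >= 0) a_pos * pe else a_neg * pe
}
v_util <- function(v, r, alpha, gamma)               # Utility: power-transformed reward
  v + alpha * (sign(r) * abs(r)^gamma - v)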
func_alpha: Learning Rate
func_beta: Inverse Temperature
func_gamma: Utility Function
func_delta: Upper-Confidence-Bound
func_epsilon: Exploration Function
func_zeta: Working Memory System
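For intuition, a standard softmax choice rule of the kind func_beta names: a higher inverse temperature makes choices more deterministic. This is the textbook form, not necessarily the package's exact code.

softmax <- function(values, beta) {
  z <- beta * (values - max(values))  # subtract the max for numerical stability
  exp(z) / sum(exp(z))
}
softmax(c(5, 3), beta = 1)  # ~0.88 vs ~0.12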
process_1_input:
Standardize all inputs into a structured S4 object.
process_2_behrule:
Define the specific latent learning rules for the agent.
process_3_record:
Initialize an empty container to track the MDP outputs.
process_4_output_cpp:
Markov Decision Process (C++ version).
process_4_output_r:
Markov Decision Process (R version).
process_5_metric:
Compute various statistical metrics for different estimation methods.
estimate_0_ENV: Estimation environment
estimate_1_LBI: Likelihood-Based Inference
estimate_1_MLE: Maximum Likelihood
estimate_1_MAP: Maximum A Posteriori
estimate_2_SBI: Simulation-Based Inference
estimate_2_ABC: Approximate Bayesian Computation
engine_ABC: The core engine of ABC
estimate_2_RNN: Recurrent neural network estimation
engine_RNN: The core engine of RNN
estimation_methods: Wrapper that dispatches to the selected estimation method
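For orientation, a textbook rejection-ABC loop illustrating the general idea behind estimate_2_ABC; this is a generic sketch, not the package's engine_ABC.

abc_reject <- function(observed_stat, simulate, prior_draw,
                       n = 10000, tol = 0.05) {
  accepted <- list()
  for (i in seq_len(n)) {
    theta <- prior_draw()                   # draw candidate parameters from the prior
    stat  <- simulate(theta)                # summary statistic of simulated data
    if (abs(stat - observed_stat) < tol) {  # keep candidates that reproduce the data
      accepted[[length(accepted) + 1]] <- theta
    }
  }
  do.call(rbind, accepted)                  # approximate posterior sample
}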
TAB:
Two-Armed Bandit data
MAB:
Multi-Armed Bandit data
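Assuming the standard data() access for bundled data sets:

data(TAB, package = "multiRL")  # two-armed bandit example data
data(MAB, package = "multiRL")  # multi-armed bandit example data
str(TAB)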
summary,multiRL.model-method:
S4 summary method for multiRL.model objects
plot.multiRL.replay:
S3 plot method for multiRL.replay objects
Maintainer: YuKi <hmz1969a@gmail.com> (ORCID)
Authors:
Xinyu <xinyu000328@gmail.com> (ORCID)