rcss (version 1.1)

TestPolicy: Backtesting a Prescribed Policy

Description

Backtests a prescribed policy on a set of simulated sample paths, returning the realized value of the policy on each path.

Usage

TestPolicy(position, path, control, Reward, path_action)

Arguments

position
Natural number indicating the starting position.
path
3-dimensional array containing the generated sample paths, where [i,j,] gives the state at time i on sample path j.
control
Array representing the transition probabilities of the controlled Markov chain. Two possible inputs (see the sketch after this list):
  • Matrix of dimension n_pos $\times$ n_action, where entry [i,j] describes the next position after selecting action j at position i.
  • 3-dimensional array with dimensions n_pos $\times$ n_action $\times$ n_pos, where entry [i,j,k] is the probability of moving to position k after applying action j to position i.
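For instance (a sketch using the two-position, two-action control from the example below; control_prob is an illustrative name, not a package object), the same control can be written in either form:

control <- matrix(c(1, 1,
                    2, 1), nrow = 2, byrow = TRUE)  # [i,j] = next position

control_prob <- array(0, dim = c(2, 2, 2))  # [i,j,k] = transition probability
control_prob[1, 1, 1] <- 1  # position 1, action 1 -> position 1
control_prob[1, 2, 1] <- 1  # position 1, action 2 -> position 1
control_prob[2, 1, 2] <- 1  # position 2, action 1 -> position 2
control_prob[2, 2, 1] <- 1  # position 2, action 2 -> position 1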

Reward
User-supplied function representing the rewards for all positions and actions. The function should take the following arguments, in this order:
  • $n \times d$ matrix representing the $n$ $d$-dimensional states.
  • A natural number representing the decision epoch.

The function should output the following (see the indexing sketch after this list):

  • $n \times (p \times a)$ matrix representing the rewards, where $p$ is the number of positions and $a$ is the number of actions in the problem. The $[i, a \times (j - 1) + k]$-th entry of the matrix corresponds to the reward from applying the $k$-th action to the $j$-th position for the $i$-th state. The ordering of the positions and actions should coincide with the control matrix.
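For instance, with $p = 2$ positions and $a = 2$ actions (as in the example below), the reward from applying action $k = 2$ at position $j = 2$ occupies column $2 \times (2 - 1) + 2 = 4$. A minimal sketch of the column indexing (reward_col is an illustrative helper, not part of rcss):

reward_col <- function(j, k, a) a * (j - 1) + k
reward_col(j = 2, k = 2, a = 2)  # returns 4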

path_action
3-dimensional array representing the prescribed policy for the sample paths. Entry [i,j,k] captures the prescribed action at time i for position j on sample path k.

Value

Array containing the backtested value of the prescribed policy for each sample path.

Examples

## Bermuda put option (strike 40, 51 equally spaced decision epochs)
grid <- as.matrix(cbind(rep(1, 91), seq(10, 100, length = 91)))
## Disturbance sampling: 10 equally weighted lognormal disturbances
disturb <- array(0, dim = c(2, 2, 10))
disturb[1,1,] <- 1
disturb[2,2,] <- exp((0.06 - 0.5 * 0.2^2) * 0.02 + 0.2 * sqrt(0.02) * rnorm(10))
disturb_weight <- rep(1 / 10, 10)
## Control: position 1 is absorbing (exercised), position 2 is not exercised;
## action 2 exercises the option
control <- matrix(c(1, 1, 2, 1), nrow = 2, byrow = TRUE)
## Exercise payoff max(40 - price, 0) stored as affine coefficients (40, -1)
reward <- array(0, dim = c(91, 2, 2, 2, 51))
in_money <- grid[,2] <= 40
reward[in_money,1,2,2,] <- 40
reward[in_money,2,2,2,] <- -1
r_index <- matrix(c(2, 2), ncol = 2)
## Value function approximation
bellman <- FastBellman(grid, reward, control, disturb, disturb_weight, r_index)
## Simulate 100 sample paths over 50 time steps
path_disturb <- array(0, dim = c(2, 2, 50, 100))
path_disturb[1,1,,] <- 1
path_disturb[2,2,,] <- exp((0.06 - 0.5 * 0.2^2) * 0.02 + 0.2 * sqrt(0.02) * rnorm(5000))
start <- c(1, 36)  # initial state: constant 1 and stock price 36
path <- Path(start, path_disturb)
## Nearest grid point to each state visited along the sample paths
path_nn <- Neighbour(matrix(path, ncol = 2), grid, 1, "kdtree", 0, 1)$indices
## Reward function matching the contract above: p = 2 positions and
## a = 2 actions, so the exercise payoff occupies column 2 * (2 - 1) + 2 = 4
RewardFunc <- function(state, time) {
    output <- array(data = 0, dim = c(nrow(state), 4))
    output[,4] <- pmax(40 - state[,2], 0)
    return(output)
}
## Prescribe a policy along each sample path, then backtest it starting
## from position 2 (option not yet exercised)
path_action <- PathPolicy(path, path_nn, control, RewardFunc, bellman$expected)
test <- TestPolicy(2, path, control, RewardFunc, path_action)
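## The average backtested value across the sample paths gives a point
## estimate of the value achieved by the prescribed policy
mean(test)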
