Finds the depth-L tree that maximizes the sum of rewards by exhaustive search. If the optimal
action is the same in both the left and right leaves of a node, the node is pruned.
Usage
policy_tree(X, Gamma, depth = 2, split.step = 1)
Arguments
X
The covariates used. Dimension \(N \times p\) where \(p\) is the number of features.
Gamma
The rewards for each action. Dimension \(N \times d\) where \(d\) is the number of actions.
depth
The depth of the fitted tree. Default is 2.
split.step
An optional approximation parameter (a positive integer): the number of possible splits
to consider when performing tree search. split.step = 1 (default) considers every possible split; split.step = 10
considers splitting at every 10th distinct value and can yield a substantial speedup for densely packed
continuous data (see the sketch below).
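A minimal sketch of how split.step trades search exactness for speed; the data sizes below are illustrative only, and the snippet assumes policy_tree is loaded from the policytree package.
library(policytree)
n <- 2000
p <- 5
d <- 3
X <- matrix(rnorm(n * p), n, p)       # dense continuous covariates
Gamma <- matrix(rnorm(n * d), n, d)   # one reward column per action
# Exact search: every distinct covariate value is a split candidate.
exact.tree <- policy_tree(X, Gamma, depth = 2, split.step = 1)
# Approximate search: only every 10th distinct value is considered,
# which is typically much faster on densely packed continuous data.
approx.tree <- policy_tree(X, Gamma, depth = 2, split.step = 10)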
Value
A policy_tree object.
References
Zhou, Zhengyuan, Susan Athey, and Stefan Wager. "Offline multi-action policy learning:
Generalization and optimization." arXiv preprint arXiv:1810.04778 (2018).
Examples
n <- 50
p <- 10
d <- 3
# Draw random covariates and rewards for illustration.
features <- matrix(rnorm(n * p), n, p)
rewards <- matrix(rnorm(n * d), n, d)
# Fit a depth-2 policy tree on the simulated data.
tree <- policy_tree(features, rewards, depth = 2)
tree
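A fitted tree can also be used to assign actions to samples. This sketch assumes the predict method provided for policy_tree objects, which returns the index of the assigned action for each row of the covariate matrix.
# Assign an action (1, ..., d) to each sample (assumes predict.policy_tree
# from the policytree package).
predicted.actions <- predict(tree, features)
head(predicted.actions)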