This function performs single-stage Q-learning. Q-learning selects the optimal treatment by fitting a regression model of the outcome on the treatment, the feature variables, and their interactions. The optimal treatment is the sign of the fitted treatment effect (the treatment main effect plus the treatment-by-feature interaction terms), which maximizes the predicted value from the regression.
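The idea can be sketched in base R for the least-squares case. This is an illustrative sketch on simulated data, not the package's internal code: the data-generating model, the column names, and the variables contrast, opt_trt, and agreement are assumptions made for the example.

```r
# Minimal base-R sketch of single-stage Q-learning with ordinary least squares.
set.seed(1)
n <- 200
p <- 3
H <- matrix(rnorm(n * p), n, p)            # feature matrix (n by p)
A <- sample(c(-1, 1), n, replace = TRUE)   # treatment coded -1 / 1
# Assumed toy model: treatment 1 is better exactly when H[, 1] > 0
R <- 1 + H[, 2] + A * (2 * H[, 1]) + rnorm(n, sd = 0.5)

# Columns H, A, A*H; lm() adds the intercept, giving 2p + 2 coefficients
X <- cbind(H, A, A * H)
colnames(X) <- c(paste0("H", 1:p), "A", paste0("AH", 1:p))
fit  <- lm(R ~ X)
beta <- coef(fit)

# Fitted treatment effect: coefficient of A plus H times the interaction coefficients
contrast <- as.vector(beta[p + 2] + H %*% beta[(p + 3):(2 * p + 2)])
opt_trt  <- ifelse(contrast >= 0, 1, -1)   # sign of the treatment effect

# Share of subjects for whom the estimated rule matches the true rule
agreement <- mean(opt_trt == ifelse(H[, 1] >= 0, 1, -1))
```

Because A is coded 1 and -1, the predicted outcome is (main part) + A * (treatment effect), so choosing A as the sign of the treatment effect maximizes the prediction.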
Usage
Qlearning_Single(H, A, R, pentype = "lasso", m = 4)
Arguments
H
An n by p matrix, where n is the sample size and p is the number of feature variables.
A
A vector of n entries, coded 1 and -1, giving the treatment assignments.
R
The vector of outcomes; larger values are more desirable.
pentype
The type of regression used in Q-learning. 'lasso' (the default) uses lasso regression; 'LSE' uses ordinary least squares.
m
If pentype = 'lasso', the number of cross-validation folds used by cv.glmnet to select the lasso tuning parameter.
Value
An object of class 'qlearn' with two components:
co
The coefficients of the regression model, a vector of length 2p+2. The design matrix is X = (Intercept, H, A, diag(A)*H).
Q
The predicted optimal outcome from the regression model.
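To make the coefficient layout concrete, the sketch below applies a made-up coefficient vector, ordered as in the design matrix (Intercept, H, A, A*H), to two new subjects. The numbers and the names H_new, main, contrast, trt, and Q_new are hypothetical, and it assumes the predicted optimal outcome is the fitted value under the recommended treatment.

```r
# Hypothetical coefficients for p = 2: (Intercept, H1, H2, A, A*H1, A*H2)
p  <- 2
co <- c(1.0, 0.5, -0.2, 0.3, 1.0, -1.0)

# Features for two new subjects (one row per subject)
H_new <- matrix(c(1, 0,
                  0, 2), nrow = 2, byrow = TRUE)

# Part of the prediction that does not depend on treatment
main <- as.vector(co[1] + H_new %*% co[2:(p + 1)])
# Part multiplied by A: treatment main effect plus interactions
contrast <- as.vector(co[p + 2] + H_new %*% co[(p + 3):(2 * p + 2)])

trt   <- ifelse(contrast >= 0, 1, -1)   # recommended treatment
Q_new <- main + abs(contrast)           # predicted outcome under that treatment

print(data.frame(trt = trt, Q = Q_new))
```

Here the first subject is assigned treatment 1 and the second treatment -1, since their treatment effects are 1.3 and -1.7.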
References
Watkins, C. J. C. H. (1989). Learning from delayed rewards (Doctoral dissertation, University of Cambridge).
Murphy, S. A., Oslin, D. W., Rush, A. J., & Zhu, J. (2007). Methodological challenges in constructing effective treatment sequences for chronic psychiatric disorders. Neuropsychopharmacology, 32(2), 257-262.
Zhao, Y., Kosorok, M. R., & Zeng, D. (2009). Reinforcement learning design for cancer clinical trials. Statistics in Medicine, 28(26), 3294.