Qlearning_Single: Single Stage Q learning

Description

This function performs single-stage Q-learning. Q-learning selects the optimal treatment option by fitting a regression model with the treatment, the feature variables, and their interactions. The optimal treatment option is the sign of the interaction term, which maximizes the predicted value from the regression.
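The decision rule described above can be written out as a short derivation. This is a sketch under the assumption of a linear working model; the symbols \beta_0, \beta, \gamma_0, and \gamma are notation introduced here for illustration and do not appear in the package.

```latex
% Working model for the outcome R given features H and treatment A \in \{-1, 1\}
Q(H, A) = \beta_0 + H^\top \beta + A\,(\gamma_0 + H^\top \gamma)

% The treatment only enters through the interaction term, so the maximizer is its sign
d^{*}(H) = \operatorname{sign}(\gamma_0 + H^\top \gamma)

% Predicted outcome under the optimal treatment
\max_{a \in \{-1, 1\}} Q(H, a) = \beta_0 + H^\top \beta + \lvert \gamma_0 + H^\top \gamma \rvert
```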

Usage

Qlearning_Single(H, A, R, pentype = "lasso", m = 4)

Arguments

H

An n by p matrix, where n is the sample size and p is the number of feature variables.

A

A vector of n entries coded 1 and -1 for the treatment assignments.

R

The vector of outcome values; larger values are more desirable.

pentype

The type of regression used in Q-learning. 'lasso' (the default) uses lasso regression; 'LSE' uses ordinary least squares.

m

If pentype = 'lasso', the number of folds used in cross-validation to pick the lasso tuning parameter in cv.glmnet.

Value

Returns an object of class 'qlearn' with two components:

co

The coefficients of the regression model, a vector of length 2p+2. The design matrix is X = (Intercept, H, A, diag(A)*H).

Q

The predicted optimal outcome from the regression model.
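To make the layout of the coefficient vector concrete, the following is a minimal base R sketch of the ordinary least squares case. It is not the package's implementation: the toy data, the use of qr.solve(), and the names main_idx, trt_idx, opt_trt, and Q_opt are assumptions made here for illustration. It builds the design matrix described above, then recovers a treatment rule and a predicted optimal outcome from the 2p+2 coefficients.

```r
set.seed(1)                              # reproducible toy data (hypothetical)
n <- 100
p <- 3
H <- matrix(rnorm(n * p), n, p)          # feature matrix
A <- 2 * rbinom(n, 1, 0.5) - 1           # treatment coded 1 / -1
R <- H[, 1] + A * (0.5 + H[, 2]) + rnorm(n)

# Design matrix X = (Intercept, H, A, diag(A) * H).
# A * H multiplies row i of H by A[i], which equals diag(A) %*% H.
X <- cbind(1, H, A, A * H)

# Least squares fit as a stand-in for the fitted regression model
co <- as.vector(qr.solve(X, R))          # length 2p + 2

main_idx <- 1:(p + 1)                    # intercept and main effects of H
trt_idx  <- (p + 2):(2 * p + 2)          # A and the A-by-H interactions

main_part <- as.vector(cbind(1, H) %*% co[main_idx])
trt_part  <- as.vector(cbind(1, H) %*% co[trt_idx])

opt_trt <- ifelse(trt_part >= 0, 1, -1)  # sign of the interaction term
Q_opt   <- main_part + abs(trt_part)     # predicted outcome under opt_trt
```

Under these assumptions, opt_trt is the treatment suggested for each subject and Q_opt is never smaller than the fitted value under the treatment actually received.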

References

Watkins, C. J. C. H. (1989). Learning from delayed rewards (Doctoral dissertation, University of Cambridge).

Murphy, S. A., Oslin, D. W., Rush, A. J., & Zhu, J. (2007). Methodological challenges in constructing effective treatment sequences for chronic psychiatric disorders. Neuropsychopharmacology, 32(2), 257-262.

Zhao, Y., Kosorok, M. R., & Zeng, D. (2009). Reinforcement learning design for cancer clinical trials. Statistics in Medicine, 28(26), 3294.

Examples

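library(MASS)  # provides mvrnorm()

n <- 200
A <- 2 * rbinom(n, 1, 0.5) - 1  # treatment assignments coded 1 / -1
p <- 20
mu <- numeric(p)
Sigma <- diag(p)
X <- mvrnorm(n, mu, Sigma)      # n by p feature matrix
# Outcome: main effects of X1-X3 plus treatment interactions with X3-X5
R <- X[, 1:3] %*% c(1, 1, -2) + X[, 3:5] %*% c(1, 1, -2) * A + rnorm(n)
modelQ <- Qlearning_Single(X, A, R)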