Qlearning: Q-learning

Description

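This function implements multi-stage Q-learning through backward propagation: a regression model is fitted for each stage, starting from the last stage and working back to the first.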
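The backward recursion can be illustrated with a minimal base R sketch. This is not the package's implementation: the simulated data, the helper `design()`, the -1/1 treatment coding, and the use of ordinary least squares via `lm.fit` are all assumptions made for illustration.

```r
# Hypothetical two-stage data; none of these objects come from the package.
set.seed(1)
n <- 200
K <- 2
X <- matrix(rnorm(n * 3), n, 3)                 # feature matrix shared by both stages
A <- list(sample(c(-1, 1), n, replace = TRUE),  # stage-1 treatments (assumed coded -1/1)
          sample(c(-1, 1), n, replace = TRUE))  # stage-2 treatments
R <- list(rnorm(n),                             # stage-1 outcomes
          X[, 1] * A[[2]] + rnorm(n))           # stage-2 outcomes depend on treatment

# Design matrix: intercept, features, treatment, treatment-by-feature interactions
design <- function(X, a) cbind(1, X, a, X * a)

models <- vector("list", K)
rules  <- vector("list", K)
future <- rep(0, n)                             # estimated optimal value of later stages

for (k in K:1) {                                # last stage first
  y    <- R[[k]] + future                       # pseudo-outcome for stage k
  beta <- lm.fit(design(X, A[[k]]), y)$coefficients
  models[[k]] <- beta
  q_pos <- drop(design(X, rep( 1, n)) %*% beta) # predicted value under treatment  1
  q_neg <- drop(design(X, rep(-1, n)) %*% beta) # predicted value under treatment -1
  rules[[k]] <- ifelse(q_pos >= q_neg, 1, -1)   # estimated best treatment at stage k
  future <- pmax(q_pos, q_neg)                  # value carried back to stage k - 1
}
```

Each stage regresses the current outcome plus the estimated best future value on the features and treatment, so earlier stages account for what can still be gained later.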

Usage

Qlearning(X,AA,RR,K,pentype="lasso",m=4)

Arguments

X

Either a single feature matrix shared across all stages, or a list of feature matrices, one per stage. Rows are samples; the feature matrices of different stages can have different dimensions.

AA

List of length K; AA[[i]] is the treatment assignment vector for stage i.

RR

List of length K; RR[[i]] is the outcome vector for stage i.

K

Number of stages.

pentype

The type of regression used in Q-learning. The default is 'lasso'; it can also be set to 'LSE'.

m

Number of cross-validation folds used by cv.glmnet when fitting the regression model. Applies only if pentype is 'lasso'.
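To make the role of the fold count concrete, the sketch below calls `cv.glmnet` directly with `nfolds = 4`, matching the default `m=4`. That `m` is passed on as `nfolds` is an assumption based on the description above, and the data are simulated purely for illustration; the `glmnet` package must be installed.

```r
library(glmnet)

# Hypothetical regression data, not produced by the package
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 10), n, 10)
y <- x[, 1] - 2 * x[, 2] + rnorm(n)

# Lasso fit with the penalty chosen by 4-fold cross-validation
cvfit <- cv.glmnet(x, y, nfolds = 4)

# Coefficients at the penalty value with the lowest cross-validated error
beta <- coef(cvfit, s = "lambda.min")
```

Fewer folds make each fit faster but give a noisier estimate of the prediction error used to choose the penalty.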

Value

Returns a list of K models with class 'qlearn'.

References

Watkins, C. J. C. H. (1989). Learning from delayed rewards (Doctoral dissertation, University of Cambridge).

Murphy, S. A., Oslin, D. W., Rush, A. J., & Zhu, J. (2007). Methodological challenges in constructing effective treatment sequences for chronic psychiatric disorders. Neuropsychopharmacology, 32(2), 257-262.

Zhao, Y., Kosorok, M. R., & Zeng, D. (2009). Reinforcement learning design for cancer clinical trials. Statistics in Medicine, 28(26), 3294.

Examples

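library(DTRlearn)  # assumption: the package that provides these functions

# Simulation settings
n_cluster=10
pinfo=10
pnoise=20
# Simulate a training set, then a test set that reuses its centroids
example2=make_2classification(n_cluster,pinfo,pnoise,200)
test=make_2classification(n_cluster,pinfo,pnoise,200,example2$centroids)
# pi and test are not used by the Qlearning call below
pi=list()
pi[[2]]=pi[[1]]=rep(1,200)
# Fit Q-learning with K = 2 stages and the default lasso regression
modelQ=Qlearning(example2$X,example2$A,example2$R,2)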