Zhao, Y., Zeng, D., Rush, A. J., & Kosorok, M. R. (2012). Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association, 107(499), 1106-1118.
Zhao, Y. Q., Zeng, D., Laber, E. B., & Kosorok, M. R. (2014). New statistical learning methods for estimating optimal dynamic treatment regimes. Journal of the American Statistical Association, (just-accepted), 00-00.
Watkins, C. J. C. H. (1989). Learning from delayed rewards (Doctoral dissertation, University of Cambridge).
Murphy, S. A., Oslin, D. W., Rush, A. J., & Zhu, J. (2007). Methodological challenges in constructing effective treatment sequences for chronic psychiatric disorders. Neuropsychopharmacology, 32(2), 257-262.
Zhao, Y., Kosorok, M. R., & Zeng, D. (2009). Reinforcement learning design for cancer clinical trials. Statistics in Medicine, 28(26), 3294.