sim_pte
Simulations for Personalized Treatment Effects
Numerical simulation for treatment effect heterogeneity estimation, as described in Tian et al. (2012).
- Keywords
- uplift, personalized treatment learning
Usage
sim_pte(n = 1000, p = 20, rho = 0, sigma = sqrt(2), beta.den = 4)
Arguments
- n
- number of observations.
- p
- number of predictors.
- rho
- covariance between predictors.
- sigma
- multiplier of the error term ($\sigma_{0}$ in Details).
- beta.den
- size of main effects relative to interaction effects. See details.
Details
sim_pte simulates data according to the following specification:
$$Y = I(\sum_{j=1}^p \beta_{j}X_{j} + \sum_{j=1}^p \gamma_{j}X_{j}T +\sigma_{0}\epsilon > 0)$$,
where $\gamma = (1/2, -1/2, 1/2, -1/2, 0, \ldots, 0)$, $\beta_{j} = (-1)^{j+1} I(3 \leq j \leq 10) / \texttt{beta.den}$, $(X_{1}, \ldots, X_{p})$ follows a mean-zero multivariate normal distribution with a compound symmetric variance-covariance matrix, $(1-\rho)\mathbf{I}_{p} + \rho \mathbf{1}^{T}\mathbf{1}$ (unit variances and pairwise covariance $\rho$), $T \in \{-1, 1\}$ is the treatment indicator, and $\epsilon$ is $N(0,1)$.
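The specification above can be sketched in base R as follows. This is an illustrative reimplementation of the stated formulas, not the package's actual source: the function name `sim_pte_sketch` is hypothetical, equal-probability treatment assignment is an assumption, and the sketch requires p >= 10 and 0 <= rho < 1.

```r
# Hypothetical base-R sketch of the data-generating process described above.
# Not the uplift package's implementation; assumes p >= 10 and 0 <= rho < 1.
sim_pte_sketch <- function(n = 1000, p = 20, rho = 0, sigma = sqrt(2), beta.den = 4) {
  stopifnot(p >= 10, rho >= 0, rho < 1)
  j <- seq_len(p)
  # Interaction coefficients: (1/2, -1/2, 1/2, -1/2, 0, ..., 0)
  gamma <- c(0.5, -0.5, 0.5, -0.5, rep(0, p - 4))
  # Main effects: (-1)^(j+1) * I(3 <= j <= 10) / beta.den
  beta <- (-1)^(j + 1) * (j >= 3 & j <= 10) / beta.den
  # Compound symmetric covariance: 1 on the diagonal, rho elsewhere
  Sigma <- (1 - rho) * diag(p) + rho * matrix(1, p, p)
  # Rows of X are N(0, Sigma): multiply iid normals by the Cholesky factor
  X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
  colnames(X) <- paste0("X", j)
  # Treatment coded -1/1; equal-probability assignment is an assumption here
  treat <- sample(c(-1, 1), n, replace = TRUE)
  eps <- rnorm(n)
  # Binary response: indicator that the latent linear score is positive
  y <- as.numeric(X %*% beta + (X %*% gamma) * treat + sigma * eps > 0)
  data.frame(y = y, treat = treat, X)
}

set.seed(1)
sk <- sim_pte_sketch(n = 500, p = 10, rho = 0.3)
```

Increasing beta.den shrinks the main effects $\beta_{j}$ while leaving the interaction effects $\gamma_{j}$ unchanged, which makes the treatment heterogeneity relatively easier to detect.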
In this case, the "true" treatment effect score $(Prob(Y=1|T=1) - Prob(Y=1|T=-1))$ is given by
$$\Phi (\frac{\sum_{j=1}^p (\beta_{j} + \gamma_{j})X_{j}}{\sigma_{0}}) - \Phi (\frac{\sum_{j=1}^p (\beta_{j} - \gamma_{j})X_{j}}{\sigma_{0}})$$.
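As a rough illustration of the score formula, the sketch below evaluates it with `pnorm` for two hand-picked covariate vectors. The helper name `true_score` is hypothetical and is not part of the package; the coefficients follow the definitions of $\beta$ and $\gamma$ given above with beta.den = 4.

```r
# Hypothetical helper: "true" treatment effect score from the formula above.
true_score <- function(X, beta, gamma, sigma) {
  lin_trt <- X %*% (beta + gamma)   # linear predictor under T = 1
  lin_ctl <- X %*% (beta - gamma)   # linear predictor under T = -1
  as.numeric(pnorm(lin_trt / sigma) - pnorm(lin_ctl / sigma))
}

p <- 10
j <- seq_len(p)
gamma <- c(0.5, -0.5, 0.5, -0.5, rep(0, p - 4))
beta <- (-1)^(j + 1) * (j >= 3 & j <= 10) / 4
# Two illustrative covariate vectors: all zeros, and X1 = 1 only
X <- rbind(rep(0, p), c(1, rep(0, p - 1)))
ts <- true_score(X, beta, gamma, sigma = sqrt(2))
```

The first row has a score of exactly zero because both linear predictors vanish. The second row has a positive score because only $\gamma_{1} = 1/2$ contributes, so treatment raises the probability of $Y = 1$.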
Value
- A data frame including the response variable ($Y$), the treatment (treat=1) and control (treat=-1) assignment, the predictor variables ($X$), and the "true" treatment effect score (ts).
References
Tian, L., Alizadeh, A., Gentles, A., and Tibshirani, R. (2012). A simple method for detecting interactions between a treatment and a large number of covariates. Submitted Dec 2012. arXiv:1212.2995 [stat.ME].
Guelman, L., Guillen, M., and Perez-Marin, A.M. (2013). Optimal personalized treatment rules for marketing interventions: A review of methods, a new proposal, and an insurance case study. Submitted.
Examples
library(uplift)
### Simulate train data
set.seed(12345)
dd <- sim_pte(n = 1000, p = 10, rho = 0, sigma = sqrt(2), beta.den = 4)
dd$treat <- ifelse(dd$treat == 1, 1, 0) # required coding for upliftRF
### Fit model
form <- as.formula(paste('y ~', 'trt(treat) +', paste('X', 1:10, sep = '', collapse = "+")))
fit1 <- upliftRF(formula = form,
                 data = dd,
                 ntree = 100,
                 split_method = "Int",
                 interaction.depth = 3,
                 minsplit = 100,
                 minbucket_ct0 = 50,
                 minbucket_ct1 = 50,
                 verbose = TRUE)
summary(fit1)