sparsereg: Sparse regression for experimental and observational data.

Description

Fits a Bayesian LASSOplus model, which produces sparse estimates with uncertainty and facilitates the discovery of various types of interactions. The function takes a dependent variable, an optional matrix of (pre-treatment) covariates, and an optional matrix of categorical treatment variables. It also calculates uncertainty estimates, including for data with repeated observations.

Usage

sparsereg(y, X, treat=NULL, gibbs=200, burnin=200, thin=10, type="linear",  
group=NULL, trunc=NULL, lower.trunc=NULL, one.constraint=FALSE,  
scale.type="none", baseline.vec=NULL, 
id=NULL, id2=NULL, id3=NULL, save.temp=FALSE)

Arguments

y

Dependent variable.

X

Covariates. These are typically referred to as "pre-treatment" covariates.

treat

Matrix of categorical treatment variables. May be a one-column matrix when there is only one treatment variable.

gibbs

Number of posterior samples to save. Between each saved sample, thin samples are drawn.

burnin

Number of burnin samples. Between each burnin sample, thin samples are drawn. These iterations will not be included in the resulting analysis.

thin

Extent of thinning of the MCMC chain. Between each posterior sample, whether burnin or saved, thin draws are made.

type

Type of statistical model to be fit. Options include type="linear" for linear regression, type="probit" for a probit regression, and type="tobit" for regression with censoring.

group

A vector of integers characterizing which covariates are under the same LASSO constraint. For example, the vector c(1,1,2,2,2) will place the first two covariates under one LASSO constraint and the last three under another. Only works for type="non

trunc

Whether or not the dependent variable is truncated.

lower.trunc

Lower value of truncation of the dependent variable. Only valid for type="tobit".

one.constraint

Whether to fit a single LASSO constraint.

baseline.vec

Optional vector with one entry for each column of the treatment matrix. Each entry gives the baseline condition for that treatment. The baseline condition is omitted during pre-processing so that it serves as the excluded category in estimation.

id, id2, id3

Vectors of the same length as the sample, denoting clustering in the data. In a conjoint experiment with repeated observations, these correspond to respondent IDs. Up to three different sets of random effects are allowed.

scale.type

Indicates the types of interactions that will be created and used in estimation. scale.type="none" generates no interactions and corresponds to simply running LASSOplus with no interactions between variables. scale.type="TX" creates

save.temp

Whether to save intermediate output in a file named temp_sparsereg. Useful for very long runs.
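
As a rough guide to run time, the gibbs, burnin, and thin arguments described above jointly determine how many MCMC draws are made. The sketch below only does that arithmetic in base R. It assumes, following the argument descriptions, that every burn-in sample and every saved sample costs thin draws. The helper name total_draws is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of sparsereg): approximate number of MCMC
# draws implied by the gibbs, burnin, and thin arguments.
total_draws <- function(gibbs = 200, burnin = 200, thin = 10) {
  (gibbs + burnin) * thin
}

# Draws implied by the default settings shown under Usage.
total_draws()

# Draws implied by a longer, less heavily thinned run.
total_draws(gibbs = 1000, burnin = 500, thin = 5)
```

Because the cost scales with the product, doubling thin roughly doubles run time without changing the number of saved samples.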

Value

beta.mode

Matrix of sparse (mode) estimates with rows equal to number of effects and columns for posterior samples.

beta.mean

Matrix of mean estimates with rows equal to number of effects and columns for posterior samples. These estimates are not sparse, but they do predict better than the mode.

beta.ci

Matrix of effects used to calculate approximate confidence intervals.

sigma.sq

Vector of posterior estimates of the error variance.

X

Matrix of covariates fit. Includes interaction terms, depending on scale.type.

varmat

Matrix showing which lower-order terms correspond with which effects. Used in producing figures.

baseline

Vector of baseline categories for treatments.

modeltype

Type of sparsereg model fit. In this case, onestage. Used by summary functions.
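
The layout described above (one row per effect, one column per saved posterior sample) means that point estimates and intervals can be obtained by summarising across columns. The following is a minimal sketch on a simulated stand-in matrix, not real sparsereg output. The object fake_draws and its effect names are hypothetical, and the package's own summary function may not compute its intervals in exactly this way.

```r
set.seed(1)

# Stand-in for a draws matrix laid out like beta.mode or beta.ci:
# 3 effects (rows) by 200 saved posterior samples (columns).
fake_draws <- rbind(
  effect_1 = rnorm(200, mean = 2,  sd = 0.2),
  effect_2 = rnorm(200, mean = 0,  sd = 0.2),
  effect_3 = rnorm(200, mean = -2, sd = 0.2)
)

# Posterior median of each effect (row-wise summary).
est <- apply(fake_draws, 1, median)

# Row-wise 2.5% and 97.5% quantiles as an approximate 95% interval.
ci <- t(apply(fake_draws, 1, quantile, probs = c(0.025, 0.975)))

round(cbind(median = est, ci), 2)
```

With a fitted object, the same row-wise summaries would be applied to the corresponding component, for example the beta.mode element of the returned list.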

Details

The function sparsereg allows for estimation of a broad range of sparse regressions. The method allows for continuous, binary, and censored outcomes.

In experimental data, it can be used for subgroup analysis. It pre-processes lower-order terms to generate higher-order interaction terms that are uncorrelated with their lower-order components, with the form of pre-processing controlled by scale.type. In observational data, it can be used in place of a standard regression, especially in the presence of a large number of variables.

The method also adjusts uncertainty estimates when there are repeated observations by using random effects. For example, a conjoint design may have the same people make several comparisons, or a panel data regression may have multiple observations on the same unit.

The returned object contains the estimated posterior for all of the modeled effects, and analyzing the object is facilitated by the functions plot, summary, volcanoplot, and difference.
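
To make the repeated-observations case concrete, the sketch below simulates a small data set in which each respondent contributes several rows, then passes the respondent identifier through id. The data-generating values and object names are illustrative assumptions. The call uses only arguments listed under Usage and runs only if the sparsereg package is installed.

```r
set.seed(2)
n_resp <- 50                      # hypothetical number of respondents
n_rep  <- 4                       # hypothetical tasks per respondent
id     <- rep(seq_len(n_resp), each = n_rep)
n      <- length(id)

# Two covariates and one three-level treatment.
X     <- matrix(rnorm(n * 2), ncol = 2)
treat <- sample(c("a", "b", "c"), n, replace = TRUE)

# Respondent-level random effect shared by all rows of the same respondent.
u <- rnorm(n_resp)[id]
y <- 1 + X[, 1] + 2 * (treat == "b") + u + rnorm(n)

if (requireNamespace("sparsereg", quietly = TRUE)) {
  # Small gibbs/burnin values keep this illustration quick; they are not
  # a recommendation for real analyses.
  fit <- sparsereg::sparsereg(y, X, treat, id = id, scale.type = "TX",
                              gibbs = 50, burnin = 50, thin = 5)
  summary(fit)
}
```

A second or third clustering variable would be supplied in the same way through id2 and id3.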

References

Ratkovic, Marc and Tingley, Dustin. 2015. "Sparse Estimation with Uncertainty: Subgroup Analysis in Large Dimensional Design." Working paper.

Egami, Naoki and Imai, Kosuke. 2015. "Causal Interaction in High-Dimension." Working paper.

Examples

library(sparsereg)
library(MASS)  # provides mvrnorm()

set.seed(1)
n <- 500
k <- 2  # number of covariates

## Simulate one three-level treatment and two correlated covariates.
treat <- sample(c("a","b","c"), n, replace=TRUE, prob=c(.5,.25,.25))
Sigma <- matrix(c(1,.5,.5,1), nrow=k)
X <- mvrnorm(n, mu=rep(0,k), Sigma=Sigma)

## True outcome: main effects plus treatment-by-covariate interactions.
y.true <- 3 + X[,2]*2 + (treat=="a")*2 + (treat=="b")*(-2) +
  X[,2]*(treat=="b")*(-2) + X[,2]*(treat=="c")*2
y <- y.true + rnorm(n)

## Fit a linear model.
s1 <- sparsereg(y, X, treat, scale.type="TX")

## Extension using a baseline category
s1.base <- sparsereg(y, X, treat, scale.type="TX", baseline.vec="a")
ests.base <- apply(s1.base$beta.mode, 1, median)

## Extension using a probit
s1.probit <- sparsereg((y>5), X, treat, scale.type="TX", type="probit")
ests.probit <- apply(s1.probit$beta.mode, 1, median)
