sparsereg (version 1.0)

sparsereg: Sparse regression for experimental and observational data.

Description

Fits a Bayesian LASSOplus model, producing sparse estimates with uncertainty and facilitating the discovery of various types of interactions. The function takes a dependent variable, an optional matrix of (pre-treatment) covariates, and an optional matrix of categorical treatment variables. Uncertainty estimates are calculated correctly, including for data with repeated observations.

Usage

sparsereg(y, X, treat=NULL, gibbs=200, burnin=200, thin=10, type="linear",  
group=NULL, trunc=NULL, lower.trunc=NULL, one.constraint=FALSE,  
scale.type="none", baseline.vec=NULL, 
id=NULL, id2=NULL, id3=NULL, save.temp=FALSE)

Arguments

y
Dependent variable.
X
Covariates, typically referred to as "pre-treatment" covariates.
treat
Matrix of categorical treatment variables. May be a one-column matrix when there is only one treatment variable.
gibbs
Number of posterior samples to save. Between each saved sample, thin samples are drawn.
burnin
Number of burn-in samples. Between each burn-in sample, thin samples are drawn. These iterations are not included in the resulting analysis.
thin
Extent of thinning of the MCMC chain. Between each posterior sample, whether burnin or saved, thin draws are made.
type
Type of statistical model to be fit. Options include type="linear" for linear regression, type="probit" for a probit regression, and type="tobit" for regression with censoring.
group
A vector of integers characterizing which covariates are under the same LASSO constraint. For example, the vector c(1,1,2,2,2) will place the first two covariates under one LASSO constraint and the last three under another. Only works for type="non
trunc
Whether or not the dependent variable is truncated.
lower.trunc
Lower value of truncation of the dependent variable. Only valid for type="tobit".
one.constraint
Whether to fit a single LASSO constraint.
baseline.vec
Optional vector with one entry for each column of the treatment matrix. Each entry gives the baseline condition for that treatment, which is omitted during pre-processing so that it serves as the excluded category in estimation.
id, id2, id3
Vectors of the same length as the sample, denoting clustering in the data. In a conjoint experiment with repeated observations, these correspond to respondent IDs. Up to three different sets of random effects are allowed.
scale.type
Indicates the types of interactions that will be created and used in estimation. scale.type="none" generates no interactions and corresponds to simply running LASSOplus with no interactions between variables. scale.type="TX" creates
save.temp
Whether to save intermediate output in a file named temp_sparsereg. Useful for very long runs.
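
The sampler arguments interact multiplicatively, which matters when budgeting run time. The base R sketch below works through that arithmetic for the default settings, assuming (as the descriptions of gibbs, burnin, and thin state) that every burn-in or saved sample costs thin draws. It also tabulates the example group vector. The names total_draws and covariates_per_constraint are introduced here for illustration and are not part of the package.

```r
# Default sampler settings taken from the Usage section
gibbs  <- 200
burnin <- 200
thin   <- 10

# Assumption: each burn-in or saved sample requires `thin` draws,
# so the chain length is the product below
total_draws <- (gibbs + burnin) * thin
saved_draws <- gibbs

# The grouping vector from the `group` description:
# covariates 1-2 share one constraint, covariates 3-5 share another
group <- c(1, 1, 2, 2, 2)
covariates_per_constraint <- table(group)

print(total_draws)
print(saved_draws)
print(covariates_per_constraint)
```

Under these assumptions, doubling thin doubles the run time without changing the number of saved samples.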

Value

  • beta.mode: Matrix of sparse (mode) estimates with rows equal to number of effects and columns for posterior samples.
  • beta.mean: Matrix of mean estimates with rows equal to number of effects and columns for posterior samples. These estimates are not sparse, but they do predict better than the mode.
  • beta.ci: Matrix of effects used to calculate approximate confidence intervals.
  • sigma.sq: Vector of posterior estimates of the error variance.
  • X: Matrix of covariates fit. Includes interaction terms, depending on scale.type.
  • varmat: Matrix showing which lower-order terms correspond with which effects. Used in producing figures.
  • baseline: Vector of baseline categories for treatments.
  • modeltype: Type of sparsereg model fit. In this case, onestage. Used by summary functions.

Details

The function sparsereg allows for estimation of a broad range of sparse regressions. The method allows for continuous, binary, and censored outcomes. In experimental data, it can be used for subgroup analysis. It pre-processes lower-order terms to generate higher-order interaction terms that are uncorrelated with their lower-order components, with the form of pre-processing controlled by scale.type. In observational data, it can be used in place of a standard regression, especially in the presence of a large number of variables. The method also adjusts uncertainty estimates for repeated observations by using random effects. For example, a conjoint design may have the same people make several comparisons, or a panel data regression may have multiple observations on the same unit. The returned object contains the estimated posterior for all of the modeled effects, and analyzing it is facilitated by the functions plot, summary, volcanoplot, and difference.
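
To make the idea of an interaction term that is uncorrelated with its lower-order components concrete, the sketch below residualizes a raw product term on its two components using base R. This is one generic way to obtain such a term and is not necessarily what sparsereg does internally. The variable names x, t_b, raw_int, and resid_int are illustrative.

```r
set.seed(1)
n <- 500

# A continuous covariate and a binary treatment indicator
x   <- rnorm(n, mean = 1)
t_b <- as.numeric(sample(c("a", "b"), n, replace = TRUE) == "b")

# The raw product is generally correlated with both components
raw_int <- x * t_b

# Residualize the product on its lower-order terms; least-squares
# residuals are orthogonal to the regressors by construction
resid_int <- resid(lm(raw_int ~ x + t_b))

print(cor(raw_int, x))
print(cor(resid_int, x))
print(cor(resid_int, t_b))
```

The residualized term carries only the variation in the product that the main effects cannot explain, which is why its correlations with x and t_b are zero up to floating-point error.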

References

Ratkovic, Marc and Tingley, Dustin. 2015. "Sparse Estimation with Uncertainty: Subgroup Analysis in Large Dimensional Design." Working paper.

Egami, Naoki and Imai, Kosuke. 2015. "Causal Interaction in High-Dimension." Working paper.

See Also

plot.sparsereg, summary.sparsereg, volcanoplot, difference, print.sparsereg

Examples

library(MASS)       # provides mvrnorm
library(sparsereg)

set.seed(1)
n<-500
k<-2
treat<-sample(c("a","b","c"),n,replace=TRUE,prob=c(.5,.25,.25))
Sigma<-matrix(c(1,.5,.5,1),nrow=2)
X<-mvrnorm(n,mu=c(0,0),Sigma=Sigma)
y.true<-3+X[,2]*2+(treat=="a")*2 +(treat=="b")*(-2)+X[,2]*(treat=="b")*(-2)+
  X[,2]*(treat=="c")*2
y<-y.true+rnorm(n)

##Fit a linear model.
s1<-sparsereg(y,X,treat,scale.type="TX")

##Extension using a baseline category
s1.base<-sparsereg(y,X,treat,scale.type="TX",baseline.vec="a")
ests.base<-apply(s1.base$beta.mode,1,median)

##Extension using a probit
s1.probit<-sparsereg((y>5),X,treat,scale.type="TX",type="probit")
ests.probit<-apply(s1.probit$beta.mode,1,median)
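
As a rough benchmark for the simulated data above, a plain least-squares fit in base R shows the coefficient values that the data-generating process implies with treatment "a" as the baseline. This sketch does not call sparsereg. It regenerates similar data without MASS by using a Cholesky factor (a choice made here only to keep the block self-contained), and the names Z, x2, treat_f, and ols_coef are introduced for illustration.

```r
set.seed(1)
n <- 500
treat <- sample(c("a", "b", "c"), n, replace = TRUE, prob = c(.5, .25, .25))

# Correlated covariates via a Cholesky factor instead of MASS::mvrnorm
Sigma <- matrix(c(1, .5, .5, 1), nrow = 2)
Z <- matrix(rnorm(n * 2), ncol = 2)
X <- Z %*% chol(Sigma)

# Same outcome equation as in the example above
y.true <- 3 + X[, 2] * 2 + (treat == "a") * 2 + (treat == "b") * (-2) +
  X[, 2] * (treat == "b") * (-2) + X[, 2] * (treat == "c") * 2
y <- y.true + rnorm(n)

# Ordinary least squares with treatment "a" as the reference level
x2 <- X[, 2]
treat_f <- factor(treat, levels = c("a", "b", "c"))
ols_coef <- coef(lm(y ~ x2 * treat_f))
print(round(ols_coef, 2))
```

With "a" as the reference level, the implied values are an intercept of 5, a slope of 2 on x2, shifts of -4 and -2 for "b" and "c", and interaction slopes of -2 and 2, so estimates from any fitted model can be compared against these.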
