SIHR

The package SIHR (Statistical Inference for High-dimensional Regression) facilitates statistical inference in high-dimensional generalized linear models (GLMs) with continuous and binary outcomes. It offers tools to construct confidence intervals and conduct hypothesis tests for low-dimensional objectives (e.g. $e_1^\intercal \beta$) in both one-sample and two-sample regression regimes.

Background

In many regression problems, the number of dimensions $p$ exceeds the sample size $n$. Traditional estimators obtained by penalized maximum likelihood, such as Lasso and Ridge, are unsuitable for statistical inference because the penalty term introduces substantial estimation bias. Our package uses debiasing methods to address a broad range of inference problems in high-dimensional GLMs with continuous or binary outcomes.

To demonstrate the effect of bias correction, we conducted 250 simulation rounds. The data generation process is defined with $n=p=200$: for $1\leq i\leq n$, the covariates follow $X_i\sim \mathcal{N}(\mathbf{0}_p, \mathbf{I}_p)$ and the outcome follows $Y_i = X_i^\intercal \beta + \mathcal{N}(0,1)$, where $\beta= (\mathbf{0.5}_5, 0.2, 0.4, 0.6, 0.8, 1, \mathbf{0}_{p-10})$, i.e. five entries equal to 0.5, followed by the five listed values and $p-10$ zeros.
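One round of the data generation process above can be sketched in base R as follows. This is a minimal illustration rather than the authors' simulation code; the seed and the variable names (`X`, `y`, `beta`, `target`) are assumptions chosen for the example.

```r
set.seed(0)  # assumed seed, used only to make the sketch reproducible

n <- 200
p <- 200

# True coefficients: five entries of 0.5, then 0.2, 0.4, 0.6, 0.8, 1,
# followed by p - 10 zeros
beta <- c(rep(0.5, 5), seq(0.2, 1, by = 0.2), rep(0, p - 10))

# Covariates: each row X_i is drawn from N(0_p, I_p)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Outcome: Y_i = X_i' beta + standard normal noise
y <- as.vector(X %*% beta + rnorm(n))

# Inference target e_1' beta, i.e. the first coefficient
target <- beta[1]
```

Repeating this draw 250 times and re-estimating the first coefficient on each dataset would mirror the simulation described above.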

Our objective was to conduct inference on the first coefficient $e_1^\intercal \beta = 0.5$. However, as illustrated in the left subfigure, Lasso estimators from the glmnet package exhibit considerable bias. Conversely, our SIHR estimators, shown in the right subfigure, are unbiased and thus ready for valid statistical inference.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("zywang0701/SIHR")

Getting Started

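The package provides five main functions, each targeting a different low-dimensional objective: LF and QF for the one-sample regime, and CATE, InnProd and Dist for the two-sample regime. Once a model is fitted, the result can be reported in two ways: ci() for confidence intervals and summary() for a summary of the estimates.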
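The fit-then-report workflow can be sketched for the first coefficient of a linear model as below. This is a hedged example, not taken from the package vignettes: the sample sizes, seed and variable names are hypothetical, and the argument names `loading.mat` and `model` are assumed to match the documented interface of `LF` (check `?LF` before relying on them). The SIHR calls run only if the package is installed.

```r
set.seed(1)  # hypothetical seed and sizes, chosen only for illustration
n <- 100
p <- 120

# Simulated high-dimensional linear model with a sparse coefficient vector
beta <- c(rep(0.5, 5), rep(0, p - 5))
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- as.vector(X %*% beta + rnorm(n))

# Loading e_1 as a p x 1 matrix, so the target is e_1' beta = beta[1]
loading.mat <- matrix(c(1, rep(0, p - 1)), ncol = 1)

if (requireNamespace("SIHR", quietly = TRUE)) {
  # Fit the bias-corrected estimator for the linear functional
  # (argument names are assumptions; see ?LF)
  Est <- SIHR::LF(X, y, loading.mat = loading.mat, model = "linear")

  # Two ways to report the result
  print(SIHR::ci(Est))   # confidence interval
  print(summary(Est))    # summary of the estimate
}
```

The other main functions (QF, CATE, InnProd, Dist) are expected to follow the same pattern of fitting first and then calling ci() or summary() on the returned object.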

We have prepared several vignettes to help users get started with SIHR:

  • Start with the introductory guide for initial usage of the package.

  • For an in-depth exploration of the package's functionality, consult the Intro of Usage.

  • To gain insights into the debiasing methods incorporated within the package, refer to our Intro of Debiasing Methods.

Install

install.packages('SIHR')

Monthly Downloads: 221
Version: 2.1.0
License: GPL-3
Maintainer: Zijian Guo
Last Published: April 24th, 2024

Functions in SIHR (2.1.0)

  • summary.QF: Summarizing QF
  • print.summary.Dist: Printing Summarizing Dist
  • print.summary.InnProd: Printing Summarizing InnProd
  • ci.LF: Confidence Intervals for Bias-corrected LF Estimators
  • LF: Inference for linear combination of the regression vector in high dimensional generalized linear regression
  • ci.Dist: Confidence Intervals for Bias-corrected Dist Estimators
  • ci.CATE: Confidence Intervals for Bias-corrected CATE Estimators
  • ci.InnProd: Confidence Intervals for Bias-corrected InnProd Estimators
  • ci: Confidence Intervals for Bias-corrected Estimators
  • print.summary.LF: Printing Summarizing LF
  • InnProd: Inference for weighted inner product of the regression vectors in high dimensional generalized linear regressions
  • ci.QF: Confidence Intervals for Bias-corrected QF Estimators
  • Dist: Inference for weighted quadratic functional of difference of the regression vectors (excluding the intercept term) in high dimensional generalized linear regressions
  • CATE: Inference for difference of linear combinations of the regression vectors in high dimensional generalized linear regressions
  • QF: Inference for quadratic forms of the regression vector in high dimensional generalized linear regressions
  • summary.CATE: Summarizing CATE
  • summary.Dist: Summarizing Dist
  • summary.InnProd: Summarizing InnProd
  • print.summary.CATE: Printing Summarizing CATE
  • print.summary.QF: Printing Summarizing QF
  • summary.LF: Summarizing LF