boottest.lm: Fast wild cluster bootstrap inference for objects of class lm

Description

boottest.lm is an S3 method that provides fast wild cluster bootstrap inference for objects of class lm by implementing the fast wild bootstrap algorithm developed in Roodman et al. (2019).

Usage

# S3 method for lm
boottest(
  object,
  param,
  B,
  clustid = NULL,
  bootcluster = "max",
  conf_int = TRUE,
  seed = NULL,
  R = NULL,
  r = 0,
  beta0 = NULL,
  sign_level = 0.05,
  type = "rademacher",
  impose_null = TRUE,
  p_val_type = "two-tailed",
  tol = 1e-06,
  maxiter = 10,
  nthreads = getBoottest_nthreads(),
  ssc = boot_ssc(adj = TRUE, fixef.K = "none", cluster.adj = TRUE, cluster.df =
    "conventional"),
  boot_algo = getBoottest_boot_algo(),
  floattype = "Float64",
  maxmatsize = FALSE,
  bootstrapc = FALSE,
  t_boot = FALSE,
  getauxweights = FALSE,
  ...
)

Arguments

object

An object of class lm

param

A character vector or rhs formula. The name of the regression coefficient(s) for which the hypothesis is to be tested

B

Integer. The number of bootstrap iterations. When the number of clusters is low, increasing B adds little additional runtime.

clustid

A character vector or rhs formula containing the names of the cluster variables. If NULL, a heteroskedasticity-robust (HC1) wild bootstrap is run.

bootcluster

A character vector or rhs formula of length 1. Specifies the bootstrap clustering variable or variables. If more than one variable is specified, bootstrapping is clustered by the intersections of the clusterings implied by the listed variables. To mimic the behavior of Stata's boottest command, the default is to cluster by the intersection of all the variables specified via the clustid argument, even though that is not necessarily recommended (see section 4.2 of the paper by Roodman et al. cited below). Other options include "min", where bootstrapping is clustered by the cluster variable with the fewest clusters. The subcluster bootstrap (MacKinnon & Webb, 2018) is also supported; see vignette("fwildclusterboot", package = "fwildclusterboot") for details.

conf_int

A logical vector. If TRUE, boottest computes confidence intervals by test inversion. If FALSE, only the p-value is returned.

seed

An integer. Sets the random seed. For details, see the section "Setting Seeds" below.

R

Hypothesis vector giving linear combinations of coefficients. Must be either NULL or a vector of the same length as param. If NULL, a vector of ones with the same length as param is used.

r

A numeric. Shifts the null hypothesis: H0: param = r vs. H1: param != r.
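As a rough illustration of how param, R and r combine into the hypothesis R'beta = r, the following base R sketch computes the linear combination by hand. It uses the mtcars data that ships with R and does not call boottest(); the variable names fit, gap and point_estimate are chosen for illustration only.

```r
# Sketch only: builds the tested quantity R'beta from an lm fit.
fit   <- lm(mpg ~ wt + hp, data = mtcars)

param <- c("wt", "hp")            # coefficients entering the hypothesis
R     <- rep(1, length(param))    # what R = NULL is described as: a vector of ones
r     <- 0                        # hypothesized value of the linear combination

# R'beta: the constraint vector times the selected coefficients
point_estimate <- sum(R * coef(fit)[param])

# distance between the estimate and the hypothesized value
gap <- point_estimate - r
```

With R = c(1, -1) the same code would describe a test of equality of the two coefficients (wt - hp = r).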

beta0

Deprecated function argument. Replaced by function argument 'r'.

sign_level

A numeric between 0 and 1 which sets the significance level of the inference procedure. E.g. sign_level = 0.05 returns 95% confidence intervals. By default, sign_level = 0.05.

type

Character or function. The character string specifies the type of bootstrap to use: one of "rademacher", "mammen", "norm" and "webb". Alternatively, type can be a function(n) for drawing wild bootstrap factors. "rademacher" by default. For the Rademacher distribution, if the number of replications B exceeds the number of possible draw combinations, 2^(number of clusters), then boottest() will use each possible combination once (enumeration).
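The enumeration rule for Rademacher weights can be sketched in base R as follows. This is an illustration of the idea under toy values, not the package's internal code; G, weights and use_enumeration are names chosen here.

```r
# Sketch only: Rademacher weights take values -1 or 1, one per cluster,
# so there are 2^G distinct weight vectors for G clusters.
G <- 3                      # number of clusters (toy value)
B <- 9999                   # requested bootstrap iterations

n_combinations  <- 2^G
use_enumeration <- B > n_combinations

if (use_enumeration) {
  # every possible sign vector exactly once: one row per bootstrap draw
  weights <- as.matrix(expand.grid(rep(list(c(-1, 1)), G)))
} else {
  # otherwise draw B random sign vectors
  weights <- matrix(sample(c(-1, 1), G * B, replace = TRUE), nrow = B)
}

dim(weights)
```

With few clusters the number of distinct draws is small, so the effective number of bootstrap iterations under enumeration is 2^G rather than B.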

impose_null

Logical. Controls whether the null hypothesis is imposed on the bootstrap data-generating process (DGP). By default, the null is imposed (WCR). If FALSE, the null is not imposed (WCU).
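The difference between imposing the null (WCR) and not imposing it (WCU) can be sketched with a deliberately naive wild cluster bootstrap in base R. This is not the fast algorithm that boottest() implements, it applies no small-sample corrections, and the simulated data and the helper names fit_t and wild_p are assumptions made for illustration.

```r
# Sketch only: naive wild cluster bootstrap on simulated data.
set.seed(42)
G   <- 12                                  # number of clusters (toy value)
n_g <- 20                                  # observations per cluster
cl  <- rep(seq_len(G), each = n_g)         # cluster id of each observation
x   <- rnorm(G * n_g)
y   <- 1 + rnorm(G)[cl] + rnorm(G * n_g)   # true slope on x is 0
X   <- cbind(1, x)
r   <- 0                                   # H0: slope = r

# OLS fit plus cluster-robust t-statistic of the slope against `center`
fit_t <- function(y, center) {
  XtX_inv <- solve(crossprod(X))
  beta    <- drop(XtX_inv %*% crossprod(X, y))
  res     <- drop(y - X %*% beta)
  scores  <- rowsum(X * res, cl)           # G x 2 matrix of cluster score sums
  V       <- XtX_inv %*% crossprod(scores) %*% XtX_inv
  list(beta = beta, t = (beta[2] - center) / sqrt(V[2, 2]))
}

wild_p <- function(impose_null, B = 999) {
  obs <- fit_t(y, center = r)
  if (impose_null) {
    # WCR: fit with the slope fixed at r and resample restricted residuals
    fitted0 <- mean(y - r * x) + r * x
    center  <- r
  } else {
    # WCU: resample unrestricted residuals, center bootstrap t at the estimate
    fitted0 <- drop(X %*% obs$beta)
    center  <- obs$beta[2]
  }
  resid0 <- y - fitted0
  t_boot <- replicate(B, {
    v <- sample(c(-1, 1), G, replace = TRUE)   # one Rademacher draw per cluster
    fit_t(fitted0 + resid0 * v[cl], center)$t
  })
  mean(abs(t_boot) > abs(obs$t))               # symmetric two-tailed p-value
}

p_wcr <- wild_p(impose_null = TRUE)
p_wcu <- wild_p(impose_null = FALSE)
```

Each bootstrap iteration here refits the model, which is exactly the cost that the fast algorithm is designed to avoid.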

p_val_type

Character vector of length 1. Type of p-value. By default "two-tailed". Other options include "equal-tailed", ">" and "<".
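A minimal sketch of how the four p-value types can be computed from a vector of bootstrap t-statistics is given below. The formulas follow common textbook definitions; in particular, which tail ">" and "<" select inside the package is not stated above, so the mapping in the comments is an assumption. The numbers are toy values.

```r
# Sketch only: p-value variants from toy bootstrap t-statistics.
t_boot <- c(-3, -2, -1, 0, 1, 2, 3, 4)   # stand-in for bootstrap t-statistics
t_stat <- 1.5                            # stand-in for the original t-statistic

# share of bootstrap statistics above / below the original statistic
p_above <- mean(t_boot > t_stat)
p_below <- mean(t_boot < t_stat)

p_vals <- c(
  "two-tailed"   = mean(abs(t_boot) > abs(t_stat)),  # symmetric
  "equal-tailed" = 2 * min(p_above, p_below),
  ">"            = p_above,   # assumed mapping
  "<"            = p_below    # assumed mapping
)
p_vals
```

The two-tailed and equal-tailed variants agree when the bootstrap distribution is symmetric around zero and can differ noticeably when it is skewed, as in this toy vector.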

tol

Numeric vector of length 1. The desired accuracy (convergence tolerance) used in the root finding procedure to find the confidence interval. 1e-6 by default.

maxiter

Integer. Maximum number of iterations used in the root finding procedure to find the confidence interval. 10 by default.

nthreads

The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 1 core.

ssc

An object of class boot_ssc.type obtained with the function boot_ssc(). Represents how the small sample adjustments are computed. The defaults are adj = TRUE, fixef.K = "none", cluster.adj = TRUE, cluster.df = "conventional". You can find more details in the help file for boot_ssc(). The function is purposefully designed to mimic fixest's ssc function.

boot_algo

Character scalar. Either "R" or "WildBootTests.jl". Controls the algorithm employed by boottest(). "R" is the default and implements the cluster bootstrap as in Roodman et al. (2019). "WildBootTests.jl" executes the wild cluster bootstrap via the WildBootTests.jl package. For it to run, Julia and WildBootTests.jl need to be installed. Note that if no cluster is provided, boottest() always defaults to the "lean" algorithm (boot_algo = "R-lean"). You can set the employed algorithm globally by using the setBoottest_boot_algo() function.

floattype

Float64 by default. Other option: Float32. Should floating point numbers in Julia be represented as 32 or 64 bit? Only relevant when 'boot_algo = "WildBootTests.jl"'

maxmatsize

FALSE by default, meaning no limit. Otherwise a numeric scalar that sets the maximum size of the auxiliary weight matrix (v), in gigabytes. Only relevant when 'boot_algo = "WildBootTests.jl"'

bootstrapc

Logical scalar, FALSE by default. TRUE to request bootstrap-c instead of bootstrap-t. Only relevant when 'boot_algo = "WildBootTests.jl"'

t_boot

Logical. Should bootstrapped t-statistics be returned?

getauxweights

Logical. Whether to save the auxiliary weight matrix (v).

...

Further arguments passed to or from other methods.

Value

An object of class boottest

p_val

The bootstrap p-value.

conf_int

The bootstrap confidence interval.

param

The tested parameter.

N

Sample size. Might differ from the regression sample size if the cluster variables contain NA values.

boot_iter

Number of Bootstrap Iterations.

clustid

Names of the cluster Variables.

N_G

Dimension of the cluster variables as used in boottest.

sign_level

Significance level used in boottest.

type

Distribution of the bootstrap weights.

impose_null

Whether the null was imposed on the bootstrap dgp or not.

R

The vector "R" in the null hypothesis of interest Rbeta = r.

r

The scalar "r" in the null hypothesis of interest Rbeta = r.

point_estimate

R'beta. A scalar: the constraints vector times the regression coefficients.

grid_vals

All t-statistics calculated while calculating the confidence interval.

p_grid_vals

All p-values calculated while calculating the confidence interval.

t_stat

The 'original' regression test statistics.

t_boot

All bootstrap t-statistics.

regression

The regression object used in boottest.

call

Function call of boottest.

boot_algo

The employed bootstrap algorithm.

nthreads

The number of threads employed.

internal_seed

The integer value, inherited from set.seed(), used within boottest() to set the random seed in either R or Julia. If NULL, no internal seed was created.

Setting Seeds

To guarantee reproducibility, you can either use boottest()'s seed function argument, or set a global random seed via

set.seed() when using
1. the lean algorithm (via boot_algo = "R-lean"), including the heteroskedastic wild bootstrap,
2. the wild cluster bootstrap via boot_algo = "R" with Mammen weights, or
3. boot_algo = "WildBootTests.jl";

dqrng::dqset.seed() when using boot_algo = "R" with Rademacher, Webb or Normal weights.
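The rules above can be written down as a small lookup. The helper which_seed_function() below is hypothetical and not part of fwildclusterboot; it only encodes the list above for the four built-in weight types, and it does not cover the case where type is a user-supplied function.

```r
# Hypothetical helper (not part of fwildclusterboot): returns the name of the
# seeding function that the rules above associate with a configuration.
which_seed_function <- function(boot_algo,
                                type = c("rademacher", "mammen", "norm", "webb")) {
  type <- match.arg(type)
  if (boot_algo %in% c("R-lean", "WildBootTests.jl")) {
    return("set.seed")
  }
  if (boot_algo == "R") {
    # Mammen weights follow set.seed(); the other weight types follow dqrng
    if (type == "mammen") return("set.seed") else return("dqrng::dqset.seed")
  }
  stop("unknown boot_algo: ", boot_algo)
}

# Set a global seed for the default configuration before calling boottest()
seed_fun <- which_seed_function("R", "rademacher")
if (seed_fun == "set.seed") {
  set.seed(2023)
} else if (requireNamespace("dqrng", quietly = TRUE)) {
  dqrng::dqset.seed(2023)
}
```

Passing the seed argument to boottest() directly avoids having to track which random number generator a given configuration uses.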

Confidence Intervals

boottest computes confidence intervals by inverting p-values. In practice, the following procedure is used:

1. Based on an initial guess for starting values, calculate p-values for 26 equally spaced points between the starting values.
2. Out of the 26 calculated p-values, find the two pairs of values x for which the corresponding p-values px cross the significance level sign_level.
3. Feed the two pairs of x into a numerical root finding procedure and solve for the root. boottest currently relies on stats::uniroot; by default it sets an absolute tolerance of 1e-06 and stops the procedure after 10 iterations (see the tol and maxiter arguments).
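The three steps can be sketched in base R with a toy p-value function. The normal-based p_fun below stands in for the bootstrap p-value as a function of the hypothesized value; it, the starting values and the toy estimate are assumptions for illustration, while the grid size, tolerance and iteration limit follow the description above.

```r
# Sketch only: confidence interval by test inversion with a toy p-value function.
est        <- 1.0     # toy point estimate
se         <- 0.5     # toy standard error
sign_level <- 0.05

# stand-in for the bootstrap p-value of H0: parameter = x
p_fun <- function(x) 2 * pnorm(-abs((est - x) / se))

# Step 1: p-values on 26 equally spaced points between the starting values
start       <- est + c(-4, 4) * se
grid_vals   <- seq(start[1], start[2], length.out = 26)
p_grid_vals <- p_fun(grid_vals)

# Step 2: neighbouring grid points between which the p-value crosses sign_level
crossings <- which(diff(sign(p_grid_vals - sign_level)) != 0)

# Step 3: refine each crossing with stats::uniroot
conf_int <- sapply(crossings, function(i) {
  uniroot(function(x) p_fun(x) - sign_level,
          interval = grid_vals[c(i, i + 1)],
          tol = 1e-6, maxiter = 10)$root
})
conf_int
```

For this toy p-value function the result can be checked against the closed form est + c(-1, 1) * qnorm(1 - sign_level / 2) * se.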

Standard Errors

boottest does not calculate standard errors.

References

Roodman et al., 2019, "Fast and wild: Bootstrap inference in Stata using boottest", The Stata Journal. (https://journals.sagepub.com/doi/full/10.1177/1536867X19830877)

Cameron, A. Colin, Jonah B. Gelbach, and Douglas L. Miller. "Bootstrap-based improvements for inference with clustered errors." The Review of Economics and Statistics 90.3 (2008): 414-427.

MacKinnon, James G., and Matthew D. Webb. "The wild bootstrap for few (treated) clusters." The Econometrics Journal 21.2 (2018): 114-135.

MacKinnon, James. "Wild cluster bootstrap confidence intervals." L'Actualité économique 91.1-2 (2015): 11-33.

Webb, Matthew D. Reworking wild bootstrap based inference for clustered errors. No. 1315. Queen's Economics Department Working Paper, 2013.

Examples

library(fwildclusterboot)
data(voters)
# fit the linear model whose coefficients are to be tested
lm_fit <- lm(proposition_vote ~ treatment + ideology1 + log_income + Q1_immigration,
  data = voters
)
# one-way clustering: test treatment = 0
boot1 <- boottest(lm_fit,
  B = 9999,
  param = "treatment",
  clustid = "group_id1"
)
# two-way clustering: test treatment = 0
boot2 <- boottest(lm_fit,
  B = 9999,
  param = "treatment",
  clustid = c("group_id1", "group_id2")
)
# two-way clustering with a fixed seed and 80% confidence interval: test treatment = 2
boot3 <- boottest(lm_fit,
  B = 9999,
  param = "treatment",
  clustid = c("group_id1", "group_id2"),
  sign_level = 0.2,
  seed = 8,
  r = 2
)
# test treatment + ideology1 = 2
boot4 <- boottest(lm_fit,
  B = 9999,
  clustid = c("group_id1", "group_id2"),
  param = c("treatment", "ideology1"),
  R = c(1, 1),
  r = 2
)
# summarize and plot the results of the first test
summary(boot1)
plot(boot1)