get_confidence_interval: Compute confidence interval

Description

Compute a confidence interval around a summary statistic. Both simulation-based and theoretical methods are supported, though only type = "se" is supported for theoretical methods.

Learn more in vignette("infer").

Usage

get_confidence_interval(x, level = 0.95, type = NULL, point_estimate = NULL)
get_ci(x, level = 0.95, type = NULL, point_estimate = NULL)

Value

A tibble containing the following columns:

term: The explanatory variable (or intercept) in question. Only supplied if the input was previously passed to fit().
lower_ci, upper_ci: The lower and upper bounds of the confidence interval, respectively.

Arguments

x: A distribution. For simulation-based inference, a data frame containing a distribution of calculate()d statistics or fit()ted coefficient estimates. This object should have been passed to generate() before being supplied to calculate() or fit(). For theory-based inference, the output of assume(). Distributions for confidence intervals do not require a null hypothesis via hypothesize().
level: A numerical value between 0 and 1 giving the confidence level. Default value is 0.95.
type: A string giving the method used to create the confidence interval. The default is "percentile". The other options are "se", which computes the interval as the point estimate plus or minus (multiplier * standard error), and "bias-corrected", which computes a bias-corrected interval.
point_estimate: A data frame containing the observed statistic (in a calculate()-based workflow) or observed fit (in a fit()-based workflow). This object is likely the output of calculate() or fit() and need not have been passed to generate(). Set to NULL by default. Must be provided if type is "se" or "bias-corrected".
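
To make the three type options concrete, the following base R sketch computes each interval by hand from a vector of simulated statistics. It follows the usual textbook definitions of the percentile, standard error, and bias-corrected intervals and is not taken from the package source, so the package's exact computation may differ. The simulated vector boot_stats, the value of point_est, and all variable names are assumptions made for illustration.

```r
# Stand-in for a bootstrap distribution of sample means
# (simulated values, not the gss data)
set.seed(123)
boot_stats <- rnorm(1000, mean = 41, sd = 0.75)
point_est <- 41.4  # hypothetical observed statistic
level <- 0.95

# Tail probabilities of a two-sided interval: 0.025 and 0.975 at level = 0.95
probs <- (1 + c(-level, level)) / 2

# "percentile": quantiles of the simulated statistics
percentile_ci <- unname(quantile(boot_stats, probs = probs))

# "se": point estimate plus or minus multiplier * standard error,
# with the standard error estimated by the sd of the simulated statistics
multiplier <- qnorm(probs[2])
se_ci <- point_est + c(-1, 1) * multiplier * sd(boot_stats)

# "bias-corrected": shift the percentile probabilities according to the
# share of simulated statistics at or below the point estimate
z0 <- qnorm(mean(boot_stats <= point_est))
bc_probs <- pnorm(2 * z0 + qnorm(probs))
bias_corrected_ci <- unname(quantile(boot_stats, probs = bc_probs))

rbind(percentile_ci, se_ci, bias_corrected_ci)
```

The sketch shows why point_estimate is needed for "se" and "bias-corrected" but not for "percentile": only the latter is computed from the simulated statistics alone.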

Aliases

get_ci() is an alias of get_confidence_interval(). conf_int() is a deprecated alias of get_confidence_interval().

Details

A null hypothesis is not required to compute a confidence interval. However, including hypothesize() in a pipeline leading to get_confidence_interval() will not break anything. This can be useful when computing a confidence interval using the same distribution used to compute a p-value.

Theoretical confidence intervals (i.e., those calculated by supplying the output of assume() to the x argument) require that the point estimate lie on the scale of the data. The distribution defined in assume() will be recentered and rescaled to align with the point estimate, as can be seen in the output of visualize() when paired with shade_confidence_interval(). Confidence intervals are implemented for the following distributions and point estimates:

distribution = "t": point_estimate should be the output of calculate() with stat = "mean" or stat = "diff in means"
distribution = "z": point_estimate should be the output of calculate() with stat = "prop" or stat = "diff in props"
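
The "z" case can be sketched in the same way as the "t" case. The example below is a hedged illustration, not part of the original examples: it assumes that the gss data shipped with infer has a sex column with a "female" level, and that assume("z") accepts a one-proportion specification without a hypothesize() step.

```r
library(infer)

# Observed proportion of respondents recorded as female in gss
p_hat <- gss |>
  specify(response = sex, success = "female") |>
  calculate(stat = "prop")

# Theoretical z sampling distribution for the same specification
sampling_dist <- gss |>
  specify(response = sex, success = "female") |>
  assume("z")

# The point estimate is required because theoretical intervals
# use the standard error method
theor_ci <- get_confidence_interval(
  sampling_dist,
  level = 0.95,
  point_estimate = p_hat
)

theor_ci
```

The resulting interval is on the scale of the proportion itself, which is what the recentering and rescaling described above is meant to achieve.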

Examples

# load the package, which also provides the gss example data
library(infer)

boot_dist <- gss |>
  # We're interested in the number of hours worked per week
  specify(response = hours) |>
  # Generate bootstrap samples
  generate(reps = 1000, type = "bootstrap") |>
  # Calculate mean of each bootstrap sample
  calculate(stat = "mean")

boot_dist |>
  # Calculate the confidence interval around the point estimate
  get_confidence_interval(
    # At the 95% confidence level; percentile method
    level = 0.95
  )

# for type = "se" or type = "bias-corrected" we need a point estimate
sample_mean <- gss |>
  specify(response = hours) |>
  calculate(stat = "mean")

boot_dist |>
  get_confidence_interval(
    point_estimate = sample_mean,
    # At the 95% confidence level
    level = 0.95,
    # Using the standard error method
    type = "se"
  )

# using a theoretical distribution -----------------------------------

# define a sampling distribution
sampling_dist <- gss |>
  specify(response = hours) |>
  assume("t")

# get the confidence interval---note that the
# point estimate is required here
get_confidence_interval(
  sampling_dist,
  level = .95,
  point_estimate = sample_mean
)

# using a model fitting workflow -----------------------

# fit a linear model predicting number of hours worked per
# week using respondent age and degree status.
observed_fit <- gss |>
  specify(hours ~ age + college) |>
  fit()

observed_fit

# fit 100 models to resamples of the gss dataset, where the response
# `hours` is permuted in each. note that this code is the same as
# the above except for the addition of the `hypothesize` and
# `generate` steps.
null_fits <- gss |>
  specify(hours ~ age + college) |>
  hypothesize(null = "independence") |>
  generate(reps = 100, type = "permute") |>
  fit()

null_fits

get_confidence_interval(
  null_fits,
  point_estimate = observed_fit,
  level = .95
)

# more in-depth explanation of how to use the infer package
if (FALSE) {
vignette("infer")
}
