infer (version 1.0.0)

generate: Generate resamples, permutations, or simulations

Description

Generation creates a simulated distribution from specify(). In the context of confidence intervals, this is a bootstrap distribution based on the result of specify(). In the context of hypothesis testing, this is a null distribution based on the result of specify() and hypothesize().

Learn more in vignette("infer").

Usage

generate(x, reps = 1, type = NULL, variables = !!response_expr(x), ...)

Arguments

x

A data frame that can be coerced into a tibble.

reps

The number of resamples to generate.

type

The method used to generate resamples of the observed data reflecting the null hypothesis. Currently one of "bootstrap", "permute", or "draw" (see below).

variables

If type = "permute", a set of unquoted column names in the data to permute (independently of each other). Defaults to only the response variable. Note that any derived effects that depend on these columns (e.g., interaction effects) will also be affected.

...

Currently ignored.
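
As a rough illustration of the variables argument, the sketch below permutes two explanatory columns instead of the default response column. It assumes the gss dataset that ships with infer and a model formula (hours ~ age + college) chosen only for this example; the small reps value is there to keep the output short.

```r
library(infer)

set.seed(1)  # permutation is random; a seed makes the sketch reproducible

# Permute age and college (independently of each other) rather than
# the response, hours. The formula is illustrative only.
perm <- gss %>%
  specify(hours ~ age + college) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 3, type = "permute", variables = c(age, college))

perm
```

Because only the explanatory columns are shuffled here, the response column is left in its original order within each replicate.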

Value

A tibble containing reps generated datasets, indicated by the replicate column.
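
To make the shape of the return value concrete, here is a minimal sketch, again assuming the gss dataset bundled with infer. The object names spec and boot are hypothetical. It draws a handful of bootstrap replicates and counts the rows belonging to each one.

```r
library(infer)

set.seed(1)

spec <- gss %>%
  specify(response = hours)

# Five bootstrap resamples, stacked into one tibble
boot <- spec %>%
  generate(reps = 5, type = "bootstrap")

# One block of rows per replicate, each the size of the input sample
table(boot$replicate)
```

The stacked layout is what lets a later calculate() call summarise each replicate separately.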

Generation Types

The type argument determines the method used to create the null distribution.

  • bootstrap: A bootstrap sample will be drawn for each replicate, where a sample of size equal to the input sample size is drawn (with replacement) from the input sample data.

  • permute: For each replicate, each input value will be randomly reassigned (without replacement) to a new output value in the sample.

  • draw: A value will be sampled from a theoretical distribution with parameters specified in hypothesize() for each replicate. This option is currently only applicable for testing point estimates. This generation type was previously called "simulate", which has been superseded.
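
The three generation types can be sketched for a single replicate in base R. This is only an illustration of the resampling ideas described above, not infer's internal implementation; the vector x and the null proportion of 0.5 are made-up values.

```r
set.seed(1)

x <- c(2, 4, 4, 7, 9)   # hypothetical observed sample
n <- length(x)

# bootstrap: n values drawn from the sample with replacement
boot_rep <- sample(x, size = n, replace = TRUE)

# permute: the same values reassigned to new positions (without replacement)
perm_rep <- sample(x)

# draw: n values simulated from a theoretical distribution, here a
# binary outcome with an assumed null proportion p = 0.5
draw_rep <- sample(c("success", "failure"), size = n,
                   replace = TRUE, prob = c(0.5, 0.5))
```

A bootstrap replicate may repeat or omit observed values, a permuted replicate always contains exactly the observed values, and a drawn replicate does not reuse the observed values at all.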

See Also

Other core functions: calculate(), hypothesize(), specify()

Examples

library(infer)

# generate a null distribution by taking 200 bootstrap samples
gss %>%
 specify(response = hours) %>%
 hypothesize(null = "point", mu = 40) %>%
 generate(reps = 200, type = "bootstrap")

# generate a null distribution for the independence of
# two variables by permuting their values 200 times
gss %>%
 specify(partyid ~ age) %>%
 hypothesize(null = "independence") %>%
 generate(reps = 200, type = "permute")

# more in-depth explanation of how to use the infer package
vignette("infer")
