rep_sample_n: Perform repeated sampling

Description

These functions extend dplyr::sample_n() and dplyr::slice_sample() by allowing repeated sampling of data. This is especially helpful when creating sampling distributions; see the examples below.

Usage

rep_sample_n(tbl, size, replace = FALSE, reps = 1, prob = NULL)
rep_slice_sample(.data, n = 1, replace = FALSE, weight_by = NULL, reps = 1)

Arguments

tbl, .data

Data frame of population from which to sample.

size, n

Sample size of each sample.

replace

Should sampling be with replacement?

reps

Number of samples to take, each of size size (or n).

prob, weight_by

A vector of sampling weights, one for each row in tbl; it must have length equal to nrow(tbl).

Value

A tibble with reps * size rows, corresponding to reps samples of size size from tbl, grouped by replicate.
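
The shape of the return value can be checked directly. The following is a minimal sketch, assuming the infer package is installed; the population tibble is hypothetical illustration data, not part of the package.

```r
library(infer)
library(dplyr)

# Hypothetical population of 20 rows, used only for illustration
population <- tibble::tibble(id = 1:20)

# Draw reps = 3 samples of size = 4, without replacement
samples <- rep_sample_n(population, size = 4, reps = 3)

# Total number of rows is reps * size = 3 * 4
nrow(samples)

# The result is grouped by the replicate column
group_vars(samples)

# Number of rows within each replicate
count(samples)
```

Because the result is already grouped by replicate, a later summarize() call should produce one row per sample without an extra group_by().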

Details

The dplyr::sample_n() function (to which rep_sample_n() was originally a supplement) has been superseded by dplyr::slice_sample(). rep_slice_sample() provides a light wrapper around rep_sample_n() whose interface more closely matches that of slice_sample().
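
As a rough illustration of how the two interfaces correspond, the sketch below makes the same request through each function. It assumes an installed version of infer that exports rep_slice_sample(); the df tibble is hypothetical illustration data.

```r
library(infer)

# Hypothetical five-row data frame, used only for illustration
df <- tibble::tibble(id = 1:5)

# sample_n()-style interface: sample size is given by `size`
a <- rep_sample_n(df, size = 2, reps = 3)

# slice_sample()-style interface: sample size is given by `n`
b <- rep_slice_sample(df, n = 2, reps = 3)

# Both calls request 3 samples of 2 rows each, so the results
# should have the same dimensions (the rows drawn may differ)
dim(a)
dim(b)
```

The argument mapping follows the Usage section: size corresponds to n, and prob corresponds to weight_by.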

Examples

# NOT RUN {
library(infer)   # provides rep_sample_n() and the gss data set
library(dplyr)
library(ggplot2)

# take 1000 samples of size n = 50, without replacement
slices <- gss %>%
  rep_sample_n(size = 50, reps = 1000)

slices

# compute the proportion of respondents with a college
# degree in each replicate
p_hats <- slices %>%
  group_by(replicate) %>%
  summarize(prop_college = mean(college == "degree"))

# plot sampling distribution
ggplot(p_hats, aes(x = prop_college)) +
  geom_density() +
  labs(
    x = "p_hat", y = "Density",
    title = "Sampling distribution of p_hat"
  )

# sampling with probability weights; note that the probabilities are
# automatically renormalized to sum to 1
library(tibble)
df <- tibble(
  id = 1:5,
  letter = factor(c("a", "b", "c", "d", "e"))
)
rep_sample_n(df, size = 2, reps = 5, prob = c(.5, .4, .3, .2, .1))
# }
