dplyr (version 0.7.8)

sample: Sample n rows from a table

Description

This is a wrapper around sample.int() to make it easy to select random rows from a table. It currently only works for local tbls.

Usage

sample_n(tbl, size, replace = FALSE, weight = NULL, .env = NULL)

sample_frac(tbl, size = 1, replace = FALSE, weight = NULL, .env = NULL)

Arguments

tbl

tbl of data.

size

For sample_n(), the number of rows to select. For sample_frac(), the fraction of rows to select. If tbl is grouped, size applies to each group.

replace

Sample with or without replacement?

weight

Sampling weights. This must evaluate to a vector of non-negative numbers the same length as the input. Weights are automatically standardised to sum to 1.

This argument is automatically quoted and later evaluated in the context of the data frame. It supports unquoting. See vignette("programming") for an introduction to these concepts.

.env

This variable is deprecated and no longer has any effect. To evaluate weight in a particular context, you can now unquote a quosure.

Examples

Run this code
# NOT RUN {
by_cyl <- mtcars %>% group_by(cyl)

# Sample fixed number per group
sample_n(mtcars, 10)
sample_n(mtcars, 50, replace = TRUE)
sample_n(mtcars, 10, weight = mpg)

sample_n(by_cyl, 3)
sample_n(by_cyl, 10, replace = TRUE)
sample_n(by_cyl, 3, weight = mpg / mean(mpg))

# Sample fixed fraction per group
# Default is to sample all data = randomly resample rows
sample_frac(mtcars)

sample_frac(mtcars, 0.1)
sample_frac(mtcars, 1.5, replace = TRUE)
sample_frac(mtcars, 0.1, weight = 1 / mpg)

sample_frac(by_cyl, 0.2)
sample_frac(by_cyl, 1, replace = TRUE)
# }

Run the code above in your browser using DataCamp Workspace