The whSample Package

whSample helps analysts quickly generate statistical samples from Excel or Comma Separated Value (CSV) files and write them to a new Excel workbook. Users have a choice of Simple Random or Stratified Random samples, and a third choice of having each stratum included in a separate worksheet.

See package vignettes for detailed documentation.

ssize

The workhorse function is sampler. A helper function, ssize, estimates the minimum sample size needed to meet the stated statistical requirements, using a Normal Approximation to the Hypergeometric Distribution. This distribution describes the probabilities of yes/no-type outcomes when sampling without replacement. The parameters of ssize are:

  • N, the population size.
  • ci, the required confidence level. The default is 95%.
  • me, the required level of precision, or margin of error. The default is +/- 7%.
  • p, the anticipated rate of occurrence. The default is 50%.

ssize(N, ci=0.95, me=0.07, p=0.50) (showing the defaults) only requires the N argument. Used on its own, it can explore sample sizes under other conditions. For example, a probe sample may suggest that a 50-50 probability isn’t realistic. A revised sample size can then be estimated with the observed success probability (p=0.6, for example).
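
As a rough illustration of the calculation described above, the following sketch implements one common textbook form of the normal approximation with a finite population correction. The name ssize_sketch and the rounding up with ceiling() are assumptions made for this example; the package's own ssize may differ in detail, so treat the results as indicative only.

```r
# Hypothetical re-implementation for illustration; not the package source.
# n = N * z^2 * p * q / (me^2 * (N - 1) + z^2 * p * q), with q = 1 - p
ssize_sketch <- function(N, ci = 0.95, me = 0.07, p = 0.50) {
  stopifnot(N > 1, ci > 0, ci < 1, me > 0, p > 0, p < 1)
  z <- qnorm(1 - (1 - ci) / 2)  # two-sided critical value, about 1.96 for 95%
  q <- 1 - p
  n <- (N * z^2 * p * q) / (me^2 * (N - 1) + z^2 * p * q)
  ceiling(n)                    # assumption: round up to a whole row count
}

n_default <- ssize_sketch(5000)             # defaults: 95%, +/- 7%, p = 0.5
n_p60     <- ssize_sketch(5000, p = 0.60)   # revised rate of occurrence
n_tight   <- ssize_sketch(5000, me = 0.05)  # tighter margin of error
```

Because p * (1 - p) is largest at p = 0.5, the default is the most conservative choice: moving p away from 0.5 lowers the estimate, while tightening me raises it.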

sampler

The sampler function calls ssize to get its sample size estimate. It therefore accepts the ci, me, and p arguments, which it passes to ssize.

sampler also takes four additional arguments:

  • irisData opens the file chooser to a folder with example files of Anderson’s Iris dataset of flower characteristics,
  • backups provides a buffer of extra samples to replace any that are found to be invalid for some reason,
  • seed is used to seed the internal random number generator, and
  • keepOrg determines if a copy of the population is included in the output.

The defaults for these additional arguments are backups=5, irisData=F, seed=NULL and keepOrg=F. With the default seed=NULL, sampler uses the current system time in milliseconds. Any number can be used as a seed; whichever one is used is listed in the Report output tab. The keep-original option (keepOrg) defaults to FALSE, but can be set to keepOrg=T for smaller populations that wouldn’t exceed Excel’s limit of 1,048,576 rows.

To override any of these defaults, enter name=value as an argument.
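
The role of the seed can be sketched in base R. This is not sampler's internal code: draw_sample is a hypothetical helper, and the way it turns the system time into a seed is an assumption made for illustration. It only shows why recording the seed makes a sample reproducible.

```r
# Hypothetical helper for illustration; not part of whSample.
draw_sample <- function(data, n, seed = NULL) {
  if (is.null(seed)) {
    # Assumption: derive an integer seed from the system time in milliseconds.
    ms <- as.numeric(Sys.time()) * 1000
    seed <- as.integer(ms %% .Machine$integer.max)
  }
  set.seed(seed)
  rows <- data[sample(nrow(data), n), , drop = FALSE]
  list(seed = seed, rows = rows)  # keep the seed so the draw can be repeated
}

population <- data.frame(id = 1:1000)

first  <- draw_sample(population, n = 20, seed = 12345)
second <- draw_sample(population, n = 20, seed = 12345)
same_rows <- identical(first$rows, second$rows)  # same seed, same sample

timed <- draw_sample(population, n = 20)  # time-based seed, recorded in timed$seed
```

Re-running with the seed reported for an earlier sample should reproduce that sample, provided the source data and the other arguments are unchanged.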

sampler uses a series of menus to guide users through the sampling process.

Output

sampler creates a new Excel workbook with three parts:

  • a copy of the original (source) data, if requested with keepOrg,

  • an Excel spreadsheet with the requested sample, and

  • a new tab called Report with key reference information:

    • path and name of the source file

    • size (in rows) of the source file

    • sample type (Simple Random Sample, Stratified Random Sample, or Tabbed Stratified Sample)

    • sampling parameters

    • sample size

    • stratification key

    • number of strata

    • number of backups requested (this number is applied to every stratum in a stratified sample)

    • random number seed used, for documentation and reproducibility

    • date-time stamp of when the sample was generated

    • stratification information (name, number in the population, proportion of the population, and the number of samples)
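
To make the stratification entries above concrete, here is a minimal base R sketch of a stratified draw. It assumes proportional allocation (each stratum receives a share of the sample equal to its share of the population), which the source does not confirm, and it adds the backups to every stratum as described in the Report list. The function stratified_sketch and its arguments are hypothetical, and R's built-in iris data stands in for the example files.

```r
# Hypothetical sketch; whSample's own allocation rules may differ.
stratified_sketch <- function(data, key, n, backups = 5, seed = 1) {
  set.seed(seed)
  N <- nrow(data)
  strata <- split(data, data[[key]])  # one data frame per stratum
  picked <- lapply(strata, function(g) {
    share <- ceiling(n * nrow(g) / N)    # assumed proportional allocation
    k <- min(nrow(g), share + backups)   # backups are added per stratum
    g[sample(nrow(g), k), , drop = FALSE]
  })
  do.call(rbind, picked)
}

out <- stratified_sketch(iris, key = "Species", n = 30, backups = 5, seed = 12345)
per_stratum <- table(out$Species)  # rows drawn from each stratum
```

With three equally sized strata, a sample of 30 and 5 backups, each stratum contributes 10 sampled rows plus 5 backups.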

Installation

You can install whSample from CRAN with:

install.packages("whSample")

or get the latest developmental version with:

devtools::install_github("km4ivi/whSample")

Other necessary packages

sampler depends on several external packages to run properly. If you’re running a developmental version, make sure these packages are installed on your computer:

  • tidyverse (or individually: magrittr, dplyr, purrr)
  • openxlsx
  • data.table
  • tools
  • utils
  • tcltk
  • bit64
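
One way to check for these packages before running a developmental version is sketched below. The helper is_installed is a name introduced here for illustration; tools, utils and tcltk normally ship with R itself, so in most setups only the others would ever need installing.

```r
# Packages listed above; tidyverse is replaced by the three it is used for.
needed <- c("magrittr", "dplyr", "purrr", "openxlsx",
            "data.table", "tools", "utils", "tcltk", "bit64")

# Hypothetical helper: TRUE if the package can be found in the library paths.
is_installed <- function(pkg) nzchar(system.file(package = pkg))

missing_pkgs <- needed[!vapply(needed, is_installed, logical(1))]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # To install them, run: install.packages(missing_pkgs)
} else {
  message("All listed packages are installed.")
}
```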

Examples

ssize(5000): N=5000, other arguments use defaults

ssize(5000, p=0.60): N=5000, with a 60% expected rate of occurrence

sampler(): Uses all defaults, gets N from the source data.

sampler(backups=2, seed=12345): Overrides specific defaults

Version

0.9.6.2

License

GPL-3

Maintainer

Paul West

Last Published

May 13th, 2021

Functions in whSample (0.9.6.2)

  • ssize: Determine minimum sample size
  • sampler: Generate Sample Lists from Excel or CSV Files