Causata (version 4.2-0)

SampleStratified: Draws a random, stratified sample from a vector of indices.

Description

Given a vector of logical values, this function returns a logical index in which every TRUE value is kept and the FALSE values are randomly sampled.

Usage

SampleStratified(idxTrue, scale=1, verbose=TRUE)

Arguments

idxTrue
An array of logical TRUE / FALSE values. All TRUE values are kept (their index is always TRUE), and FALSE values are sampled (their index may be TRUE or FALSE).
scale
Controls the sampling rate for FALSE values. See the Details section below for more information.
verbose
If TRUE then summary information is printed to the screen.

Value

  • An array of logical values indicating which records should be kept.

Details

All TRUE values from the input index are kept. The number of FALSE values that are kept is computed as follows:

$$sampleRate = \sqrt{ \frac{nFalse}{nTrue} } \cdot \frac{1}{scale}$$

$$numKeep = round\left( \frac{nFalse}{sampleRate} \right)$$

Here nFalse and nTrue are the numbers of FALSE and TRUE values in the array idxTrue. If sampleRate is less than 1, no sampling is performed and all FALSE values are kept. Values of scale greater than 1 result in more FALSE values being kept; values below 1 result in fewer.
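
The arithmetic in this section can be illustrated with a short sketch. The helper name sketch_num_keep is hypothetical and is not part of the Causata package. It only applies the two formulas to a pair of counts, assuming nTrue is greater than zero and scale is positive.

```r
# Hypothetical helper, not part of the Causata package: applies the two
# formulas from the Details section to counts of TRUE and FALSE values.
# Assumes nTrue > 0 and scale > 0.
sketch_num_keep <- function(nTrue, nFalse, scale = 1) {
  sampleRate <- sqrt(nFalse / nTrue) / scale
  if (sampleRate < 1) {
    # No sampling is performed: every FALSE value is kept
    return(nFalse)
  }
  round(nFalse / sampleRate)
}

sketch_num_keep(nTrue = 100, nFalse = 10000)             # sampleRate is 10
sketch_num_keep(nTrue = 100, nFalse = 10000, scale = 2)  # sampleRate is 5
sketch_num_keep(nTrue = 100, nFalse = 50)                # sampleRate is below 1
```

With these inputs the three calls return 1000, 2000 and 50. When sampling does occur, substituting the first formula into the second gives numKeep = round(scale * sqrt(nFalse * nTrue)), so with scale = 1 the number of FALSE values kept is the geometric mean of the two counts.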

Examples

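library(Causata)
data(df.causata)
# Keep every record whose value is "true"; sample the remaining records
idx <- SampleStratified(df.causata$has.responded.mobile.logoff_next.hour_466 == "true")
# Cross-tabulate the original values against the keep/drop index
table(df.causata$has.responded.mobile.logoff_next.hour_466, idx)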
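
For readers without the package installed, the sketch below shows how a logical index of this kind might be built and then used to subset a data frame. The function sketch_sample_stratified and the synthetic data frame are assumptions made for illustration. The function is a stand-in written from the formulas in the Details section, and the package's actual implementation may differ. It assumes the input has no NA values and at least one TRUE value.

```r
# Sketch only: a stand-in for SampleStratified built from the formulas in
# the Details section. The package's actual implementation may differ.
sketch_sample_stratified <- function(idxTrue, scale = 1) {
  nTrue  <- sum(idxTrue)
  nFalse <- sum(!idxTrue)
  sampleRate <- sqrt(nFalse / nTrue) / scale
  keep <- idxTrue                       # every TRUE value is kept
  if (sampleRate < 1) {
    keep[] <- TRUE                      # no sampling: keep all FALSE values too
  } else {
    numKeep  <- round(nFalse / sampleRate)
    falsePos <- which(!idxTrue)
    # sample.int avoids the sample() pitfall when falsePos has length 1
    keep[falsePos[sample.int(length(falsePos), numKeep)]] <- TRUE
  }
  keep
}

set.seed(1)
# Synthetic data (an assumption): 100 TRUE records and 10000 FALSE records
df <- data.frame(responded = rep(c(TRUE, FALSE), times = c(100, 10000)),
                 x = rnorm(10100))
idx <- sketch_sample_stratified(df$responded)
df.sampled <- df[idx, ]                 # subset rows with the logical index
table(df$responded, idx)
```

Here all 100 TRUE records and 1000 of the 10000 FALSE records are retained, so df.sampled has 1100 rows. Which FALSE records are chosen depends on the random seed.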