SampleStratified: Draws a random, stratified sample from a vector of indices.
Description
Given a vector of logical values, this returns a logical index in which every TRUE value is kept and the FALSE values are randomly sampled.
Usage
SampleStratified(idxTrue, scale=1, verbose=TRUE)
Arguments
idxTrue
An array of logical TRUE / FALSE values. All TRUE values are kept (their entry in the result is always TRUE), and FALSE values are sampled
(their entry in the result may be TRUE or FALSE).
scale
Controls the sampling rate for FALSE values. See the Details section below for more information.
verbose
If TRUE, summary information is printed to the screen.
Value
An array of logical values indicating which records should be kept.
Details
All TRUE values from the input index are kept. The number of FALSE values that are kept is computed as follows:
$$sampleRate = \sqrt{\frac{nFalse}{nTrue}} \cdot \frac{1}{scale}$$
$$numKeep = \operatorname{round}\left( \frac{nFalse}{sampleRate} \right)$$
Here nFalse and nTrue are the number of FALSE and TRUE values provided in the array idxTrue.
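As a worked illustration of the two formulas, take made-up counts (chosen here for round numbers, not taken from any real data set) of nTrue = 100, nFalse = 10000 and the default scale = 1:
$$sampleRate = \sqrt{\frac{10000}{100}} \cdot \frac{1}{1} = 10$$
$$numKeep = \operatorname{round}\left( \frac{10000}{10} \right) = 1000$$
With these counts the result would keep all 100 TRUE values plus 1000 of the 10000 FALSE values.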
Note that if sampleRate is less than 1, no sampling is performed and all FALSE values are kept.
Values of scale greater than 1 result in more FALSE values being kept; values below 1 result in fewer.
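The rule described above can be sketched in a few lines of base R. This is a minimal illustration only: sample_stratified_sketch is a hypothetical name, it is not the source of SampleStratified, and the handling of inputs with no FALSE values is an assumption. The verbose argument is omitted.

```r
# Hypothetical re-implementation for illustration; not the package source.
sample_stratified_sketch <- function(idxTrue, scale = 1) {
  stopifnot(is.logical(idxTrue), !anyNA(idxTrue), scale > 0)

  nTrue  <- sum(idxTrue)
  nFalse <- sum(!idxTrue)
  keep   <- idxTrue                 # every TRUE value is always kept

  # Assumption: with no FALSE values there is nothing to sample.
  if (nFalse == 0) return(keep)

  sampleRate <- sqrt(nFalse / nTrue) / scale

  # A rate below 1 means no sampling: all FALSE values are kept.
  if (sampleRate < 1) {
    keep[] <- TRUE
    return(keep)
  }

  numKeep  <- round(nFalse / sampleRate)
  falsePos <- which(!idxTrue)

  # Index into falsePos rather than calling sample(falsePos, ...), which
  # treats a length-one numeric input as the range 1:x.
  chosen <- falsePos[sample.int(length(falsePos), numKeep)]
  keep[chosen] <- TRUE
  keep
}

set.seed(1)
idx <- c(rep(TRUE, 100), rep(FALSE, 10000))
sum(sample_stratified_sketch(idx))              # 100 TRUE + 1000 FALSE = 1100
sum(sample_stratified_sketch(idx, scale = 2))   # 100 TRUE + 2000 FALSE = 2100
sum(sample_stratified_sketch(idx, scale = 20))  # sampleRate = 0.5, all 10100 kept
```

Which FALSE positions are chosen depends on the random seed; only the counts are fixed by the formulas.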