Causata (version 4.2-0)

GetStratifiedSample: Gets a stratified sample of data from Causata

Description

Extracts a stratified sample of data from Causata.

Usage

GetStratifiedSample(connect, query, stratification.variable, stratification.variable.name, stratification.value=0)

Arguments

connect
Causata connect object - used to resample at the stratified sampling rates.
query
Causata query object - used to resample at the stratified sampling rates. Note that the Limit must be defined.
stratification.variable
A vector of values on which to base the stratification.
stratification.variable.name
The name of the Causata variable that is used as the basis of stratification.
stratification.value
Value of the stratification.variable which will determine the stratum for a record.

Value

Returns a list with two elements. The example below reads one of them as weights.

Details

This function gets a stratified sample of data from Causata. The population is split into two strata based on whether the stratification.variable value for a record matches the stratification.value. Sampling rates for the two strata are then calculated, where the rate for the larger stratum, strata.A, is:

sample.rate.A = sqrt((# records in strata.B) / (# records in strata.A))

New queries are run to resample the Causata data at these sample rates.
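
The rate formula above can be checked locally without a server connection. The sketch below is illustrative only: stratum_rate is a hypothetical helper, not part of the Causata package, and treating missing values as non-matching is an assumption rather than documented behaviour.

```r
# Hypothetical helper (not a Causata function): computes the sampling rate
# for the larger stratum from a local vector of stratification values.
stratum_rate <- function(stratification.variable, stratification.value) {
  matches <- stratification.variable == stratification.value
  matches[is.na(matches)] <- FALSE   # assumption: NA counts as non-matching
  n.match <- sum(matches)
  n.other <- length(matches) - n.match
  n.large <- max(n.match, n.other)   # records in strata.A (larger stratum)
  n.small <- min(n.match, n.other)   # records in strata.B (smaller stratum)
  sqrt(n.small / n.large)
}

# 500 made-up records: 20 match the stratification value, 480 do not
x <- c(rep("true", 20), rep("false", 480))
rate <- stratum_rate(x, "true")
print(rate)
```

With 20 matching and 480 non-matching records the rate is sqrt(20 / 480), roughly 0.204, so about one in five records from the larger stratum would be kept.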

See Also

Connect, Query, Limit.

Examples

# create some variables to query for
variables <- c('customer-id', 'total-spend')

# create a stratified sample given an initial query
# The commands below are commented out since they require an actual server connection
#connection <- Connect(hostname="server.causata.com",
#  username="user@gmail.com", password="enw8Q!mN")
#query <- Query() + Limit(500)
#df <- GetData(connection, query)

# The commands below are commented out since they require an actual server connection
#sampled.data <- GetStratifiedSample(connection, query, 
#  df[['has.purchase__Next.30.Days']], 'has.purchase__Next.30.Days', "true")
#table(sampled.data$weights)
