
toaster (version 0.3.1)

computeSample: Randomly sample data from the table.

Description

Draws a random sample of rows from the table. The function offers two sampling schemes:
- a simple binomial (Bernoulli) sampling on a row-by-row basis with given sample rate(s)
- sampling a given number of rows without replacement

The sampling can be applied to the entire table or can be refined with conditions.
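The difference between the two schemes can be sketched locally in base R. This is only an illustration on an in-memory data frame: it does not call toaster or query a database, and the names rows, fraction and size are hypothetical stand-ins rather than part of the package.

```r
# Local illustration only: this does not call toaster or touch a database.
set.seed(42)                      # fixed seed so the sketch is reproducible
rows <- data.frame(id = 1:10000)  # hypothetical stand-in for a database table

# Scheme 1: Bernoulli sampling. Each row is kept independently with
# probability `fraction`, so the resulting sample size is itself random.
fraction <- 0.01
keep <- runif(nrow(rows)) < fraction
bernoulliSample <- rows[keep, , drop = FALSE]

# Scheme 2: a fixed number of rows drawn without replacement,
# so no row can appear twice and the size is exact.
size <- 100
fixedSample <- rows[sample.int(nrow(rows), size), , drop = FALSE]

nrow(bernoulliSample)  # near fraction * nrow(rows), varies between runs
nrow(fixedSample)      # always equal to size
```

In this sketch, the first scheme corresponds to what sampleFraction is described as doing and the second to sampleSize. How the package carries out the sampling inside the database is not shown here.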

Usage

computeSample(channel, tableName, sampleFraction, sampleSize, include = NULL,
  except = NULL, where = NULL, as.is = FALSE, stringsAsFactors = FALSE,
  test = FALSE)

Arguments

channel
connection object as returned by odbcConnect
tableName
table name
sampleFraction
one or more sample fractions to use in the sampling of data. (Multiple sampling fractions are not yet supported.)
sampleSize
total sample size (applies only when sampleFraction is missing).
include
a vector of column names to include. The output never contains columns other than those in this list.
except
a vector of column names to exclude. The output never contains columns from this list.
where
specifies criteria that the table rows must satisfy before the sampling is applied. The criteria are expressed in the form of SQL predicates (as inside a WHERE clause).
as.is
which (if any) columns returned as character should be converted to another type? Allowed values are as for read.table. See also sqlQuery.
stringsAsFactors
logical: should columns returned as character and not excluded by as.is and not converted to anything else be converted to factors?
test
logical: if TRUE, only show what would be done (similar to the parameter test in RODBC functions like sqlQuery and sqlSave).

Examples

if(interactive()){
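library(RODBC)    # provides odbcDriverConnect
library(toaster)  # provides computeSample

# initialize connection to Lahman baseball database in Aster
# (paste0 keeps the connection string free of embedded line breaks)
conn = odbcDriverConnect(connection=paste0(
  "driver={Aster ODBC Driver};",
  "server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>"))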

batters = computeSample(conn, "batting", sampleFraction=0.01)
dim(batters)

pitchersAL = computeSample(conn, "pitching", sampleSize=1000,
                           where="lgid = 'AL'")
dim(pitchersAL)
}
