toaster (version 0.4.1)

computeClusterSample: Random sample of clustered data

Description

Random sample of clustered data

Usage

computeClusterSample(channel, km, sampleFraction, sampleSize, scaled = FALSE,
  includeId = FALSE, test = FALSE)

Arguments

channel
connection object as returned by odbcConnect.
km
an object of class "toakmeans" obtained with computeKmeans.
sampleFraction
one or more sample fractions to use when sampling the data (multiple sampling fractions are not yet supported).
sampleSize
total sample size (applies only when sampleFraction is missing).
scaled
logical: indicates whether the original (default) or the scaled data is returned.
includeId
logical: indicates whether the sample should include the key that uniquely identifies each data row.
test
logical: if TRUE, only show what would be done without executing it (similar to the test parameter in the RODBC functions sqlQuery and sqlSave).

Value

  • computeClusterSample returns an object of class "toakmeans" (compatible with class "kmeans").

See Also

computeKmeans

Examples

if(interactive()){
library(RODBC)
library(toaster)

# initialize connection to Lahman baseball database in Aster
conn = odbcDriverConnect(connection="driver={Aster ODBC Driver};
                         server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>")

# cluster batting records after 2000 on games (g), runs (r) and hits (h)
km = computeKmeans(conn, "batting", centers=5, iterMax = 25,
                   aggregates = c("COUNT(*) cnt", "AVG(g) avg_g", "AVG(r) avg_r", "AVG(h) avg_h"),
                   id="playerid || '-' || stint || '-' || teamid || '-' || yearid",
                   include=c('g','r','h'), scaledTableName='kmeans_test_scaled',
                   centroidTableName='kmeans_test_centroids',
                   where="yearid > 2000")

# sample the clustered data with sampleFraction = 0.01
km = computeClusterSample(conn, km, 0.01)
km

# plot the sampled points pairwise, colored by cluster
createClusterPairsPlot(km, title="Batters Clustered by G, H, R", ticks=FALSE)

# release the ODBC connection
close(conn)
}
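
The example above needs a live Aster connection. To show what sampleFraction and sampleSize could mean for clustered data, the sketch below samples within each cluster using base R only. It does not call toaster and makes no claim about how computeClusterSample works inside the database. The helper sample_by_cluster, its arguments, and the proportional split of a total size across clusters are all assumptions made for illustration.

```r
# Local illustration only: base R, no Aster connection and no toaster calls.
set.seed(42)

# Cluster a small built-in data set so that every row carries a cluster label.
features <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]
fit <- kmeans(features, centers = 3, nstart = 10)
clustered <- cbind(features, clusterid = fit$cluster)

# Hypothetical helper (not part of toaster): draw rows from each cluster,
# either by a fraction or by a total size.
sample_by_cluster <- function(data, cluster_col, fraction, size) {
  if (missing(fraction)) {
    if (missing(size)) stop("supply either 'fraction' or 'size'")
    # Assumption: a total size is spread across clusters proportionally,
    # which is the same as applying one common fraction to every cluster.
    fraction <- size / nrow(data)
  }
  stopifnot(length(fraction) == 1, fraction > 0, fraction <= 1)
  groups <- split(data, data[[cluster_col]])
  picked <- lapply(groups, function(g) {
    # Keep at least one row per cluster so small clusters stay visible.
    n <- max(1L, round(nrow(g) * fraction))
    g[sample.int(nrow(g), n), , drop = FALSE]
  })
  do.call(rbind, picked)
}

by_fraction <- sample_by_cluster(clustered, "clusterid", fraction = 0.2)
by_size <- sample_by_cluster(clustered, "clusterid", size = 30)

# Rows drawn from each cluster under the two ways of specifying the sample.
table(by_fraction$clusterid)
table(by_size$clusterid)
```

Sampling within each cluster rather than over the whole table keeps every cluster represented in the sample, which matters when the sample is used for plots that compare clusters.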
