toaster (version 0.5.5)

computeClusterSample: Random sample of clustered data

Description

Draws a random sample of data points from each cluster of a k-means result obtained with computeKmeans.

Usage

computeClusterSample(channel, km, sampleFraction, sampleSize, scaled = FALSE, includeId = TRUE, test = FALSE)

Arguments

channel
connection object as returned by odbcConnect.
km
an object of class "toakmeans" obtained with computeKmeans.
sampleFraction
numeric vector with one or more sample fractions to use when sampling the data. A single fraction applies to all clusters; multiple fractions define the sampling for each cluster in the kmeans object km, in which case the vector length must equal the number of clusters.
sampleSize
numeric vector with one or more sample sizes (applies only when sampleFraction is missing). A single size applies to all clusters; multiple sizes define the sampling for each cluster in the kmeans object km, in which case the vector length must equal the number of clusters.
scaled
logical: indicates whether the original (default) or the scaled data is returned.
includeId
logical: indicates whether the sample should include the key attribute identifying each data point.
test
logical: if TRUE, only show what would be done without executing it (similar to the parameter test in the RODBC functions sqlQuery and sqlSave).
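
The way sampleFraction and sampleSize map onto clusters can be illustrated locally, without a database. The following is a minimal sketch, not the toaster implementation: sample_by_cluster is a hypothetical helper, it uses base R kmeans on the built-in iris data, and both the rounding rule (ceiling) and the capping of a size at the cluster size are assumptions made for illustration only.

```r
# Hypothetical local illustration of per-cluster sampling.
# This is NOT how toaster samples (toaster runs in the database);
# it only shows how a fraction or size vector lines up with clusters.
set.seed(42)
x <- iris[, c("Sepal.Length", "Petal.Length")]
fit <- kmeans(x, centers = 3)

sample_by_cluster <- function(cluster, fraction, size) {
  # Row indices grouped by cluster id (1, 2, ..., k)
  rows_by_cluster <- split(seq_along(cluster), cluster)
  k <- length(rows_by_cluster)

  # A single value is recycled to all clusters; otherwise length must be k
  expand <- function(v) {
    if (length(v) == 1) v <- rep(v, k)
    stopifnot(length(v) == k)
    v
  }

  # size is used only when fraction is missing
  n <- if (!missing(fraction)) {
    ceiling(lengths(rows_by_cluster) * expand(fraction))  # assumed rounding
  } else {
    pmin(expand(size), lengths(rows_by_cluster))          # assumed capping
  }

  # Sample without replacement inside each cluster
  picked <- mapply(function(rows, n_i) rows[sample.int(length(rows), n_i)],
                   rows_by_cluster, n, SIMPLIFY = FALSE)
  unlist(picked, use.names = FALSE)
}

# One fraction for all clusters, then one fraction per cluster
rows_all <- sample_by_cluster(fit$cluster, fraction = 0.1)
rows_per <- sample_by_cluster(fit$cluster, fraction = c(0.1, 0.2, 0.3))

# Fixed size per cluster, used because fraction is missing
rows_fix <- sample_by_cluster(fit$cluster, size = 5)
table(fit$cluster[rows_per])
```

Passing a vector whose length is neither 1 nor the number of clusters stops with an error in this sketch, which mirrors the length requirement stated for sampleFraction and sampleSize above.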

Value

computeClusterSample returns an object of class "toakmeans" (compatible with class "kmeans").

See Also

computeKmeans

Examples

if(interactive()){
library(RODBC)
library(toaster)

# initialize connection to Lahman baseball database in Aster
# (paste0 keeps line breaks and indentation out of the connection string)
conn = odbcDriverConnect(connection=paste0("driver={Aster ODBC Driver};",
                         "server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>"))

km = computeKmeans(conn, "batting", centers=5, iterMax = 25,
                   aggregates = c("COUNT(*) cnt", "AVG(g) avg_g", "AVG(r) avg_r", "AVG(h) avg_h"),
                   id="playerid || '-' || stint || '-' || teamid || '-' || yearid", 
                   include=c('g','r','h'), scaledTableName='kmeans_test_scaled', 
                   centroidTableName='kmeans_test_centroids',
                   where="yearid > 2000")
km = computeClusterSample(conn, km, 0.01)
km
createClusterPairsPlot(km, title="Batters Clustered by G, H, R", ticks=FALSE)

# per cluster sample fractions
km = computeClusterSample(conn, km, c(0.01, 0.02, 0.03, 0.02, 0.01))
}