
toaster (version 0.5.5)

computeLm: Fit Linear Model and return its coefficients.

Description

Outputs the coefficients of a linear model fitted to an Aster table according to a formula expression containing column names. The zeroth coefficient corresponds to the intercept. The R formula expression, with column names for the response and predictor variables, follows the same syntax as in the lm function, though fewer features are supported.
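Computing coefficients over every row of a table can be illustrated locally. The sketch below is a minimal illustration that assumes a common in-database strategy: accumulate the cross-products X'X and X'y, which are plain column sums, and then solve the normal equations. It is not toaster's actual SQL. A simulated data frame with hypothetical values stands in for the batting_enh table, and the result is checked against lm(). The first coefficient, produced by the column of ones, is the intercept.

```r
# Assumption: coefficients via the normal equations (X'X) b = X'y.
# This is an illustration, not toaster's implementation.
set.seed(42)
n <- 200
df <- data.frame(rbi = rpois(n, 40), bb = rpois(n, 30), so = rpois(n, 50))
# hypothetical batting averages generated from a known linear relationship
df$ba <- 0.2 + 0.001 * df$rbi + 0.0005 * df$bb - 0.0004 * df$so +
  rnorm(n, sd = 0.01)

# design matrix: the leading column of 1s yields the zeroth (intercept) coefficient
X <- cbind(1, as.matrix(df[, c("rbi", "bb", "so")]))
y <- df$ba

# every entry of X'X and X'y is a sum over rows, so a database could
# compute them with aggregate queries over the whole table
xtx <- crossprod(X)     # t(X) %*% X, a 4 x 4 matrix
xty <- crossprod(X, y)  # t(X) %*% y, a 4 x 1 matrix
beta <- drop(solve(xtx, xty))
names(beta) <- c("(Intercept)", "rbi", "bb", "so")

# compare with base R's QR-based fit on the same data
fit <- lm(ba ~ rbi + bb + so, data = df)
print(cbind(normal_equations = beta, lm = coef(fit)))
```

Both columns of the printed matrix should agree to numerical precision; only the aggregation step would move into the database.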

Usage

computeLm(channel, tableName, formula, tableInfo = NULL, categories = NULL, sampleSize = 1000, where = NULL, test = FALSE)

Arguments

channel
connection object as returned by odbcConnect
tableName
Aster table name
formula
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under `Details`.
tableInfo
pre-built table summary with data types
categories
vector of column names that contain categorical data. Optional for columns of character type, which are automatically treated as categorical predictors. But if a numeric column contains categorical data, it has to be listed here for the model to treat it as categorical. Take care not to declare columns with too many distinct values (approximately more than 10) as categorical, because each value results in a dummy predictor variable being added to the model.
sampleSize
the function always computes the regression model coefficients on all data in the table, but it computes predictions and returns an object of class "lm" based on a sample of the data. The sample size is an absolute number of rows. Be careful not to overestimate the size, as all results are loaded into memory. The special value "all" or "ALL" includes all data in the computation.
where
specifies criteria that table rows must satisfy before the computation is applied. The criteria are expressed as SQL predicates (as inside a WHERE clause).
test
logical: if TRUE, only show what would be done (similar to the test parameter in RODBC functions like sqlQuery and sqlSave).
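Why many-valued categorical columns are costly can be seen with base R's model.matrix(), which expands a factor with k levels into k - 1 dummy columns under the default treatment contrasts (the first level is absorbed by the intercept). Whether toaster codes categories exactly this way is an assumption; the sketch only shows how each distinct value turns into an extra predictor. The data values and the division column are hypothetical.

```r
# Illustration of dummy coding with base R (not toaster's own code).
df <- data.frame(
  ba   = c(0.25, 0.27, 0.31, 0.22, 0.29, 0.26),
  rbi  = c(40, 55, 70, 20, 61, 48),
  # character column: model.matrix() treats it as categorical automatically
  lgid = c("AL", "NL", "AL", "NL", "FL", "AL"),
  # hypothetical numeric codes that really represent categories
  division = c(1, 2, 3, 1, 2, 3)
)

# numeric codes must be declared categorical explicitly, here via factor()
df$division <- factor(df$division)

mm <- model.matrix(ba ~ rbi + lgid + division, data = df)
print(colnames(mm))
# 3 league values -> 2 dummy columns, 3 division values -> 2 dummy columns
```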
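The sampleSize rules (an absolute row count, or "all"/"ALL" for every row) can be mimicked on a local data frame. The helper sample_rows() below is hypothetical and not part of toaster; capping the count at the number of available rows is an assumption added for the sketch.

```r
# Hypothetical helper (not part of toaster): turn a sampleSize value into
# a row sample, treating "all" or "ALL" as the whole table.
sample_rows <- function(df, sampleSize = 1000) {
  if (is.character(sampleSize) && sampleSize %in% c("all", "ALL")) {
    return(df)
  }
  # never request more rows than the table has
  n <- min(as.integer(sampleSize), nrow(df))
  df[sample.int(nrow(df), n), , drop = FALSE]
}

set.seed(1)
df <- data.frame(x = 1:5000, y = rnorm(5000))
nrow(sample_rows(df))          # 1000 (default)
nrow(sample_rows(df, 10000))   # 5000 (capped at table size)
nrow(sample_rows(df, "all"))   # 5000 (every row)
```

Only the sampled rows are held in memory, which is why an oversized sampleSize is expensive.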
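The where and test arguments can be pictured together: a predicate is appended to a generated query after WHERE, and a dry run returns that SQL instead of executing it. The function build_lm_query() and its execute argument are hypothetical, a minimal sketch assuming this behavior rather than toaster's internals. With RODBC, execute could be something like function(q) sqlQuery(conn, q).

```r
# Hypothetical sketch: splice an optional WHERE predicate into a query and
# either return it (test = TRUE) or hand it to an execute function.
build_lm_query <- function(tableName, columns, where = NULL, test = FALSE,
                           execute = function(sql) stop("no database connection")) {
  sql <- sprintf("SELECT %s FROM %s", paste(columns, collapse = ", "), tableName)
  if (!is.null(where)) {
    # the predicate is inserted verbatim, so it must be valid SQL
    sql <- paste0(sql, " WHERE ", where)
  }
  if (test) {
    return(sql)  # dry run: show what would be done
  }
  execute(sql)
}

q <- build_lm_query("batting_enh", c("ba", "rbi", "bb", "so"),
                    where = "lgid in ('AL','NL') and ab > 30", test = TRUE)
cat(q, "\n")
```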

Value

computeLm returns an object of class "toalm", "lm". The function summary ..... For backward compatibility, it outputs a data frame containing 3 columns:

Details

Models for computeLm are specified symbolically. A typical model has the form response ~ terms, where response is the (numeric) response column and terms is a series of column terms that specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second, with duplicates removed. Specifications of the form first:second and first*second (interactions) are not supported yet.
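The supported subset of formula syntax can be checked with base R's terms(), which already removes duplicate terms and records each term's order (1 for a single column, 2 or more for an interaction). The function parse_simple_formula() is hypothetical, a sketch of the restrictions described above rather than toaster's own parser.

```r
# Hypothetical validator mirroring the rules above, built on base R's terms().
parse_simple_formula <- function(formula) {
  tt <- terms(formula)
  if (attr(tt, "response") == 0) {
    stop("formula must have the form response ~ terms")
  }
  # order > 1 marks an interaction such as a:b (a*b expands to a + b + a:b)
  if (any(attr(tt, "order") > 1)) {
    stop("interaction terms (':' or '*') are not supported")
  }
  list(
    response   = deparse(formula[[2L]]),  # left-hand side of ~
    predictors = attr(tt, "term.labels")  # duplicates already removed
  )
}

str(parse_simple_formula(ba ~ rbi + bb + so + bb))
# parse_simple_formula(ba ~ rbi * bb)  # would stop(): interaction
```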

Examples

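if(interactive()){
# load toaster and RODBC (odbcDriverConnect comes from RODBC)
library(toaster)
library(RODBC)

# initialize connection to Lahman baseball database in Aster
conn = odbcDriverConnect(connection=paste0("driver={Aster ODBC Driver};",
                         "server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>"))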

# batting average explained by rbi, bb, so 
lm1 = computeLm(channel=conn, tableName="batting_enh", formula= ba ~ rbi + bb + so)
summary(lm1)

# with categorical predictor league (lgid), explicit sample size and row filter
lm2 = computeLm(channel=conn, tableName="batting_enh", formula= ba ~ rbi + bb + so + lgid,
                sampleSize=10000, where="lgid in ('AL','NL') and ab > 30")
summary(lm2)
}
