idaLm: Linear regression

Description

This function performs a linear regression on the contents of an IDA data frame (ida.data.frame).

Usage

idaLm(form, idadf, limit = 25)

## S3 method for class 'idaLm':
print(x)

Arguments

form

A formula object that specifies both the name of the column that contains the continuous target variable and either a list of predictor columns separated by plus symbols or a single period (to specify that all other columns in the IDA data frame are used as predictors).

idadf

An IDA data frame that contains the input data for the function.

limit

The maximum number of distinct values per categorical column. The default is 25.

x

An object of the class idaLm (used by the print method).
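The form argument follows the usual R model-formula conventions. As a local illustration only (base R on the built-in mtcars data, not an IDA data frame), terms() shows how an explicit "+" list and a single period expand into predictor names. This assumes idaLm resolves "." the same way base R model formulas do; the helper name predictors is hypothetical.

```r
# Local sketch with base R; assumes idaLm reads '+' and '.' like base R formulas.
car_df <- mtcars[, c("mpg", "wt", "hp")]

f_explicit <- mpg ~ wt + hp  # predictors listed explicitly, joined by '+'
f_dot      <- mpg ~ .        # '.' means every column other than the target

# Hypothetical helper: terms() expands a formula against a data frame,
# resolving '.', and "term.labels" lists the resulting predictors.
predictors <- function(f, data) attr(terms(f, data = data), "term.labels")

predictors(f_explicit, car_df)  # "wt" "hp"
predictors(f_dot, car_df)       # "wt" "hp"
```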

Value

The procedure returns a linear regression model in an object of class idaLm.

Details

The idaLm function computes a linear regression model by extracting a covariance matrix and computing its inverse. This implementation is optimized for problems that involve a large number of samples and a relatively small number of predictors; the maximum number of columns is 87.

Missing values in the input table are ignored when the covariance matrix is calculated. If this leads to undefined entries in the covariance matrix, the function fails. If the inverse of the covariance matrix cannot be computed (for example, because of strongly correlated predictors), the Moore-Penrose generalized inverse is used instead.

The output of the idaLm function has the following attributes:

$coefficients is a vector with two values. The first value is the slope of the line that best fits the input data; the second value is its y-intercept.
$RSS is the root sum square (that is, the square root of the sum of the squares).
$effects is not used and can be ignored.
$rank is the rank.
$df.residuals is the number of degrees of freedom associated with the residuals.
$coefftab is a vector with four values:
- The slope and y-intercept of the line that best fits the input data
- The standard error
- The t-value
- The p-value
$Loglike is the log likelihood ratio.
$AIC is the Akaike information criterion, a measure of the relative quality of the model.
$BIC is the Bayesian information criterion, which is used for model selection.
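The covariance-matrix approach described above can be sketched in base R on a local data frame. This is a minimal sketch under stated assumptions, not the idaLm implementation: it assumes complete data (no missing values), inverts the predictor covariance matrix with solve() and falls back to MASS::ginv() (the Moore-Penrose inverse; MASS ships with standard R distributions) when solve() fails, and computes the Gaussian log-likelihood, AIC, and BIC with the same conventions as stats::logLik(), AIC(), and BIC(). Its rss element is the residual sum of squares; idaLm's own $RSS, $Loglike, $AIC, and $BIC may be defined differently. The helper name cov_lm is hypothetical.

```r
# Sketch: linear regression from the predictor covariance matrix (base R).
# Assumes complete data; idaLm's in-database missing-value handling is not modeled.
cov_lm <- function(y, X) {
  X <- as.matrix(X)
  S_xx <- cov(X)     # covariance matrix of the predictors
  S_xy <- cov(X, y)  # covariance of each predictor with the target

  # Invert S_xx; if it is singular (e.g. collinear predictors), fall back
  # to the Moore-Penrose generalized inverse.
  S_inv <- tryCatch(solve(S_xx), error = function(e) MASS::ginv(S_xx))

  slopes <- setNames(drop(S_inv %*% S_xy), colnames(X))
  intercept <- mean(y) - sum(colMeans(X) * slopes)
  beta <- c("(Intercept)" = intercept, slopes)

  # Residual sum of squares and Gaussian log-likelihood
  fitted <- drop(cbind(1, X) %*% beta)
  rss <- sum((y - fitted)^2)
  n <- length(y)
  k <- length(beta) + 1  # regression coefficients plus the error variance
  loglik <- -n / 2 * (log(2 * pi) + log(rss / n) + 1)

  list(
    coefficients = beta,
    rss          = rss,
    df.residual  = n - length(beta),
    loglik       = loglik,
    AIC          = -2 * loglik + 2 * k,
    BIC          = -2 * loglik + log(n) * k
  )
}

# Usage on the built-in mtcars data, cross-checked against stats::lm()
fit_cov <- cov_lm(mtcars$mpg, mtcars[, c("wt", "hp")])
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)
all.equal(fit_cov$coefficients, coef(fit_lm))
all.equal(fit_cov$AIC, AIC(fit_lm))

# Perfectly collinear predictors: solve() fails, so ginv() is used instead
X_coll <- cbind(wt = mtcars$wt, wt2 = 2 * mtcars$wt)
cov_lm(mtcars$mpg, X_coll)$coefficients
```

In the collinear case the generalized inverse returns one of infinitely many equivalent solutions (the minimum-norm one), so the individual slopes for wt and wt2 are not unique, but the fitted values and intercept agree with a fit on wt alone.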

Examples


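# Create a pointer to table DB2INST1.SHOWCASE_SYSUSAGE
# (assumes a database connection has already been opened and registered with idaInit())
sysusage <- ida.data.frame('DB2INST1.SHOWCASE_SYSUSAGE')

# Calculate the linear model in-database
lm1 <- idaLm(MEMUSED ~ USERS, sysusage)

# Print the fitted model (uses the print method for class idaLm)
print(lm1)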
