idaLm(form, idadf, modelname = NULL, dropModel = TRUE, limit = 25)
## S3 method for class 'idaLm'
print(x, ...)
## S3 method for class 'idaLm'
predict(object, newdata, id, outtable = NULL, ...)
## S3 method for class 'idaLm'
plot(x, names = TRUE, max_forw = 50, max_plot = 15, order = NULL,
     lmgON = FALSE, backwardON = FALSE, ...)

form: formula object that specifies both the name of the column that contains the continuous
target variable and either a list of columns separated by plus symbols or a single period (to specify that all other columns in the ida.data.frame are to be used as predictors).
The specified columns can contain continuous or categorical values.
The specified formula cannot contain transformations.
names: logical. If TRUE, the plot shows the names of the attributes instead of numbers.
max_forw: integer. The maximum number of iterations calculated by the forward/backward heuristic.
max_plot: integer. The maximum number of attributes that appear in the plot. It must be greater than 0.
lmgON: logical. If TRUE, the method calculates the importance metric LMG. This method has exponential running time and is not supported for more than 15 attributes.
backwardON: logical. If TRUE, the method calculates the backward heuristic. By default (FALSE), it calculates the forward heuristic.

The idaLm function computes a linear regression model by extracting a covariance matrix and
computing its inverse. This implementation is optimized for problems that involve a large number of
samples and a relatively small number of predictors. The maximum number of columns is 78.

Missing values in the input table are ignored when calculating the covariance matrix. If this leads to undefined entries in the covariance matrix, the function fails. If the inverse of the covariance matrix cannot be computed (for example, due to correlated predictors), the Moore-Penrose generalized inverse is used instead.

The output of the idaLm function has the following attributes:
$coefficients is a vector with two values. The first value is the slope of the line that best fits the input data; the second value is its y-intercept.
$RSS is the root sum square (that is, the square root of the sum of the squares).
$effects is not used and can be ignored.
$rank is the rank.
$df.residuals is the number of degrees of freedom associated with the residuals.
$coefftab is a vector with four values.
$Loglike is the log likelihood ratio.
$AIC is the Akaike information criterion. This is a measure of the relative quality of the model.
$BIC is the Bayesian information criterion. This is used for model selection.
$CovMat is the matrix used in the calculation (the "covariance matrix"). This matrix is required by the calculations in plot.idaLm and by the statistics.
$card is the number of dummy variables created for each categorical column, and 1 for each numerical column.
$model is the in-database model name of the idaLm object.
$numrow is the number of rows of the input table that do not contain NAs.
$sigma is the residual standard error.
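
The covariance-based fit described above can be illustrated locally. The following is a minimal sketch in base R, not the ibmdbR implementation: it uses the built-in iris data frame instead of a database table, and `pinv` is a hypothetical helper written for this example that computes the Moore-Penrose generalized inverse from a singular value decomposition.

```r
# Local illustration only; this is not how ibmdbR is implemented internally.
X <- as.matrix(iris[, c("Sepal.Width", "Petal.Length", "Petal.Width")])
y <- iris$Sepal.Length

# Joint covariance matrix of predictors and target; rows with NAs are dropped
S <- cov(cbind(X, y), use = "complete.obs")
Sxx <- S[colnames(X), colnames(X)]  # predictor-predictor block
Sxy <- S[colnames(X), "y"]          # predictor-target block

# Hypothetical helper: Moore-Penrose generalized inverse via SVD
pinv <- function(A, tol = sqrt(.Machine$double.eps)) {
  s <- svd(A)
  keep <- s$d > tol * max(s$d)      # discard (near-)zero singular values
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

# Slopes from the covariance blocks, intercept from the column means
slopes <- drop(pinv(Sxx) %*% Sxy)
names(slopes) <- colnames(X)
intercept <- mean(y) - sum(slopes * colMeans(X))

# Reference fit with base R's lm() for comparison
ref <- coef(lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
               data = iris))
```

When the predictor block is invertible, the generalized inverse coincides with the ordinary inverse, so `slopes` and `intercept` should agree with `ref` up to rounding.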
The plot.idaLm function uses $R^2$ as a measure of quality of a linear model.
$R^2$ compares the variance of the predicted values and the variance of the actual values
of the target variable.
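
As a rough local illustration of this definition, the sketch below computes $R^2$ as the ratio of the two variances for an ordinary least-squares fit with an intercept. It assumes base R's `lm()` and the built-in iris data frame rather than an ida.data.frame, and it is not taken from the ibmdbR source.

```r
# Local illustration with base R; not the in-database computation.
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

# Variance of the predicted values divided by variance of the actual values
r2_ratio <- var(fitted(fit)) / var(iris$Sepal.Length)

# R^2 as reported by summary.lm(), for comparison
r2_lm <- summary(fit)$r.squared
```

For a least-squares model with an intercept, the two quantities are equal up to rounding.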
$First: Returns the $R^2$ value of the linear model for each attribute alone.
$Usefulness: Returns the reduction in $R^2$ when one attribute is removed from the linear model that contains all attributes.
$Forward_Values: Calculated only if backwardON=FALSE. This heuristic adds, in each step, the attribute that yields the largest increase in $R^2$.
$LMG: Calculated only if lmgON=TRUE. Returns the increase in $R^2$ contributed by each attribute, averaged over every possible permutation of the attributes. By grouping some of the permutations, it is enough to average over every possible subset. For n attributes there are $2^n$ subsets, so LMG has exponential running time and is not recommended for more than 15 attributes.
$Backward_Values: Calculated only if backwardON=TRUE. Similar to the forward heuristic, but starting with all attributes: in each step, the algorithm removes the attribute whose removal causes the smallest reduction in $R^2$.
$Model_Values: Calculated only if order is a vector of attributes. In this case the function calculates the $R^2$ value of the models obtained by adding one attribute of order in each step.
RelImpPlot.png (if lmgON=FALSE): This plot shows a stack plot of the values Usefulness, First, and the Model_Value of the heuristic. Note that usually Usefulness
## Not run:
# Create a pointer to table IRIS
idf <- ida.data.frame("IRIS")

# Calculate the linear model in-database
lm1 <- idaLm(SepalLength~., idf)

# Plot the attribute importance measures computed by plot.idaLm
library(ggplot2)
plot(lm1)
## End(Not run)
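
The First, Usefulness, and forward-heuristic values described for plot.idaLm can also be sketched without a database connection. The example below is an assumption-laden illustration, not the ibmdbR code: it uses base R's `lm()` on the built-in iris data frame, the helper `r2` and all variable names are hypothetical, and the forward values are stored here as the cumulative $R^2$ after each step.

```r
# Local sketch with base R; names and helpers are illustrative only.
target <- "Sepal.Length"
attrs <- c("Sepal.Width", "Petal.Length", "Petal.Width")

# R^2 of the model that uses the given attributes (0 for the empty model)
r2 <- function(vars) {
  if (length(vars) == 0) return(0)
  summary(lm(reformulate(vars, response = target), data = iris))$r.squared
}

# First: R^2 of each attribute alone
first <- sapply(attrs, r2)

# Usefulness: R^2 lost when one attribute is removed from the full model
usefulness <- sapply(attrs, function(a) r2(attrs) - r2(setdiff(attrs, a)))

# Forward heuristic: in each step add the attribute with the largest R^2 gain
selected <- character(0)
forward_values <- numeric(0)
while (length(selected) < length(attrs)) {
  candidates <- setdiff(attrs, selected)
  scores <- sapply(candidates, function(a) r2(c(selected, a)))
  best <- candidates[which.max(scores)]
  selected <- c(selected, best)
  forward_values <- c(forward_values, max(scores))
}
names(forward_values) <- selected
```

Because each step fits one model per remaining attribute, the forward heuristic needs on the order of n^2 model fits for n attributes, in contrast to the $2^n$ subsets required by LMG.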