statsMS: Obtain performance statistics of a series of linear models

Description

This function returns several statistics measuring the performance of a series of linear models built using the function buildMS, with an option to rank the models based on one of the returned performance statistics.

Usage

statsMS(model, design.info, arrange.by, digits)

Arguments

model

A list of linear models returned by buildMS.

design.info

Extra information about the linear models in the series.

arrange.by

Character string defining whether the table with the performance statistics of the linear models should be arranged and, if so, which column should be used. Available options are "candidates", "df", "aic", "rmse", <

digits

Integer or vector with six integers indicating the number of decimal places to be used to round the performance statistics. If a vector is passed to the function, the number of decimal places should be in the following order:

c("aic", "rmse", "nrms

Value

A data frame with several performance statistics.

TODO

Include other performance statistics such as PRESS, BIC, Mallows's Cp, and max(VIF);
Add option to select which performance statistics should be returned.

Details

This function was devised to deal with a list of linear models generated by the function buildMS. The main objective is to compare several linear models using several performance statistics. Such statistics can then be used to rank the linear models and identify, for example, the best performing model, given the selected performance statistics.
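
The kind of comparison described above can be sketched with base R alone. The block below is a minimal illustration, not the implementation of statsMS: it does not call buildMS or statsMS, it uses the built-in mtcars data as a stand-in, and the object names (forms, fits, perf_stats) are hypothetical. It fits three linear models, collects a few performance statistics in a data frame, and orders the rows by AIC, which is roughly what ranking by one statistic amounts to.

```r
# Hypothetical illustration with base R only; statsMS and buildMS are not used.
forms <- list(mpg ~ wt, mpg ~ wt + hp, mpg ~ wt + hp + qsec)
fits <- lapply(forms, function(f) lm(f, data = mtcars))

# One row of performance statistics per model
perf_stats <- data.frame(
  id   = seq_along(fits),
  df   = sapply(fits, function(m) length(coef(m))),  # estimated coefficients
  aic  = sapply(fits, AIC),
  rmse = sapply(fits, function(m) sqrt(mean(residuals(m)^2)))
)

# Rank the models: smallest AIC first
perf_stats <- perf_stats[order(perf_stats$aic), ]
print(round(perf_stats, 3))
```

Because the RMSE here is computed on the same data used for fitting, it can only decrease as predictors are added, which is one reason to prefer penalized statistics when ranking.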

An important feature of statsMS is that it uses the initial number of candidate predictor variables offered to build the model to calculate penalized or adjusted measures of model performance. This information is recorded as an attribute of the final model selected by buildMS. The feature was included in statsMS because data-driven variable selection results in biased (too optimistic) linear models, and the effective number of degrees of freedom is close to the number of candidate predictor variables initially offered to the model (Harrell, 2001).
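
To make the effect of this penalty concrete, the sketch below contrasts the usual adjusted R-squared with a version that charges for every candidate predictor initially offered. It is an illustration under assumptions, not the code used by statsMS: the helper adj_r2 and the value n_candidates are hypothetical, the data are the built-in mtcars, and the formula is the standard adjusted R-squared with only the predictor count swapped.

```r
# Hypothetical helper: standard adjusted R-squared, with p supplied by the caller
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Assumption for the example: 6 candidates were offered, selection kept 2
n_candidates <- 6
fit <- lm(mpg ~ wt + hp, data = mtcars)

r2 <- summary(fit)$r.squared
n <- nrow(mtcars)
p_final <- length(coef(fit)) - 1  # predictors kept in the final model

naive     <- adj_r2(r2, n, p_final)       # ignores the selection step
penalized <- adj_r2(r2, n, n_candidates)  # counts every candidate offered

print(c(naive = naive, penalized = penalized))
```

The penalized value is always lower than the naive one whenever more candidates were offered than retained, reflecting the optimism introduced by variable selection.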

References

Harrell, F. E. (2001) Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. First edition. New York: Springer.

Venables, W. N. and Ripley, B. D. (2002) Modern applied statistics with S. Fourth edition. New York: Springer.

Examples


# based on the second example of function stepAIC
require(MASS)
cpus1 <- cpus
for(v in names(cpus)[2:7])
  cpus1[[v]] <- cut(cpus[[v]], unique(quantile(cpus[[v]])),
                    include.lowest = TRUE)
cpus0 <- cpus1[, 2:8]  # excludes names, authors' predictions
cpus.samp <- sample(1:209, 100)
cpus.form <- list(formula(log10(perf) ~ syct + mmin + mmax + cach + chmin +
                  chmax + perf),
                  formula(log10(perf) ~ syct + mmin + cach + chmin + chmax),
                  formula(log10(perf) ~ mmax + cach + chmin + chmax + perf))
data <- cpus0[cpus.samp, ]  # random subset of 100 rows (same columns as cpus1[, 2:8])
cpus.ms <- buildMS(cpus.form, data, vif = TRUE, aic = TRUE)
cpus.des <- data.frame(a = c(0, 1, 0), b = c(1, 0, 1), c = c(1, 1, 0))
stats <- statsMS(cpus.ms, design.info = cpus.des, arrange.by = "aic")
