OVERVIEW
The purpose of Logit is to combine the following function calls into one, and to provide ancillary analyses such as graphics, organization of the output into tables, and sorting to assist interpretation of the output. The basic analysis successively invokes several standard R functions, beginning with glm with family="binomial", the standard R function for estimating the logit model. The output of the analysis is stored in the object lm.out, which is available for further analysis in the R environment after the Logit function completes. By default Logit automatically provides the analyses from the standard R functions summary, confint and anova, with some of the standard output modified and enhanced. The correlation matrix of the model variables is obtained with the cor function. The residual analysis invokes the fitted, resid, rstudent, and cooks.distance functions. The option for prediction intervals calls the standard generic R function predict. The lessR den function provides the histogram and density plots for the residuals, and the ScatterPlot function provides the scatter plot of the residuals with the fitted values and, for the one-predictor model, the scatter plot of the data.
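As a rough sketch of the underlying calls, the base R code below fits a logit model with glm and applies the same standard functions listed above. The built-in mtcars data, with the 0/1 variable am as the response and wt as the predictor, is used only for illustration. This is not the Logit source code, and it does not reproduce the enhanced formatting that Logit adds. Wald intervals from confint.default are used here to avoid any extra dependency; confint would instead give profile-likelihood intervals.

```r
# Fit the logit model: glm with the binomial family (logit link by default)
lm.out <- glm(am ~ wt, data = mtcars, family = "binomial")

summary(lm.out)                 # estimates, standard errors, z tests
confint.default(lm.out)         # Wald confidence intervals
anova(lm.out)                   # sequential analysis of deviance table
cor(mtcars[, c("am", "wt")])    # correlations of the model variables

# Per-observation quantities used in the residual analysis
fit  <- fitted(lm.out)                     # fitted probabilities
res  <- resid(lm.out, type = "response")   # observed minus fitted
rstu <- rstudent(lm.out)                   # Studentized residuals
cook <- cooks.distance(lm.out)             # Cook's distance

# Fitted probabilities and their standard errors for new predictor values
predict(lm.out, newdata = data.frame(wt = c(2.5, 3.5)),
        type = "response", se.fit = TRUE)
```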
The default analysis provides the model's parameter estimates and corresponding hypothesis tests and confidence intervals, goodness-of-fit indices, the ANOVA table, the analysis of residuals and influence, and the fitted value and standard error for each observation in the model. The response variable must be binary, with only the numeric values 0 and 1. See the examples for how to obtain exclusive 0 and 1 values from character data.
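The examples referred to above are not reproduced in this section. One possible approach, sketched here with a hypothetical character variable Gender taking the values "F" and "M", is to convert the variable with a logical comparison. Note that a missing value in Gender would produce NA rather than 0 or 1.

```r
# Hypothetical data: a character variable with two categories
d <- data.frame(Gender = c("F", "M", "M", "F", "M"),
                Height = c(64, 70, 72, 63, 68),
                stringsAsFactors = FALSE)

# Recode to numeric 0/1: 1 for "M", 0 for anything else
d$Male <- as.integer(d$Gender == "M")

# Confirm the new response contains only 0 and 1, then cross-check the coding
stopifnot(all(d$Male %in% c(0, 1)))
table(d$Gender, d$Male)
```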
DATA FRAME
The Read function included in this package, which reads the data and displays information about it in preparation for analysis, by default names the resulting data frame mydata. If the variables in the model are not all in the same data frame, the analysis will not be complete. The data frame does not need to be attached; just specify it by name with the dframe option if its name is not the default mydata.
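Because the analysis is incomplete when some model variables are missing from the data frame, it can help to check this before running the analysis. A minimal base R sketch, using a hypothetical helper named vars_in_frame that is not part of lessR:

```r
# Hypothetical helper: TRUE if every variable in the formula is in the data frame
vars_in_frame <- function(formula, dframe) {
  vars <- all.vars(formula)                  # response and predictor names
  missing <- setdiff(vars, names(dframe))
  if (length(missing) > 0)
    message("Not in data frame: ", paste(missing, collapse = ", "))
  length(missing) == 0
}

mydata <- data.frame(y = c(0, 1, 1, 0), x1 = c(2.1, 3.4, 3.9, 1.8))
vars_in_frame(y ~ x1, mydata)        # all variables present
vars_in_frame(y ~ x1 + x2, mydata)   # x2 is missing, so a message is shown
```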
GRAPHICS
Two or three graphs are provided by default. The graphs are written to separate graphics windows, which may overlap each other completely, in which case move the top graphics window. Alternatively, the graphics.save option saves the graphs to a single pdf file called regOut.pdf. The directory to which the file is written is displayed in the console text output.
1. A histogram of the residuals includes the superimposed normal and general density plots from the den function included in this lessR package. The two density plots, which overlap each other and the histogram, are filled with semi-transparent colors to enhance readability.
2. A scatterplot of the residuals with the fitted values is provided from the ScatterPlot function included in this package. The point corresponding to the largest value of Cook's distance, regardless of its size, is plotted in red and labeled, and that value of Cook's distance is given in the subtitle of the plot. Also by default all points with a Cook's distance larger than 1.0 are plotted in red; this threshold can be changed to any value with the cooks.cut option. This scatterplot also includes the lowess curve.
3. For models with a single predictor variable, a scatterplot of the data is produced, which also includes the fitted values. As with the density histogram of the residuals and the scatterplot of the fitted values and residuals, this scatterplot includes a colored background with grid lines.
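For readers who want similar displays outside lessR, the base R sketch below approximates the three plots. It does not call den or ScatterPlot and omits the colored background, grid lines and semi-transparent fills. Writing all plots to one pdf file loosely parallels the graphics.save option; the file name logit_plots.pdf and the use of tempdir() are assumptions of the sketch, and the cooks.cut variable simply mirrors that option's default of 1.0.

```r
# Illustrative one-predictor logit model
fit  <- glm(am ~ wt, data = mtcars, family = "binomial")
res  <- resid(fit, type = "response")   # observed minus fitted
yhat <- fitted(fit)
cook <- cooks.distance(fit)
cooks.cut <- 1.0                        # threshold for plotting points in red

pdf(file.path(tempdir(), "logit_plots.pdf"))   # all plots go to one file

# 1. Histogram of residuals with normal and general density curves
hist(res, freq = FALSE, main = "Residuals", xlab = "Residual")
curve(dnorm(x, mean(res), sd(res)), add = TRUE, lty = 2)
lines(density(res))

# 2. Residuals vs fitted values with a lowess curve
imax <- which.max(cook)
big <- cook > cooks.cut
big[imax] <- TRUE                       # always flag the largest Cook's distance
plot(yhat, res, col = ifelse(big, "red", "black"),
     xlab = "Fitted value", ylab = "Residual",
     sub = sprintf("Largest Cook's distance: %.2f", cook[imax]))
lines(lowess(yhat, res))
text(yhat[imax], res[imax], labels = names(cook)[imax], pos = 3, col = "red")

# 3. Data with the fitted probability curve (one-predictor model only)
plot(mtcars$wt, mtcars$am, xlab = "wt", ylab = "am")
ord <- order(mtcars$wt)
lines(mtcars$wt[ord], yhat[ord])

dev.off()
```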
RESIDUAL ANALYSIS
By default the residual analysis lists the data and fitted value for each observation as well as the residual, Studentized residual, Cook's distance and dffits, sorted by Cook's distance, with the first 20 observations listed. The residual displayed is the actual difference between the observed and fitted values, that is, the residual computed by residuals with type="response". The res.sort option provides for sorting by the Studentized residuals or not sorting at all. The res.rows option provides for listing these rows of data and computed statistics for any specified number of observations (rows). To turn off the analysis of residuals, specify res.rows=0.
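The listing described above can be approximated in base R as follows. This is a sketch on the built-in mtcars data rather than the Logit implementation; the res.rows variable only mimics the option's default of 20.

```r
fit <- glm(am ~ wt, data = mtcars, family = "binomial")

# Data plus per-observation statistics
out <- data.frame(
  mtcars[, c("wt", "am")],
  fitted   = fitted(fit),
  residual = resid(fit, type = "response"),   # observed minus fitted
  rstudent = rstudent(fit),
  cooks    = cooks.distance(fit),
  dffits   = dffits(fit)
)

res.rows <- 20                                      # number of rows to list
out <- out[order(out$cooks, decreasing = TRUE), ]   # sort by Cook's distance
print(round(head(out, res.rows), 3))
```

Sorting by the Studentized residuals instead, as the res.sort option allows, amounts to ordering on out$rstudent.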
INVOKED R OPTIONS
The options
function is called to turn off the stars for different significance levels (show.signif.stars=FALSE), to turn off scientific notation for the output (scipen=30), and to set the width of the text output at the console to 120 characters. The later option can be re-specified with the text.width
option. After reg
is finished with a normal termination, the options are re-set to their values before the reg
function began executing.
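The set-and-restore pattern for these options can be sketched in base R with a hypothetical wrapper, with_report_options, which is not part of lessR. One difference from the behavior described above: on.exit restores the previous options even if an error occurs, not only after a normal termination.

```r
# Hypothetical wrapper: evaluate expr with report-style options, then restore
with_report_options <- function(expr, text.width = 120) {
  old <- options(show.signif.stars = FALSE,   # no significance stars
                 scipen = 30,                 # avoid scientific notation
                 width = text.width)          # console text width
  on.exit(options(old))                       # restore the previous values
  force(expr)                                 # evaluate after options are set
}

before <- getOption("width")
with_report_options(print(summary(glm(am ~ wt, data = mtcars,
                                      family = "binomial"))))
identical(getOption("width"), before)   # the width is back to its old value
```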
COLOR THEME
A color theme for all the colors can be chosen for a specific plot with the colors option. Alternatively, the color theme can be changed for all subsequent graphical analysis with the lessR function set. The default color theme is blue, but a gray scale is available with "gray", and other themes are available as explained in set.
VARIABLE LABELS
Although standard R does not provide for variable labels, lessR can store the labels in a data frame called mylabels, obtained from the Read function. If this labels data frame exists, then the corresponding variable label is by default used as the label for the horizontal axis and in the text output. For more information, see Read.