SIMPLE.REGRESSION (version 0.3.1)

LOGISTIC_REGRESSION: Logistic regression

Description

Logistic regression analyses with SPSS- and SAS-like output. The output includes model summaries, classification tables, omnibus tests of model coefficients, the model coefficients, likelihood ratio tests for the predictors, overdispersion tests, model effect sizes, the correlation matrix for the model coefficients, collinearity statistics, and casewise regression diagnostics.

Usage

LOGISTIC_REGRESSION(data, DV, forced = NULL, hierarchical = NULL, formula = NULL,
                    ref_category = NULL,
                    family = 'binomial',
                    CI_level = 95,
                    MCMC_options = list(MCMC = FALSE, Nsamples = 10000, 
                                        thin = 1, burnin = 1000, 
                                        HDI_plot_est_type = 'standardized'),
                    plot_type = 'residuals',
                    verbose = TRUE)

Value

An object of class "LOGISTIC_REGRESSION". The object is a list containing the following possible components:

model

All of the glm function output for the regression model.

modelsum

All of the summary.glm function output for the regression model.

modeldata

All of the predictor and outcome raw data that were used in the model, along with regression diagnostic statistics for each case.

collin_diags

Collinearity diagnostic coefficients for models without interaction terms.

chain_dat

The MCMC chains.

Bayes_HDIs

The Bayesian HDIs.

Arguments

data

A dataframe where the rows are cases and the columns are the variables.

DV

The name of the dependent variable.
Example: DV = 'outcomeVar'.

forced

(optional) A vector of the names of the predictor variables for a forced/simultaneous entry regression. The variables can be numeric or factors.
Example: forced = c('VarA', 'VarB', 'VarC')

hierarchical

(optional) A list with the names of the predictor variables for each step of a hierarchical regression. The variables can be numeric or factors.
Example: hierarchical = list(step1=c('VarA', 'VarB'), step2=c('VarC', 'VarD'))

formula

(optional) Text for an R formula, as a character string. Useful for testing for interactions.
Example: formula = "Aggressive_Behavior ~ Maternal_Harshness * Resiliency"
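In R, the `*` operator in a formula expands to the main effects plus their interaction. A minimal base-R sketch (the variable names are illustrative, not from the package):

```r
# "x1 * x2" on the right-hand side expands to x1 + x2 + x1:x2
f <- as.formula("y ~ x1 * x2")

# terms() reveals the expanded model terms
attr(terms(f), "term.labels")
```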

ref_category

(optional) The reference category for DV.
Example: ref_category = 'alive'

family

(optional) The name of the error distribution to be used in the model. The options are:

  • "binomial" (the default), or

  • "quasibinomial", which should be used when there is overdispersion.

Example: family = 'quasibinomial'
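The difference between the two families can be checked directly with base R's glm: the binomial family fixes the dispersion parameter at 1, while quasibinomial estimates it from the data. A minimal sketch on simulated data (the data and variable names are illustrative):

```r
# Simulated binary outcome with two numeric predictors (illustrative data)
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 - 0.4 * x2))
dat <- data.frame(y, x1, x2)

# Standard binomial fit: dispersion is fixed at 1
fit_bin   <- glm(y ~ x1 + x2, data = dat, family = binomial)

# Quasibinomial fit: dispersion is estimated from the data
fit_quasi <- glm(y ~ x1 + x2, data = dat, family = quasibinomial)

# An estimated dispersion well above 1 suggests overdispersion,
# in which case the quasibinomial standard errors are more trustworthy
summary(fit_quasi)$dispersion
```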

CI_level

(optional) The confidence interval level for the output, as a whole number. The default is 95.

MCMC_options

(optional) A list specifying the following options for Bayesian MCMC analyses:

  • "MCMC": should MCMC analyses be conducted? The options are TRUE or FALSE;

  • "Nsamples": the number of iterations, i.e., samples from the posterior distribution;

  • "thin": the chain-thinning interval;

  • "burnin": the burn-in period, i.e., the number of initial samples that should be dropped from the chains; and

  • "HDI_plot_est_type": the kind of regression estimates that will appear in any requested HDI plots. The options are "standardized" or "raw".
Example: MCMC_options = list(MCMC = TRUE, Nsamples = 10000, thin = 1, burnin = 1000, HDI_plot_est_type = 'standardized')

plot_type

(optional) The kind of plots, if any. The options are:

  • 'residuals' (the default),

  • 'diagnostics', for regression diagnostics,

  • 'Bayes_HDI' (for MCMC posterior distributions), and

  • 'none', for no plots.

Example: plot_type = 'diagnostics'

verbose

(optional) Should detailed results be displayed in the console?
The options are TRUE (the default) or FALSE. If TRUE, plots of residuals are also produced.

Author

Brian P. O'Connor

Details

This function uses the glm function from the stats package and supplements the output with additional statistics, in formats that resemble SPSS and SAS output. The predictor variables can be numeric or factors.

The function assigns contrasts (dummy codes) to factor variables that do not already have them. The baseline group for the dummy codes is determined by the alphabetic/numeric order of the factor levels. If "control", "Control", "baseline", or "Baseline" appears in the name of a factor level, then that level is used as the baseline for the dummy codes.
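This baseline behaviour mirrors how base R handles factors, which can be sketched with stats functions alone (the group labels below are illustrative):

```r
# A factor with a "Control" level; levels sort alphabetically by default
grp <- factor(c("drugA", "drugB", "Control", "drugA", "Control", "drugB"))

# "Control" sorts before "drugA"/"drugB", so it is already the first level
levels(grp)

# relevel() makes the baseline choice explicit, analogous to what
# LOGISTIC_REGRESSION does when it finds a level named "Control"
grp <- relevel(grp, ref = "Control")

# The resulting dummy codes: the baseline row is all zeros
contrasts(grp)
```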

Predicted values for this model, for selected levels of the predictor variables, can be produced and plotted using the PLOT_MODEL function in this package.

The Bayesian MCMC analyses can be time-consuming for larger datasets. The MCMC analyses are conducted using functions, with their default settings, from the rstanarm package (Goodrich, Gabry, Ali, & Brilleman, 2024). The MCMC results can be verified using the model-checking functions in the rstanarm package (e.g., Muth, Oravecz, & Gabry, 2018).

Good sources for interpreting logistic regression residuals and diagnostic plots appear in the References below.

References

Dunn, P. K., & Smyth, G. K. (2018). Generalized linear models with examples in R. Springer.

Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. Los Angeles, CA: Sage.

Goodrich, B., Gabry, J., Ali, I., & Brilleman, S. (2024). rstanarm: Bayesian applied regression modeling via Stan. R package version 2.32.1, https://mc-stan.org/rstanarm/.

Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2014). Multivariate data analysis (8th ed.). Lawrence Erlbaum Associates.

Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (3rd ed.). John Wiley & Sons.

Muth, C., Oravecz, Z., & Gabry, J. (2018). User-friendly Bayesian regression modeling: A tutorial with rstanarm and shinystan. The Quantitative Methods for Psychology, 14(2), 99-119.
https://doi.org/10.20982/tqmp.14.2.p099

Orme, J. G., & Combs-Orme, T. (2009). Multiple regression with discrete dependent variables. Oxford University Press.

Pituch, K. A., & Stevens, J. P. (2016). Applied multivariate statistics for the social sciences: Analyses with SAS and IBM's SPSS (6th ed.). Routledge.

Rindskopf, D. (2023). Generalized linear models. In H. Cooper, M. N. Coutanche, L. M. McMullen, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA handbook of research methods in psychology: Data analysis and research publication, (2nd ed., pp. 201-218). American Psychological Association.

Examples

# Meyers, 2013, p. 263: forced (simultaneous) entry
LOGISTIC_REGRESSION(data = data_Meyers_2013, DV='graduated', 
                    forced=c('sex','family_encouragement'),
                    plot_type = 'diagnostics')
# \donttest{	
# for Kremelburg, 2011, p. 244: hierarchical entry, with Bayesian MCMC analyses & HDI plots
LOGISTIC_REGRESSION(data = data_Kremelburg_2011, DV='OCCTRAIN',
                    hierarchical=list( step1=c('AGE', 'female'), 
                                       step2=c('EDUC','REALRINC')),
                    MCMC_options = list(MCMC = TRUE, Nsamples = 10000, 
                                        thin = 1, burnin = 1000, 
                                        HDI_plot_est_type = 'raw'),
                    plot_type = 'Bayes_HDI')
# }
