glsm: Saturated Model Log-Likelihood for Multinomial Outcomes

Description

When the response variable Y takes one of R > 1 values, glsm() computes the maximum likelihood estimates (MLEs) of the parameters under four models (null, complete, saturated, and logistic) and the log-likelihood value of each model.

The method assumes independent, non-identically distributed variables. For grouped data with a multinomial outcome variable, where the observations are divided into J populations, glsm() provides estimates for any number K of explanatory variables.
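As a rough illustration of the grouping into J populations, the base R sketch below forms populations from the distinct covariate combinations of a small made-up data set and computes the saturated-model quantities nj, Zrj, and prj_tilde. The data frame toy and every variable name in it are hypothetical, and the sketch does not reproduce the internals of glsm(). It only assumes the usual multinomial saturated estimate Zrj / nj and the corresponding log-likelihood, the sum over j and r of Zrj * log(prj_tilde).

```r
# Hypothetical toy data: outcome y with R = 3 levels and two explanatory variables.
toy <- data.frame(
  y  = factor(c("a", "b", "a", "c", "b", "b", "a", "c", "c", "a")),
  x1 = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1),
  x2 = c("u", "u", "u", "u", "u", "v", "v", "v", "v", "v")
)

# Each distinct combination of (x1, x2) defines one population j.
pop <- interaction(toy$x1, toy$x2, drop = TRUE)

# z_rj: number of observations in population j with outcome r
# (rows index populations, columns index outcome categories).
z_rj <- table(pop, toy$y)

# n_j: number of observations in each population.
n_j <- rowSums(z_rj)

# Saturated estimates: observed proportion of each category within each population.
p_rj_tilde <- z_rj / n_j

# Saturated log-likelihood; cells with z_rj = 0 contribute 0 by convention.
nz <- z_rj > 0
loglik_saturated <- sum(z_rj[nz] * log(p_rj_tilde[nz]))
```

Here J = 4 populations arise from the four observed (x1, x2) combinations, and each row of p_rj_tilde sums to 1.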

Usage

glsm(formula, data, ref = NaN)

Value

An object of class "glsm", which is a list containing at least the following components:

coefficients: Vector of estimated coefficients, including intercepts and slopes.
coef: Alias for coefficients. Returns the same vector of estimated intercepts and slopes.
Std.Error: Vector of standard errors for the estimated coefficients (intercepts and slopes).
ExpB: Vector containing the exponentiated coefficients (i.e., exp(beta)) for interpretation as odds ratios.
Wald: Wald test statistic used to assess the significance of each coefficient (assumed to follow a chi-squared distribution).
DF: Degrees of freedom associated with the Wald test's chi-squared distribution.
P.value: P-values corresponding to the Wald test statistics.
Log_Lik_Complete: Log-likelihood value of the complete model.
Log_Lik_Null: Log-likelihood value of the null model.
Log_Lik_Logit: Log-likelihood value of the logistic model.
Log_Lik_Saturate: Log-likelihood value of the saturated model.
Populations: Number of populations considered in the saturated model.
Dev_Null_vs_Logit: Deviance statistic comparing the null and logistic models.
Dev_Logit_vs_Complete: Deviance statistic comparing the logistic and complete models.
Dev_Logit_vs_Saturate: Deviance statistic comparing the logistic and saturated models.
Df_Null_vs_Logit: Degrees of freedom for the deviance test comparing the null and logistic models.
Df_Logit_vs_Complete: Degrees of freedom for the deviance test comparing the logistic and complete models.
Df_Logit_vs_Saturate: Degrees of freedom for the deviance test comparing the logistic and saturated models.
P.v_Null_vs_Logit: P-value for the hypothesis test comparing the null and logistic models.
P.v_Logit_vs_Complete: P-value for the hypothesis test comparing the logistic and complete models.
P.v_Logit_vs_Saturate: P-value for the hypothesis test comparing the logistic and saturated models.
Logit_r: Matrix of log-odds values, with respect to the reference category r of the outcome variable Y.
p_hat_complete: Vector of probabilities that the outcome variable takes the value 1, given the jth population (estimated under the complete model, without estimating logistic parameters).
p_hat_null: Vector of probabilities that the outcome variable takes the value 1, given the jth population (estimated under the null model, without estimating logistic parameters).
p_rj: Matrix containing the estimated values of each prj, the probability that the outcome variable takes the value r, given the jth population (estimated using the logistic model).
odd: Vector containing the odds for each jth population.
OR: Vector containing the odds ratios for each variable's coefficient.
z_rj: Vector containing the values of each Zrj, defined as the sum of observations in the jth population.
nj: Vector containing the number of observations (nj) in each jth population.
p_rj_tilde: Vector containing the estimated values of each prj, the probability that the outcome variable takes the value r, given the jth population (estimated under the saturated model, without estimating logistic parameters).
v_rj: Vector of variances of the Bernoulli variables in the jth population and category r.
m_rj: Vector of expected values of Zj in the jth population and category r.
V_rj: Vector of variances of Zj in the jth population and category r.
V: Variance–covariance matrix of Z, the vector containing all Zj values.
S_p: Score vector computed under the saturated model.
I_p: Fisher information matrix under the saturated model.
Zast_j: Vector of standardized values for the variable Zj.
mcov: Variance–covariance matrix of the coefficient estimates.
mcor: Correlation matrix of the coefficient estimates.
Esm: Estimated Saturated Matrix. A data frame containing estimates from the saturated model. For each population j, it includes the values of the explanatory variables, nj, Zrj, prj_tilde, and the log-likelihood Lp_tilde.
Elm: Estimated Logit Matrix. A data frame containing estimates from the logistic model. For each population j, it includes the values of the explanatory variables, nj, Zrj, prj, the logit transformation Logit_rj, and the variance of the logit (var_logit_rj).
call: The original function call used to fit the glsm model.
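
The relationships among several of these components can be sketched in base R. The numbers below are made-up placeholders, not output from glsm(). The sketch assumes the standard definitions: a Wald statistic equal to (coefficient / standard error)^2 with 1 degree of freedom, ExpB equal to exp(coefficient), and a deviance equal to twice the difference of two log-likelihoods, each compared against a chi-squared distribution.

```r
# Hypothetical placeholder values (not produced by glsm()).
coefs     <- c(intercept = -0.50, x1 = 0.80)
std_error <- c(intercept =  0.25, x1 = 0.40)

# Wald statistic per coefficient, assumed here to be (estimate / SE)^2 with 1 df.
wald    <- (coefs / std_error)^2
p_value <- pchisq(wald, df = 1, lower.tail = FALSE)

# Exponentiated coefficients, read as odds ratios.
exp_b <- exp(coefs)

# Deviance comparing a smaller model with a larger model that nests it.
loglik_logit    <- -105.0   # hypothetical
loglik_saturate <- -100.0   # hypothetical
df_test         <- 4        # hypothetical difference in parameter counts
dev   <- 2 * (loglik_saturate - loglik_logit)
p_dev <- pchisq(dev, df = df_test, lower.tail = FALSE)
```

With a fitted object, the analogous inputs would be read from components such as Log_Lik_Logit, Log_Lik_Saturate, and Df_Logit_vs_Saturate.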

Arguments

formula: An object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. See 'Details' for more information on model specification.
data: An optional data frame, list, or environment (or object coercible via as.data.frame) containing the variables in the model. If variables are not found in data, they are taken from environment(formula), typically the environment from which glsm() is called.
ref: Optional character string indicating the reference level of the response variable. If not specified, the first level is used by default.
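
To make the role of ref concrete, the base R sketch below shows what a reference level means for a factor response. It applies relevel() to a hypothetical factor. How glsm() processes ref internally is not described on this page, so this is only an illustration of the concept.

```r
# Hypothetical three-level response.
prog <- factor(c("general", "academic", "vocation", "academic", "general"))

# factor() sorts the levels, so the first (default reference) level is "academic".
default_ref <- levels(prog)[1]

# Choosing another reference level moves it to the front; the data are unchanged.
prog_voc <- relevel(prog, ref = "vocation")
new_ref  <- levels(prog_voc)[1]
```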

Author

Humberto Llinás (Universidad del Norte, Barranquilla-Colombia; author), Jorge Villalba (Universidad Tecnológica de Bolívar, Cartagena-Colombia; author and creator), Jorge Borja (Universidad del Norte, Barranquilla-Colombia; author and creator), Jorge Tilano (Universidad del Norte, Barranquilla-Colombia; author)

Details

An expression of the form y ~ model is interpreted as a specification that the response variable y is modeled by a linear predictor, symbolically defined by model (the systematic component). The model consists of terms separated by + operators. Each term can include variable or factor names, and interactions between variables are denoted by :. Such a term represents the interaction of all included variables and factors. In this context, y is the outcome variable, which may be binary or polychotomous.
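
The formula syntax described above can be explored with model.matrix() from base R, which expands a formula into the columns of the linear predictor. The data frame d and its variables are hypothetical, and the sketch assumes that glsm() follows the standard R formula conventions for + and : terms.

```r
# Hypothetical data with one numeric variable and one two-level factor.
d <- data.frame(
  y   = factor(c("a", "b", "c", "a")),
  age = c(20, 35, 50, 65),
  sex = factor(c("f", "m", "f", "m"))
)

# Main effects only.
X_main <- model.matrix(y ~ age + sex, data = d)

# Main effects plus the age-by-sex interaction, written with ':'.
X_int <- model.matrix(y ~ age + sex + age:sex, data = d)

# Column names show which terms enter the linear predictor.
colnames(X_main)
colnames(X_int)
```

The interaction term adds one column, the product of age with the indicator for the second level of sex.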

References

Hosmer, D., Lemeshow, S., & Sturdivant, R. (2013). Applied Logistic Regression (3rd ed.). New York: Wiley. ISBN: 978-0-470-58247-3.

Llinás, H. (2006). Precisiones en la teoría de los modelos logísticos. Revista Colombiana de Estadística, 29(2), 239–265.

Llinás, H., & Carreño, C. (2012). The Multinomial Logistic Model for the Case in Which the Response Variable Can Assume One of Three Levels and Related Models. Revista Colombiana de Estadística, 35(1), 131–138.

Orozco, E., Llinás, H., & Fonseca, J. (2020). Convergence theorems in multinomial saturated and logistic models. Revista Colombiana de Estadística, 43(2), 211–231.

Llinás, H., Arteta, M., & Tilano, J. (2016). El modelo de regresión logística para el caso en que la variable de respuesta puede asumir uno de tres niveles: estimaciones, pruebas de hipótesis y selección de modelos. Revista de Matemática: Teoría y Aplicaciones, 23(1), 173–197.

Examples

library(glsm)
data("hsbdemo", package = "glsm")
model <- glsm(prog ~ ses + gender, data = hsbdemo, ref = "academic")
model