lsm: Estimation of the log Likelihood of the Saturated Model

Description

When the outcome variable Y takes only the values 0 or 1, the function lsm() computes the estimate of the log-likelihood in the saturated model. This model is characterized by Llinas (2006, ISSN:2389-8976) in section 2.3 through assumptions 1 and 2. If Y is dichotomous and the data are grouped in J populations, lsm() is the recommended function because it works for all K.
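The quantity described above can be reproduced by hand in base R. The following is a minimal sketch, not the package's implementation: it assumes that each distinct value of the explanatory variable defines one population, that the saturated estimate of each p_j is z_j / n_j, and that the log-likelihood is the Bernoulli one (no binomial coefficient). The helper name xlogp is hypothetical; the data are those of example 3 on this page.

```r
# Data from example 3: binary outcome y and one explanatory variable x1.
y  <- c(1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1)
x1 <- c(2, 2, 2, 5, 5, 5, 5, 8, 8, 11, 11, 11)

# Assumption: each distinct value of x1 defines one population j.
n_j <- tapply(y, x1, length)   # number of observations in population j
z_j <- tapply(y, x1, sum)      # sum of the observations (successes) in population j
p_j_tilde <- z_j / n_j         # saturated-model estimate of p_j

# Hypothetical helper: x * log(p) with the convention 0 * log(0) = 0.
xlogp <- function(x, p) ifelse(x == 0, 0, x * log(p))

# Bernoulli log-likelihood of the saturated model, summed over the populations.
log_lik_saturated <- sum(xlogp(z_j, p_j_tilde) + xlogp(n_j - z_j, 1 - p_j_tilde))

J <- length(n_j)               # number of populations
print(data.frame(x1 = names(n_j), n_j = as.vector(n_j),
                 z_j = as.vector(z_j), p_j_tilde = as.vector(p_j_tilde)))
print(log_lik_saturated)
```

If lsm is installed, the value printed last can be compared with the Log_Lik_Saturate component of the fitted object; agreement is expected under the assumptions stated above but is not guaranteed by this sketch.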

Usage

lsm(formula, family = binomial, data = environment(formula), ...)

Value

lsm returns an object of class "lsm".

An object of class "lsm" is a list containing at least the following components:

coefficients: Vector of coefficient estimates (intercept and slopes).
coef: Vector of coefficient estimates (intercept and slopes).
Std.Error: Vector of the standard errors of the coefficients (intercept and slopes).
ExpB: Vector with the exponential of the coefficients (intercept and slopes).
Wald: Value of the Wald statistic (with chi-squared distribution).
DF: Degrees of freedom for the Chi-squared distribution.
P.value: P-value calculated with the Chi-squared distribution.
Log_Lik_Complete: Estimation of the log likelihood in the complete model.
Log_Lik_Null: Estimation of the log likelihood in the null model.
Log_Lik_Logit: Estimation of the log likelihood in the logistic model.
Log_Lik_Saturate: Estimation of the log likelihood in the saturated model.
Populations: Number of populations in the saturated model.
Dev_Null_vs_Logit: Value of the test statistic (Hypothesis: null vs logistic models).
Dev_Logit_vs_Complete: Value of the test statistic (Hypothesis: logistic vs complete models).
Dev_Logit_vs_Saturate: Value of the test statistic (Hypothesis: logistic vs saturated models).
Df_Null_vs_Logit: Degrees of freedom for the test statistic’s distribution (Hypothesis: null vs logistic models).
Df_Logit_vs_Complete: Degrees of freedom for the test statistic’s distribution (Hypothesis: logistic vs complete models).
Df_Logit_vs_Saturate: Degrees of freedom for the test statistic’s distribution (Hypothesis: logistic vs saturated models).
P.v_Null_vs_Logit: P-value for the hypothesis test: null vs logistic models.
P.v_Logit_vs_Complete: P-value for the hypothesis test: logistic vs complete models.
P.v_Logit_vs_Saturate: P-value for the hypothesis test: logistic vs saturated models.
Logit: Vector with the log-odds.
p_hat_complete: Vector with the probabilities that the outcome variable takes the value 1, given the jth population (estimated with the complete model and without the logistic model).
p_hat_null: Vector with the probabilities that the outcome variable takes the value 1, given the jth population (estimated with the null model and without the logistic model).
p_j: Vector with the probabilities that the outcome variable takes the value 1, given the jth population (estimated with the logistic model).
odd: Vector with the values of the odds in each jth population.
OR: Vector with the values of the odds ratio for each coefficient of the variables.
z_j: Vector with the values of each Zj (the sum of the observations in the jth population).
n_j: Vector with the nj (the number of the observations in each jth population).
p_j_tilde: Vector with the estimation of each pj (the probability of success in the jth population) in the saturated model (without estimating the logistic parameters).
v_j: Vector with the variance of the Bernoulli variables in the jth population.
m_j: Vector with the expected values of Zj in the jth population.
V_j: Vector with the variances of Zj in the jth population.
V: Variance and covariance matrix of Z, the vector that contains all the Zj.
S_p: Score vector in the saturated model.
I_p: Information matrix in the saturated model.
Zast_j: Vector with the values of the standardized variable of Zj.
mcov: Variance and covariance matrix for coefficient estimates.
mcor: Correlation matrix for coefficient estimates.
Esm: Data frame with estimates in the saturated model. It contains for each population j: the value of the explanatory variables, nj, Zj, pj and Log-Likelihood Lj_tilde.
Elm: Data frame with estimates in the logistic model. It contains for each population j: the value of the explanatory variables, nj, Zj, pj, Log-Likelihood Lj, Logit_pj and the variance of logit (var.logit).
call: It displays the original call that was used to fit the model lsm.
data: data environment.
...: Additional arguments to be passed to methods.
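The log-likelihood and deviance components listed above can be cross-checked with base R. The sketch below uses stats::glm rather than lsm, so it is an independent illustration and not the package's code. It assumes the usual likelihood-ratio definitions: twice the difference of log-likelihoods, with K degrees of freedom for null vs logistic and J - (K + 1) for logistic vs saturated, where K is taken here to be the number of explanatory variables. All variable names are illustrative.

```r
# Data from example 3 on this page.
y  <- c(1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1)
x1 <- c(2, 2, 2, 5, 5, 5, 5, 8, 8, 11, 11, 11)

# Logistic and null models fitted with stats::glm (not with lsm).
fit_logit <- glm(y ~ x1, family = binomial)
fit_null  <- glm(y ~ 1,  family = binomial)
log_lik_logit <- as.numeric(logLik(fit_logit))
log_lik_null  <- as.numeric(logLik(fit_null))

# Saturated model: one probability per distinct value of x1 (assumption).
n_j <- tapply(y, x1, length)
z_j <- tapply(y, x1, sum)
p_j_tilde <- z_j / n_j
xlogp <- function(x, p) ifelse(x == 0, 0, x * log(p))  # 0 * log(0) := 0
log_lik_sat <- sum(xlogp(z_j, p_j_tilde) + xlogp(n_j - z_j, 1 - p_j_tilde))

K <- 1              # number of explanatory variables in y ~ x1
J <- length(n_j)    # number of populations

# Null vs logistic.
dev_null_vs_logit <- 2 * (log_lik_logit - log_lik_null)
df_null_vs_logit  <- K
pv_null_vs_logit  <- pchisq(dev_null_vs_logit, df_null_vs_logit, lower.tail = FALSE)

# Logistic vs saturated.
dev_logit_vs_sat <- 2 * (log_lik_sat - log_lik_logit)
df_logit_vs_sat  <- J - (K + 1)
pv_logit_vs_sat  <- pchisq(dev_logit_vs_sat, df_logit_vs_sat, lower.tail = FALSE)

print(c(dev_null_vs_logit, df_null_vs_logit, pv_null_vs_logit))
print(c(dev_logit_vs_sat, df_logit_vs_sat, pv_logit_vs_sat))
```

These values correspond in spirit to Dev_Null_vs_Logit, Dev_Logit_vs_Saturate and their Df_ and P.v_ counterparts; whether lsm computes them in exactly this way is an assumption.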

Arguments

formula: An expression of the form y ~ model, where y is the outcome variable (binary or dichotomous: its values are 0 or 1).
family: an optional function, for example binomial.
data: an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lsm() is called.
...: further arguments passed to or from other methods.
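The lookup rule described for data is the same one base R modelling functions follow. The sketch below illustrates it with stats::model.frame; that lsm resolves variables through the same mechanism is an assumption, and the object names (dat, f, mf_data, mf_env) are illustrative.

```r
# Variables defined in the calling environment.
y  <- c(1, 0, 1, 0)
x1 <- c(2, 2, 5, 5)

# A data frame that holds different values under the same names.
dat <- data.frame(y = c(0, 1, 1, 1), x1 = c(8, 8, 11, 11))

f <- y ~ x1

# With data supplied, the variables are taken from the data frame.
mf_data <- model.frame(f, data = dat)

# With data = environment(f), they are taken from where the formula was created.
mf_env <- model.frame(f, data = environment(f))

print(mf_data)
print(mf_env)
```

Passing data by name (data = dat) avoids any ambiguity with the family argument, which comes second in the signature shown under Usage.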

Author

Dr. rer. nat. Humberto LLinás Solano [aut] (Universidad del Norte, Barranquilla-Colombia); MSc. Omar Fábregas Cera [aut] (Universidad del Norte, Barranquilla-Colombia); MSc. Jorge Villalba Acevedo [cre, aut] (Universidad Tecnológica de Bolívar, Cartagena-Colombia).

Details

Estimation of the log Likelihood of the Saturated Model

An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model (systematic component). Such a model consists of a series of terms separated by + operators. The terms themselves consist of variable and factor names separated by : operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term. Here, y is the outcome variable (binary or dichotomous: its values are 0 or 1).
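How a formula is split into terms can be inspected with base R alone. The following sketch uses stats::terms and stats::model.matrix on a made-up data frame; the variable names a and b and the values are purely illustrative.

```r
# A formula with two main effects and their interaction.
f <- y ~ a + b + a:b

# The terms separated by '+', as R parses them.
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# Illustrative data: binary outcome y and two numeric explanatory variables.
dat <- data.frame(y = c(0, 1, 0, 1),
                  a = c(1, 2, 3, 4),
                  b = c(0, 0, 1, 1))

# Design matrix of the linear predictor (systematic component).
X <- model.matrix(f, data = dat)
print(colnames(X))

# For numeric variables, the interaction column is the element-wise product.
print(all(X[, "a:b"] == dat$a * dat$b))
```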

References

[1] LLinás, H. J. (2006). Precisiones en la teoría de los modelos logísticos. Revista Colombiana de Estadística, 29(2), 239–265. https://revistas.unal.edu.co/index.php/estad/article/view/29310

[2] Hosmer, D.W., Lemeshow, S. and Sturdivant, R.X. (2013). Applied Logistic Regression, 3rd ed., New York: Wiley.

[3] Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.

Examples


#library(lsm)

#1. AGE and Coronary Heart Disease (CHD) Status of 20 subjects:

   #AGE <- c(20,23,24,25,25,26,26,28,28,29,30,30,30,30,30,30,30,32,33,33)
   #CHD <- c(0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0)
   #data <- data.frame(CHD, AGE)
   #lsm(CHD ~ AGE, data = data)

#2. You can also use the dot notation for all remaining columns of data:

   #lsm(CHD ~ ., data = data)

#3. Other example:

   #y <- c(1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1)
   #x1 <- c(2, 2, 2, 5, 5, 5, 5, 8, 8, 11, 11, 11)
   #data <- data.frame(y, x1)
   #ELAINYS <- lsm(y ~ x1, data = data)
   #summary(ELAINYS)

#4. Other example:

   #y <- as.factor(c(1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1))
   #x1 <- as.factor(c(2, 2, 2, 5, 5, 5, 5, 8, 8, 11, 11, 11))
   #data <- data.frame(y, x1)
   #ELAINYS1 <- lsm(y ~ x1, family = binomial, data = data)
   #summary(ELAINYS1)
