lsm: Estimation of the log Likelihood of the Saturated Model

Description

When the outcome variable Y takes only the values 0 or 1, the function lsm() estimates the log-likelihood of the saturated model. This model is characterized by Assumptions 1 and 2 in Section 2.3 of Llinas (2006, ISSN:2389-8976). If Y is dichotomous and the data are grouped into J populations, lsm() is recommended because it works for any number K of explanatory variables.

Usage

lsm(formula, family = binomial, data = environment(formula))

Value

lsm returns an object of class "lsm".

An object of class "lsm" is a list containing at least the following components:

coefficients: Vector of coefficient estimates.
Std.Error: Vector of the coefficients' standard errors.
ExpB: Vector with the exponential of the coefficients.
Wald: Value of the Wald statistic.
DF: Degrees of freedom for the Chi-squared distribution.
P.value: P-value with the Chi-squared distribution.
Log_Lik_Complete: Estimation of the log likelihood in the complete model.
Log_Lik_Null: Estimation of the log likelihood in the null model.
Log_Lik_Logit: Estimation of the log likelihood in the logistic model.
Log_Lik_Saturate: Estimation of the log likelihood in the saturated model.
Populations: Number of populations in the saturated model.
Dev_Null_vs_Logit: Value of the test statistic (Hypothesis: null vs logistic models).
Dev_Logit_vs_Complete: Value of the test statistic (Hypothesis: logistic vs complete models).
Dev_Logit_vs_Saturate: Value of the test statistic (Hypothesis: logistic vs saturated models).
Df_Null_vs_Logit: Degrees of freedom for the test statistic’s distribution (Hypothesis: null vs logistic models).
Df_Logit_vs_Complete: Degrees of freedom for the test statistic’s distribution (Hypothesis: logistic vs complete models).
Df_Logit_vs_Saturate: Degrees of freedom for the test statistic’s distribution (Hypothesis: logistic vs saturated models).
P.v_Null_vs_Logit: p-values for the hypothesis test: null vs logistic models.
P.v_Logit_vs_Complete: p-values for the hypothesis test: logistic vs complete models.
P.v_Logit_vs_Saturate: p-values for the hypothesis test: logistic vs saturated models.
Logit: Vector with the log-odds.
p_hat: Vector with the probabilities that the outcome variable takes the value 1, given the jth population.
odd: Vector with the values of the odds in each jth population.
OR: Vector with the values of the odds ratio for each coefficient of the variables.
z_j: Vector with the values of each Zj (the sum of the observations in the jth population).
n_j: Vector with the values of each nj (the number of observations in the jth population).
p_j: Vector with the estimation of each pj (the probability of success in the jth population).
v_j: Vector with the variance of the Bernoulli variables in the jth population.
m_j: Vector with the expected values of Zj in the jth population.
V_j: Vector with the variances of Zj in the jth population.
V: Variance and covariance matrix of Z, the vector that contains all the Zj.
S_p: Score vector in the saturated model.
I_p: Information matrix in the saturated model.
Zast_j: Vector with the values of the standardized variable of Zj.
mcov: Variance and covariance matrix for coefficient estimates.
mcor: Correlation matrix for coefficient estimates.
Esm: Estimates in the saturated model.
Elm: Estimates in the logistic model.
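
Several of the components above have close analogues in base R. The following is a minimal sketch, not the package's internal code, that rebuilds the coefficient-level quantities and the null-vs-logistic test with stats::glm(). It assumes that Wald is the squared ratio of estimate to standard error with one degree of freedom, and that Dev_Null_vs_Logit is twice the difference between the two log-likelihoods with K degrees of freedom; the variable names are illustrative only.

```r
# Data from the second case in the Examples section.
y  <- c(1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1)
x1 <- c(2, 2, 2, 5, 5, 5, 5, 8, 8, 11, 11, 11)

fit_logit <- glm(y ~ x1, family = binomial)  # logistic model
fit_null  <- glm(y ~ 1,  family = binomial)  # null (intercept-only) model

# Coefficient-level quantities (assumed analogues of coefficients,
# Std.Error, ExpB, Wald, DF and P.value).
est  <- coef(fit_logit)
se   <- sqrt(diag(vcov(fit_logit)))
wald <- (est / se)^2
coef_table <- data.frame(
  coefficients = est,
  Std.Error    = se,
  ExpB         = exp(est),
  Wald         = wald,
  DF           = 1,
  P.value      = pchisq(wald, df = 1, lower.tail = FALSE)
)

# Null vs logistic: twice the gain in log-likelihood, with K degrees of
# freedom (K = number of explanatory variables, here 1).
dev_null_vs_logit <- 2 * (as.numeric(logLik(fit_logit)) -
                          as.numeric(logLik(fit_null)))
df_null_vs_logit  <- length(est) - 1
p_null_vs_logit   <- pchisq(dev_null_vs_logit, df = df_null_vs_logit,
                            lower.tail = FALSE)
```

Printing coef_table and comparing it with the output of summary() on an lsm fit is a quick way to check whether these assumptions match the package's definitions.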

Arguments

formula: An expression of the form y ~ model, where y is the outcome variable (binary or dichotomous: its values are 0 or 1).
family: An optional function, for example binomial.
data: An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lsm() is called.

Author

Humberto Llinas Solano [aut], Universidad del Norte, Barranquilla-Colombia
Omar Fabregas Cera [aut], Universidad del Norte, Barranquilla-Colombia
Jorge Villalba Acevedo [cre, aut], Universidad Tecnológica de Bolívar, Cartagena-Colombia

Details

Estimation of the log Likelihood of the Saturated Model

The saturated model is characterized by Assumptions 1 and 2, presented in Section 2.3 of Llinas (2006, ISSN:2389-8976).
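
As a rough illustration in base R (not the package's own implementation), the saturated log-likelihood can be computed by hand. The sketch assumes the standard grouped Bernoulli form, the sum over populations of z_j log(p_j) + (n_j - z_j) log(1 - p_j) with p_j = z_j / n_j, and the convention 0 * log(0) = 0. Each distinct covariate value is treated as one population; the helper xlogy is hypothetical.

```r
# Data from the second case in the Examples section.
y  <- c(1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1)
x1 <- c(2, 2, 2, 5, 5, 5, 5, 8, 8, 11, 11, 11)

# Each distinct value of x1 defines one population.
z_j <- tapply(y, x1, sum)     # number of successes in population j
n_j <- tapply(y, x1, length)  # number of observations in population j
p_j <- z_j / n_j              # estimated probability of success

# Hypothetical helper: a * log(b), using the convention 0 * log(0) = 0.
xlogy <- function(a, b) ifelse(a == 0, 0, a * log(b))

# Saturated log-likelihood under the assumed grouped Bernoulli form.
log_lik_saturated <- sum(xlogy(z_j, p_j) + xlogy(n_j - z_j, 1 - p_j))
populations <- length(n_j)    # J, the number of populations
```

The values z_j, n_j, p_j, populations and log_lik_saturated are meant to correspond to the components z_j, n_j, p_j, Populations and Log_Lik_Saturate of an lsm object; that correspondence is an assumption and worth checking against an actual fit.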

References

[1] Llinas, H. J. (2006). Accuracies in the theory of the logistic models. Revista Colombiana de Estadistica, 29(2), 242-244.

[2] Hosmer, D. (2013). Applied Logistic Regression (3rd ed.). Wiley Series in Probability and Statistics. New York: John Wiley & Sons.

[3] Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

Examples

# Hosmer, D. (2013), page 3: Age and Coronary Heart Disease (CHD) status of 20 subjects.

library(lsm)

AGE <- c(20, 23, 24, 25, 25, 26, 26, 28, 28, 29, 30, 30, 30, 30, 30, 30, 30, 32, 33, 33)
CHD <- c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)

data <- data.frame(CHD, AGE)
lsm(CHD ~ AGE, family = binomial, data)

# Shorthand: use every other column of data as an explanatory variable.
# data must be passed by name, because the second positional argument is family.

lsm(CHD ~ ., data = data)

# Another case: numeric outcome and covariate.

y <- c(1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1)
x1 <- c(2, 2, 2, 5, 5, 5, 5, 8, 8, 11, 11, 11)

data <- data.frame(y, x1)
ELAINYS <- lsm(y ~ x1, family = binomial, data)
summary(ELAINYS)

# Another case: outcome and covariate stored as factors.

y <- as.factor(c(1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1))
x1 <- as.factor(c(2, 2, 2, 5, 5, 5, 5, 8, 8, 11, 11, 11))

data <- data.frame(y, x1)
ELAINYS1 <- lsm(y ~ x1, family = binomial, data)
confint(ELAINYS1)
