scf (version 1.0.5)

scf_glm: Estimate Generalized Linear Model from SCF Microdata

Description

Estimates generalized linear models (GLMs) with SCF public-use microdata. Use this function when modeling outcomes that follow non-Gaussian distributions (e.g., binary or count data). Rubin's Rules are used to combine implicate-level coefficient and variance estimates.

The function fits a GLM to each SCF implicate using svyglm() and returns pooled coefficients, standard errors, z-values, p-values, and fit diagnostics, including AIC and, when applicable, pseudo-R-squared.

Usage

scf_glm(object, formula, family = binomial())

Value

An object of classes "scf_glm" and "scf_model_result" containing:

results

A data frame of pooled coefficients, standard errors, z-values, p-values, and significance stars.

fit

A list of fit diagnostics including mean and SD of AIC; for binomial models, pseudo-R2 and its SD.

models

A list of implicate-level svyglm model objects.

call

The matched function call.

Arguments

object

A scf_mi_survey object, typically created using scf_load() and scf_design().

formula

A valid model formula, e.g., rich ~ age + factor(edcl).

family

A GLM family object such as binomial(), poisson(), or gaussian(). Defaults to binomial().

Implementation

This function fits a GLM to each implicate in a scf_mi_survey object using survey::svyglm(). The user specifies a model formula and a valid GLM family (e.g., binomial(), poisson(), gaussian()). Coefficients and variance-covariance matrices are extracted from each implicate and pooled using Rubin's Rules.
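
The pooling step can be sketched in base R. This is a minimal illustration of the standard Rubin's Rules formulas for a single coefficient, not the package's internal code: the helper name pool_rubin() and the input values are hypothetical, and scf_MIcombine() may differ in details such as degrees-of-freedom handling.

```r
# Hypothetical implicate-level results for one coefficient (values are made up)
est <- c(0.52, 0.48, 0.50, 0.55, 0.45)  # point estimates from m = 5 implicates
se  <- c(0.10, 0.11, 0.10, 0.12, 0.09)  # matching standard errors

# pool_rubin() is an illustrative helper, not a function from the scf package
pool_rubin <- function(est, se) {
  m     <- length(est)
  q_bar <- mean(est)                  # pooled point estimate
  u_bar <- mean(se^2)                 # average within-implicate variance
  b     <- var(est)                   # between-implicate variance
  total <- u_bar + (1 + 1 / m) * b    # total variance under Rubin's Rules
  se_pooled <- sqrt(total)
  z <- q_bar / se_pooled
  c(estimate = q_bar, std_error = se_pooled, z = z, p = 2 * pnorm(-abs(z)))
}

pooled <- pool_rubin(est, se)
print(round(pooled, 4))
```

The pooled standard error is never smaller than the square root of the average within-implicate variance, because the between-implicate term adds the uncertainty due to imputation.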

Internal Suppression

For CRAN compliance and to prevent diagnostic overload during package checks, this function internally wraps each implicate-level model call in suppressWarnings(). This suppresses the known benign warning:

"non-integer #successes in a binomial glm!"

which arises from using replicate weights with family = binomial(). This suppression does not affect model validity or inference. Users who want to inspect the warnings can refit survey::svyglm() directly on an individual implicate; the fitted implicate-level models are available as model$models[[i]].

For further background, see: https://stackoverflow.com/questions/12953045/warning-non-integer-successes-in-a-binomial-glm-survey-packages
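
The warning can be reproduced outside the package with base R. The sketch below assumes simulated data and plain stats::glm() with non-integer weights rather than survey::svyglm(); it only illustrates that suppressing the warning leaves the coefficients unchanged, and does not show how scf_glm() is written internally.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 * x))
w <- runif(n, 0.5, 3)  # hypothetical non-integer weights

# Fit once while recording any warnings instead of printing them
warn_msgs <- character(0)
fit_loud <- withCallingHandlers(
  glm(y ~ x, family = binomial(), weights = w),
  warning = function(cond) {
    warn_msgs <<- c(warn_msgs, conditionMessage(cond))
    invokeRestart("muffleWarning")
  }
)

# Fit again with the warning suppressed
fit_quiet <- suppressWarnings(glm(y ~ x, family = binomial(), weights = w))

print(unique(warn_msgs))
print(all.equal(coef(fit_loud), coef(fit_quiet)))
```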

Details

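Generalized linear models (GLMs) extend linear regression to accommodate non-Gaussian outcome distributions. The choice of family determines the error distribution and the default link function. For example, binomial() is suited to binary outcomes, poisson() to count data, and gaussian() to continuous outcomes.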
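
The default link attached to each family can be checked in base R. This short sketch uses only the stats family constructors; the list name families is arbitrary.

```r
# Family objects carry their default link function by name
families <- list(
  binomial = binomial(),  # default link: "logit"
  poisson  = poisson(),   # default link: "log"
  gaussian = gaussian()   # default link: "identity"
)

links <- vapply(families, function(f) f$link, character(1))
print(links)
```

A non-default link can be requested through the constructor, for example binomial(link = "probit").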

Model estimation is performed independently on each implicate using svyglm() with replicate weights. Rubin's Rules are used to pool coefficient estimates and variance matrices. For the pooling procedure, see scf_MIcombine().

See Also

scf_ols(), scf_logit(), scf_regtable()

Examples

# \donttest{
# Setup for this example only (mock data in a temporary directory).
# In a real analysis, use `scf_download()` and `scf_load()` instead.
td <- tempfile("glm_")
dir.create(td)

src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# Example for real analysis: Run logistic regression
model <- scf_glm(scf2022, own ~ age + factor(edcl), family = binomial())
summary(model)

# Cleanup for the package check only; not needed in a real analysis
unlink(td, recursive = TRUE, force = TRUE)
# }