summary.mdyplFit: Summary method for `"mdyplFit"` objects

Description

Summary method for "mdyplFit" objects

Usage

# S3 method for mdyplFit
summary(object, hd_correction = FALSE, solve_se_control = list(), ...)
# S3 method for summary.mdyplFit
print(
  x,
  digits = max(3L, getOption("digits") - 3L),
  symbolic.cor = x$symbolic.cor,
  signif.stars = getOption("show.signif.stars"),
  ...
)

Value

A list with objects as in the result of stats::summary.glm(), with extra component se_parameters, which is the vector of the solution to the state evolution equations with extra attributes (see solve_se()).

Arguments

object: an object of class "glm", usually, a result of a call to glm.
hd_correction: if FALSE (default), then the corresponding quantities are computed according to standard asymptotics. If TRUE then the high-dimensionality corrections in Sterzinger & Kosmidis (2024) are employed to updates estimates, estimated standard errors, z-statistics, etc. See Details.
solve_se_control: a list of further arguments to be passed to solve_se(). Even if explicitly specified, the arguments kappa, ss, alpha, and intercept are always set according to object, and corrupted is set to TRUE.
...: further arguments to be passed to summary.glm().
x: an object of class "summary.glm", usually, a result of a call to summary.glm.
digits: the number of significant digits to use when printing.
symbolic.cor: logical. If TRUE, print the correlations in a symbolic form (see symnum) rather than as numbers.
signif.stars: logical. If TRUE, ‘significance stars’ are printed for each coefficient.

Author

Ioannis Kosmidis [aut, cre] ioannis.kosmidis@warwick.ac.uk

Details

If hd_correction = TRUE, the sloe() estimator of the square root of the corrupted signal strength is estimated from object, as are the conditional variances of each covariate given the others (excluding the intercept). The latter are estimated using residual sums of squares from the linear regression of each covariate on all the others, as proposed in Zhao et al (2021, Section 5.1). Then the appropriate state evolution equations are solved using solve_se() with corrupted = TRUE, and the obtained constants are used to rescale the estimates, and adjust estimated standard errors and z-statistics as in Sterzinger & Kosmidis (2024).

The key assumptions under which the rescaled estimates and corrected standard errors and z-statistics are asymptotically valid are that the covariates have sub-Gaussian distributions, and that the signal strength, which is the limit \(\gamma^2\) of \(var(X \beta)\) is finite as \(p / n \to \kappa \in (0, 1)\), with \(\kappa \in (0, 1)\). See Sterzinger & Kosmidis (2024).

If hd_correction = TRUE, and the model has an intercept, then the result provides only a corrected estimate of the intercept with no accompanying standard error, z-statistic, and p-value. Also, vcov(summary(object, hd_correction = TRUE)) is always NULL. Populating those objects with appropriate estimates is the subject of current work.

References

Zhao Q, Sur P, Cand\`es E J (2022). The asymptotic distribution of the MLE in high-dimensional logistic models: Arbitrary covariance. Bernoulli, 28, 1835–1861. tools:::Rd_expr_doi("10.3150/21-BEJ1401").

Sterzinger P, Kosmidis I (2024). Diaconis-Ylvisaker prior penalized likelihood for \(p/n \to \kappa \in (0,1)\) logistic regression. arXiv:2311.07419v2, https://arxiv.org/abs/2311.07419.

Examples

Run this code


# \donttest{

set.seed(123)
n <- 2000
p <- 400
set.seed(123)
betas <- c(rnorm(p / 2, mean = 7, sd = 1), rep(0, p / 2))
X <- matrix(rnorm(n * p, 0, 1/sqrt(n)), nrow = n, ncol = p)
probs <- plogis(drop(X %*% betas))
y <- rbinom(n, 1, probs)
fit_mdypl <- glm(y ~ -1 + X, family = binomial(), method = "mdyplFit")

st_summary <- summary(fit_mdypl)
hd_summary <- summary(fit_mdypl, hd_correction = TRUE)

cols <- hcl.colors(3, alpha = 0.2)
par(mfrow = c(1, 2))
plot(betas, type = "l", ylim = c(-3, 14),
     main = "MDYPL estimates",
     xlab = "Parameter index", ylab = NA)
points(coef(st_summary)[, "Estimate"], col = NA, bg = cols[1], pch = 21)

plot(betas, type = "l", ylim = c(-3, 14),
     main = "rescaled MDYPL estimates",
     xlab = "Parameter index", ylab = NA)
points(coef(hd_summary)[, "Estimate"], col = NA, bg = cols[2], pch = 21)

## z-statistics
z_mdypl <- coef(st_summary)[betas == 0, "z value"]
qqnorm(z_mdypl, col = NA, bg = cols[1], pch = 21, main = "z value")
abline(0, 1, lty = 2)
z_c_mdypl <- coef(hd_summary)[betas == 0, "z value"]
qqnorm(z_c_mdypl, col = NA, bg = cols[2], pch = 21, main = "corrected z value")
abline(0, 1, lty = 2)

# }

Run the code above in your browser using DataLab