hglm: Fitting Holistic Generalized Linear Models

Description

Fit a generalized linear model under holistic constraints.

Usage

hglm(
  formula,
  family = gaussian(),
  data,
  constraints = NULL,
  weights = NULL,
  scaler = c("auto", "center_standardization", "center_minmax", "standardization",
    "minmax", "off"),
  scale_response = NULL,
  big_m = 100,
  solver = "auto",
  control = list(),
  dry_run = FALSE,
  approx = FALSE,
  object_size = c("normal", "big"),
  ...
)
holiglm(
  formula,
  family = gaussian(),
  data,
  constraints = NULL,
  weights = NULL,
  scaler = c("auto", "center_standardization", "center_minmax", "standardization",
    "minmax", "off"),
  scale_response = NULL,
  big_m = 100,
  solver = "auto",
  control = list(),
  dry_run = FALSE,
  approx = FALSE,
  object_size = c("normal", "big"),
  ...
)
hglm_seq(
  k_seq,
  formula,
  family = gaussian(),
  data,
  constraints = NULL,
  weights = NULL,
  scaler = c("auto", "center_standardization", "center_minmax", "standardization",
    "minmax", "off"),
  big_m = 100,
  solver = "auto",
  control = list(),
  object_size = c("normal", "big"),
  parallel = FALSE
)

Value

An object of class "hglm" inheriting from "glm".
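
Because the result inherits from "glm", the usual extractor functions should apply to it. The following is a minimal sketch, assuming the holiglm package and a compatible solver are installed; it reuses the rhglm() simulator from the Examples section.

```r
library(holiglm)

# Simulated data, as in the Examples section
dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))
fit <- hglm(y ~ ., data = dat)

# The fitted object is expected to inherit from "glm" (see Value)
inherits(fit, "glm")

# Standard extractor for the estimated coefficients
coef(fit)
```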

Arguments

formula: an object of class "formula" giving the symbolic description of the model to be fitted.
family: a description of the error distribution and link function to be used in the model.
data: a data.frame or matrix giving the data for the estimation.
constraints: a list of 'HGLM' constraints stored in a list of class "lohglmc". Use NULL to turn off constraints.
weights: an optional vector of 'prior weights' to be used for the estimation.
scaler: a character string giving the name of the scaling function (default is "auto") to be employed for the covariates. This typically does not need to be changed.
scale_response: a logical indicating whether the response shall be standardized. Can only be used with family gaussian(). Default is TRUE for family gaussian() and FALSE for other families.
big_m: an upper bound for the coefficients, needed for the big-M constraint. Required to inherit from "hglmc". Currently constraints created by group_sparsity(), group_inout(), include() and group_equal() use the big-M value specified here.
solver: a character string giving the name of the solver to be used for the estimation.
control: a list of control parameters passed to ROI_solve.
dry_run: a logical; if TRUE the model is not fit but only constructed.
approx: a logical; if TRUE uses linear approximation of log-likelihood.
object_size: a character string giving the object size; allowed values are "normal" and "big". If "big" is chosen, the ROI solution and the "hglm_model" object are also returned.
...: For ‘approx’: further arguments passed to or from other methods.
k_seq: an integer vector giving the values of k_max for which the model should be estimated.
parallel: a logical; if TRUE the estimation of the sequence is parallelized.
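
The role of big_m can be illustrated without the package. The sketch below is in base R and assumes the standard big-M linking between a coefficient beta_j and a binary indicator z_j, namely |beta_j| <= big_m * z_j. The helper within_big_m() is hypothetical and not part of holiglm, and the formulation used internally may differ, for example because it may act on scaled coefficients.

```r
# Hypothetical helper (not part of holiglm): checks the assumed big-M
# linking |beta_j| <= big_m * z_j for every coefficient.
within_big_m <- function(beta, z, big_m = 100) {
  all(abs(beta) <= big_m * z)
}

beta <- c(1, 2, -3, 4, 5, -6)   # coefficients used in the Examples section
z <- as.integer(beta != 0)      # indicator: 1 if the coefficient is selected

# The default bound is wide enough for these coefficients
ok_default <- within_big_m(beta, z, big_m = 100)

# A bound below the largest absolute coefficient cuts off the true values
ok_tight <- within_big_m(beta, z, big_m = 5)

# With z_j = 0 the linking forces the coefficient to zero
ok_excluded <- within_big_m(c(1, 2), c(1, 0))
```

Under this assumption, a big_m that is smaller than the largest coefficient in absolute value would exclude the true solution, which is why the argument is described as an upper bound for the coefficients.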

Details

When linear constraints are binding, the standard errors are corrected. More information about the correction can be found in Schwendinger, Schwendinger and Vana (2024), doi:10.18637/jss.v108.i07.

References

Schwendinger, B., Schwendinger, F., & Vana, L. (2024). Holistic Generalized Linear Models. doi:10.18637/jss.v108.i07

Bertsimas, D., & King, A. (2016). OR Forum-An Algorithmic Approach to Linear Regression. Operations Research, 64 (1): 2–16. doi:10.1287/opre.2015.1436

McCullagh, P., & Nelder, J. A. (2019). Generalized Linear Models (2nd ed.). Routledge. doi:10.1201/9780203753736

Dobson, A. J., & Barnett, A. G. (2018). An Introduction to Generalized Linear Models (4th ed.). Chapman and Hall/CRC. doi:10.1201/9781315182780

Chares, R. (2009). Cones and Interior-Point Algorithms for Structured Convex Optimization involving Powers and Exponentials.

Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95 (3): 759–771. Oxford University Press. doi:10.1093/biomet/asn034

Zhu, J., Wen, C., Zhu, J., Zhang, H., & Wang, X. (2020). A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences, 117 (52): 33117–33123. doi:10.1073/pnas.2014241117

Examples

library(holiglm)
# simulate data from a GLM with the given coefficients
dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))
# estimation without constraints
hglm(y ~ ., constraints = NULL, data = dat)
# estimation with an upper bound on the number of coefficients to be selected
hglm(y ~ ., constraints = k_max(3), data = dat)
# estimation without intercept
hglm(y ~ . - 1, data = dat)
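
The examples above do not cover hglm_seq(). The following sketch is based only on the signature shown under Usage and assumes the holiglm package and a compatible solver are installed. The structure of the returned object is not described on this page, so the result is only printed.

```r
library(holiglm)

# Simulated data, as in the examples above
dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))

# Estimate one model for each upper bound k_max in k_seq
fits <- hglm_seq(k_seq = 1:3, formula = y ~ ., data = dat)
fits
```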
