walsNBfit: Fitter function for Weighted Average Least Squares estimation of NB2 regression model

Description

Workhorse function behind walsNB and used internally in walsNBfitIterate.

Usage

walsNBfit(
  X1,
  X2,
  y,
  betaStart1,
  betaStart2,
  rhoStart,
  family,
  prior,
  method = c("fullSVD", "original"),
  svdTol = .Machine$double.eps,
  svdRtol = 1e-06,
  keepUn = FALSE,
  keepR = FALSE,
  eigenSVD = TRUE,
  postmult = TRUE,
  ...
)

Value

A list containing

coef: Model averaged estimates of all coefficients.
beta1: Model averaged estimates of the coefficients of the focus regressors.
beta2: Model averaged estimates of the coefficients of the auxiliary regressors.
rho: Model averaged estimate of the log-dispersion parameter of the NB2 distribution.
gamma1: Model averaged estimates of the coefficients of the transformed focus regressors.
gamma2: Model averaged estimates of the coefficients of the transformed auxiliary regressors.
condition: Condition number of the matrix $\bar{\Xi} = \bar{\Delta}_{2} \bar{X}_{2}^{\top} \bar{M}_{1} \bar{X}_{2} \bar{\Delta}_{2}$.
vcovBeta: NULL, not implemented yet, placeholder for estimated covariance matrix of the regression coefficients.
vcovGamma: NULL, not implemented yet, placeholder for estimated covariance matrix of the coefficients of the transformed regressors.
betaStart: Starting values of the regression coefficients for the one-step ML estimators.
rhoStart: Starting values of the dispersion parameter for the one-step ML estimators.
method: Stores method used from the arguments.
prior: familyPrior. The prior specified in the arguments.
betaUn1: If keepUn = TRUE, contains the unrestricted one-step ML estimators of the coefficients of the focus regressors. Else NULL.
betaUn2: If keepUn = TRUE, contains the unrestricted one-step ML estimators of the coefficients of the auxiliary regressors. Else NULL.
gammaUn1: If keepUn = TRUE, contains the unrestricted one-step ML estimators of the coefficients of the transformed focus regressors. Else NULL.
gammaUn2: If keepUn = TRUE, contains the unrestricted one-step ML estimators of the coefficients of the transformed auxiliary regressors. Else NULL.
gamma1r: If keepR = TRUE, contains the fully restricted one-step ML estimator for the transformed regressors (only focus regressors). Else NULL.
k1: Number of focus regressors.
k2: Number of auxiliary regressors.
n: Number of observations.
X1names: Names of the focus regressors.
X2names: Names of the auxiliary regressors.
familyStart: The family object of class "familyNBWALS" used for the estimation of the starting values.
family: The family object of class "familyNBWALS" used later for predictions.
fitted.link: Linear link fitted to the data.
fitted.values: Estimated conditional mean for the data. Lives on the scale of the response.

Arguments

X1: Design matrix for focus regressors. Usually includes a constant (column full of 1s) and can be generated using model.matrix.
X2: Design matrix for auxiliary regressors. Usually does not include a constant column and can also be generated using model.matrix.
y: Count response as vector.
betaStart1: Starting values for coefficients of focus regressors X1.
betaStart2: Starting values for coefficients of auxiliary regressors X2.
rhoStart: Starting value for log-dispersion parameter of NB2
family: Object of class "familyNBWALS". Currently only supports negbinWALS.
prior: Object of class "familyPrior". For example weibull or laplace.
method: Specifies method used. Available methods are "fullSVD" (default) or "original". See details.
svdTol: Tolerance for rank of matrix $\bar{Z}_{1}$ and $\bar{Z}$. Only used if method = "fullSVD". Checks if smallest eigenvalue in SVD of $\bar{Z}_1$ and $\bar{Z}$ is larger than svdTol, otherwise reports a rank deficiency.
svdRtol: Relative tolerance for rank of matrix $\bar{Z}_{1}$ and $\bar{Z}$. Only used if method = "fullSVD". Checks if ratio of largest to smallest eigenvalue in SVD of $\bar{Z}_1$ and $\bar{Z}$ is larger than svdRtol, otherwise reports a rank deficiency.
keepUn: If TRUE, keeps the one-step ML estimators of the unrestricted model, i.e. $\tilde{\gamma}_{u}$ and $\tilde{\beta}_{u}$.
keepR: If TRUE, keeps the one-step ML estimators of the fully restricted model, i.e. $\tilde{\gamma}_{r}$ and $\tilde{\beta}_{r}$.
eigenSVD: If TRUE, then semiorthogonalize() uses svd() to compute the eigendecomposition of $\bar{\Xi}$ instead of eigen(). In this case, the tolerances of svdTol and svdRtol are used to determine whether $\bar{\Xi}$ is of full rank (need it for $\bar{\Xi}^{-1/2}$).
postmult: If TRUE (default), then it computes $$\bar{Z}_{2} = \bar{X}_{2} \bar{\Delta}_{2} \bar{T} \bar{\Lambda}^{-1/2} \bar{T}^{\top},$$ where $\bar{T}$ contains the eigenvectors and $\bar{\Lambda}$ the eigenvalues from the eigenvalue decomposition $$\bar{\Xi} = \bar{T} \bar{\Lambda} \bar{T}^{\top},$$ instead of $$\bar{Z}_{2} = \bar{X}_{2} \bar{\Delta}_{2} \bar{T} \bar{\Lambda}^{-1/2}.$$ See huynhwals;textualWALS for more details. The latter is used in the original MATLAB code for WALS in the linear regression model, see eq. (12) of magnus2016wals;textualWALS. The first form is required in eq. (9) of deluca2018glm;textualWALS. Thus, it is not recommended to set postmult = FALSE.
...: Arguments for internal function computePosterior.

Details

The method to be specified in method mainly differ in the way they compute the fully restricted and unrestricted estimators for the transformed regressors $Z$, i.e. $\tilde{\gamma}_{1r}$, and $\tilde{\gamma}_{u}$.

"fullSVD"

Recommended approach. First applies an SVD to $\bar{Z}_{1}$ to compute $\bar{X}_{2}^{\top} \bar{M}_{1} \bar{X}_{2}$: It is used for computing the inverse of

$$\bar{X}_{1}^{\top}\bar{X}_{1} + \bar{g} \bar{\epsilon} X_{1}^{\top}\bar{q} \bar{q}^{\top} X_{1},$$

when using the Sherman-Morrison-Woodbury formula. We further leverage the SVD of $\bar{Z}_1$ and additionally $\bar{Z}$ to compute the unrestricted estimator $\tilde{\gamma}_{u}$ and the fully restricted estimator $\tilde{\gamma}_{r}$. For $\tilde{\gamma}_{u}$, we simply use the SVD of $\bar{Z}$ to solve the full equation system derived from the one-step ML problem for more details. The SVD of $\bar{Z}_1$ is further used in computing the model averaged estimator for the focus regressors $\hat{\gamma}_1$.

Described in more detail in the appendix of huynhwals;textualWALS.

"original"

Computes all inverses directly using solve and does not make use of the Sherman-Morrison-Woodbury formula for certain inverses. Specifically, it directly inverts the matrix $\bar{Z}_{1}^{\top} \bar{Z}_{1}$ using solve in order to compute $\bar{M}_1$. Moreover, it computes the fully unrestricted estimators of the focus regressors $\tilde{\gamma}_{1u}$ and of the auxiliary regressors $\tilde{\gamma}_{2u}$ and the fully restricted estimator $\tilde{\gamma}_{1r}$ by directly implementing the formulas derived in huynhwalsnb;textualWALS. This method should only be used as reference and for easier debugging.

All variables in the code that contain "start" in their name are computed using the starting values of the one-step ML estimators. See section "One-step ML estimator" of huynhwalsnbWALS for details.

References

Examples

Run this code

data("NMES1988", package = "AER")
NMES1988 <- na.omit(NMES1988)
form <- (visits ~ health + chronic + age + insurance + adl + region + gender
         + married + income + school + employed)
X <- model.matrix(form, data = NMES1988)
focus <- c("(Intercept)", "healthpoor", "healthexcellent", "chronic", "age",
        "insuranceyes")
aux <- c("adllimited", "regionnortheast", "regionmidwest", "regionwest",
         "gendermale", "marriedyes", "income", "school", "employedyes")
X1 <- X[, focus]
X2 <- X[, aux]
y <- NMES1988$visits

# starting values from glm.nb() from MASS
startFit <- MASS::glm.nb(y ~ X[,-1])
betaStart <- coef(startFit)
rhoStart <- startFit$theta
k1 <- ncol(X1)
k2 <- ncol(X2)

str(walsNBfit(X1, X2, y, rhoStart, family = negbinWALS(scale = rhoStart, link = "log"),
              betaStart1 = betaStart[1:k1],
              betaStart2 = betaStart[(k1 + 1):(k1 + k2)],
              prior = weibull(), method = "fullSVD"))

Run the code above in your browser using DataLab