MGLMreg: Fit multivariate response GLM regression

Description

MGLMreg fits multivariate response generalized linear models, specified by a symbolic description of the linear predictor and a description of the error distribution.

Usage

MGLMreg(
  formula,
  data,
  dist,
  init = NULL,
  weight = NULL,
  epsilon = 1e-08,
  maxiters = 150,
  display = FALSE,
  LRT = FALSE,
  parallel = FALSE,
  cores = NULL,
  cl = NULL,
  sys = NULL,
  regBeta = FALSE
)
MGLMreg.fit(
  Y,
  init = NULL,
  X,
  dist,
  weight = NULL,
  epsilon = 1e-08,
  maxiters = 150,
  display = FALSE,
  LRT = FALSE,
  parallel = FALSE,
  cores = NULL,
  cl = NULL,
  sys = NULL,
  regBeta = FALSE
)

Arguments

formula

an object of class formula (or one that can be coerced to that class): a symbolic description of the model to be fitted. The response has to be on the left hand side of ~.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data when using function MGLMreg, the variables are taken from environment(formula), typically the environment from which MGLMreg is called.

dist

a description of the error distribution to fit. See dist for details.

init

an optional matrix of initial values for the parameter estimates. Its dimensions should be compatible with data. See dist for the dimensions required by each distribution.

weight

an optional vector of weights assigned to each row of the data. Should be NULL or a numeric vector. Could be a variable from data, or a variable from environment(formula) with the length equal to the number of rows of the data. If weight=NULL, equal weights of ones will be assigned. Default is NULL.

epsilon

an optional numeric controlling the stopping criterion. The algorithm terminates when the relative change in the loglikelihoods of two successive iterates is less than epsilon. The default value is epsilon=1e-8.

maxiters

an optional numeric controlling the maximum number of iterations. The default value is maxiters=150.
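
The interplay of epsilon and maxiters can be sketched as a generic iteration loop. This is only an illustration in base R, not the package's internal code: the helper names rel_change and run_until_converged are hypothetical, the "+ 1" in the denominator is an assumed guard against division by a near-zero loglikelihood, and the update step is a toy stand-in for one iteration of the fit.

```r
# Hypothetical helper: relative change between two successive loglikelihoods.
# The "+ 1" in the denominator is an assumption, not taken from the package.
rel_change <- function(ll_new, ll_old) {
  abs(ll_new - ll_old) / (abs(ll_old) + 1)
}

# Toy iteration whose loglikelihood converges geometrically to -100.
run_until_converged <- function(epsilon = 1e-08, maxiters = 150) {
  ll_old <- -200
  for (iter in seq_len(maxiters)) {
    ll_new <- -100 + (ll_old + 100) / 2   # stand-in for one update of the fit
    if (rel_change(ll_new, ll_old) < epsilon) {
      return(list(iter = iter, logL = ll_new, converged = TRUE))
    }
    ll_old <- ll_new
  }
  # maxiters reached without meeting the tolerance
  list(iter = maxiters, logL = ll_old, converged = FALSE)
}

res <- run_until_converged()                 # stops early on the tolerance
res_short <- run_until_converged(maxiters = 5)  # stops on the iteration cap
```

With the defaults the toy loop stops on the tolerance well before 150 iterations; with maxiters = 5 it stops on the cap instead, which is the situation in which a larger maxiters or a looser epsilon may be worth trying.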

display

an optional logical variable controlling the display of iterations. The default value is display=FALSE.

LRT

an optional logical variable controlling whether to perform a likelihood ratio test on each predictor. The default value is LRT=FALSE, in which case only the Wald test is performed.

parallel

an optional logical variable controlling whether to use parallel computing. On a multi-core Windows machine, a socket-based cluster is created; on a multi-core Linux/Mac machine, a fork-based cluster is created. The default value is parallel=FALSE.

cores

an optional value specifying the number of cores to use. Default value is half of the logical cores.

cl

a cluster object, created by the package parallel or by the package snow. If parallel=TRUE and cl is not supplied, the registered default cluster is used; if parallel=FALSE, any value given to cl is ignored.

sys

the operating system. Will be used when choosing parallel type.
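
The choices described for parallel, cores and sys can be sketched with the parallel package that ships with R. The helpers choose_cluster_type and default_cores are hypothetical names introduced here, and testing against .Platform$OS.type is an assumption; the value MGLMreg actually expects for sys is not stated above.

```r
library(parallel)  # ships with base R

# Hypothetical helper: socket clusters on Windows, forking elsewhere.
# Using .Platform$OS.type ("windows" or "unix") is an assumption of this sketch.
choose_cluster_type <- function(sys = .Platform$OS.type) {
  if (identical(sys, "windows")) "PSOCK" else "FORK"
}

# Hypothetical helper: half of the logical cores, but never fewer than one.
default_cores <- function() {
  n <- detectCores(logical = TRUE)
  if (is.na(n)) 1L else max(1L, as.integer(floor(n / 2)))
}

cluster_type <- choose_cluster_type()
n_cores <- default_cores()

# Creating the cluster is left commented out so the sketch has no side effects:
# cl <- makeCluster(n_cores, type = cluster_type)
# stopCluster(cl)
```

A cluster built this way could then be supplied through cl together with parallel=TRUE.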

regBeta

an optional logical variable. When dist="NegMN", the user can decide whether to run regression on the overdispersion parameter \(\beta\). The default is regBeta=FALSE.

Y, X

for MGLMreg.fit, X is a design matrix of dimension n*(p+1) and Y is the response matrix of dimension n*d.
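
A minimal sketch of inputs with these dimensions, built in base R. It assumes that the extra column in n*(p+1) is an intercept column of ones, which the text above does not state, and it simulates the counts with rmultinom rather than with a function from MGLM. The call to MGLMreg.fit follows the Usage signature and runs only if the package is installed.

```r
set.seed(1)
n <- 50  # observations
p <- 2   # covariates
d <- 3   # response categories

# Covariates, then a design matrix with a leading column of ones (assumed intercept).
covariates <- matrix(rnorm(n * p), n, p)
X <- cbind(1, covariates)                               # n x (p + 1)
colnames(X) <- c("(Intercept)", paste0("x", seq_len(p)))

# Response: one row of multinomial counts per observation.
Y <- t(rmultinom(n, size = 20, prob = rep(1 / d, d)))   # n x d
colnames(Y) <- paste0("y", seq_len(d))

# Fit a multinomial-logit model directly from the matrices, if MGLM is available.
if (requireNamespace("MGLM", quietly = TRUE)) {
  fit <- MGLM::MGLMreg.fit(Y = Y, X = X, dist = "MN")
}
```

Checking dim(X) and dim(Y) before fitting is a cheap way to catch a transposed response or a missing intercept column.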

Value

Returns an object of class "MGLMreg". An object of class "MGLMreg" is a list containing the following components:

coefficients the estimated regression coefficients.
SE the standard errors of the estimates.
Hessian the Hessian at the estimated parameter values.
gradient the gradient at the estimated parameter values.
wald.value the Wald statistics.
wald.p the p-values of the Wald test.
test test statistic and the corresponding p-value. If LRT=FALSE, only the results from the Wald test are returned; if LRT=TRUE, the results from both the Wald test and the likelihood ratio test are returned.
logL the final loglikelihood.
BIC Bayesian information criterion.
AIC Akaike information criterion.
fitted the fitted values from the regression model.
iter the number of iterations used.
call the matched call.
distribution the distribution fitted.
data the data used to fit the model.
Dof degrees of freedom.
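
How quantities such as wald.value, wald.p, AIC and BIC relate to the estimates can be illustrated with textbook formulas. The numbers below are made up for the illustration, and the package may compute these components differently (for example when weights are used); this is a sketch of the standard definitions, not of the package internals.

```r
# Hypothetical inputs: one predictor's coefficient vector and its covariance block.
b <- c(0.5, -0.2, 0.1)
V <- diag(c(0.04, 0.01, 0.01))

# Wald statistic b' V^{-1} b, referred to a chi-square with length(b) degrees of freedom.
wald_value <- as.numeric(t(b) %*% solve(V, b))
wald_p <- pchisq(wald_value, df = length(b), lower.tail = FALSE)

# Textbook information criteria from a loglikelihood, parameter count k and sample size n.
logL <- -1234.5
k <- 6
n <- 200
aic <- -2 * logL + 2 * k
bic <- -2 * logL + log(n) * k
```

On a fitted object the corresponding components would be read with the usual list syntax, for example fit$coefficients or fit$logL.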

Details

The formula should be in the form responses ~ covariates where the responses are the multivariate count matrix or a few columns from a data frame which is specified by data. The covariates are either matrices or from the data frame. The covariates can be numeric or character or factor. See dist for details about distributions.
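
How a formula of this form resolves into a response matrix and a design matrix can be sketched with base R's model.frame and model.matrix. The data frame and its column names (y1, y2, y3, age, group) are invented for the illustration, and whether MGLMreg builds its matrices in exactly this way is an assumption.

```r
set.seed(2)
df <- data.frame(
  y1 = rpois(30, 5), y2 = rpois(30, 3), y3 = rpois(30, 2),   # count responses
  age = rnorm(30),                                           # numeric covariate
  group = factor(sample(c("a", "b", "c"), 30, replace = TRUE),
                 levels = c("a", "b", "c"))                  # factor covariate
)

# A few columns of the data frame on the left, covariates on the right.
f <- cbind(y1, y2, y3) ~ age + group

mf <- model.frame(f, data = df)
Y <- model.response(mf)           # 30 x 3 matrix of counts
X <- model.matrix(f, data = mf)   # intercept, age, and two dummy columns for group
```

The same formula and data frame would be what is passed to MGLMreg as formula and data; the factor covariate expands into indicator columns, which is why factors and character variables are acceptable on the right-hand side.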

Instead of using a formula, the user can supply the design matrix X and the response matrix Y directly through the MGLMreg.fit function.

Examples

# NOT RUN {
##----------------------------------------##
## Generate data
library(MGLM)   # provides rgdirmn() and MGLMreg()
n <- 2000       # number of observations
p <- 5          # number of covariates
d <- 4          # number of response categories
m <- rep(20, n) # total count for each observation
set.seed(1234)
X <- 0.1 * matrix(rnorm(n * p), n, p)
alpha <- matrix(1, p, d - 1)
beta <- matrix(1, p, d - 1)
Alpha <- exp(X %*% alpha)
Beta <- exp(X %*% beta)
gdm.Y <- rgdirmn(n, m, Alpha, Beta)

##----------------------------------------##
## Regression
gdm.reg <- MGLMreg(gdm.Y ~ X, dist = "GDM", LRT = FALSE)


# }