parglm (version 0.1.6)

parglm: Fitting Generalized Linear Models in Parallel

Description

Function like glm which can make the computation in parallel. The function supports most families listed in family. See "vignette("parglm", "parglm")" for run time examples.

Usage

parglm(formula, family = gaussian, data, weights, subset, na.action,
  start = NULL, offset, control = list(...), contrasts = NULL,
  model = TRUE, x = FALSE, y = TRUE, ...)

parglm.fit(x, y, weights = rep(1, NROW(x)), start = NULL, etastart = NULL, mustart = NULL, offset = rep(0, NROW(x)), family = gaussian(), control = list(), intercept = TRUE, ...)

Arguments

formula

an object of class formula.

family

a family object.

data

an optional data frame, list or environment containing the variables in the model.

weights

an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs.

start

starting values for the parameters in the linear predictor.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting.

control

a list of parameters for controlling the fitting process. For parglm.fit this is passed to parglm.control.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

model

a logical value indicating whether model frame should be included as a component of the returned value.

x, y

For parglm: logical values indicating whether the response vector and model matrix used in the fitting process should be returned as components of the returned value.

For parglm.fit: x is a design matrix of dimension n * p, and y is a vector of observations of length n.

...

For parglm: arguments to be used to form the default control argument if it is not supplied directly.

For parglm.fit: unused.

etastart

starting values for the linear predictor. Not supported.

mustart

starting values for the vector of means. Not supported.

intercept

logical. Should an intercept be included in the null model?

Value

glm object as returned by glm but differs mainly by the qr element. The qr element in the object returned by parglm(.fit) only has the \(R\) matrix from the QR decomposition.

Details

The current implementation uses min(as.integer(n / p), nthreads) threads where n is the number observations, p is the number of covariates, and nthreads is the nthreads element of the list returned by parglm.control. Thus, there is likely little (if any) reduction in computation time if p is almost equal to n. The current implementation cannot handle p > n.

Examples

Run this code
# NOT RUN {
# small example from `help('glm')`. Fitting this model in parallel does
# not matter as the data set is small
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
f1 <- glm   (lot1 ~ log(u), data = clotting, family = Gamma)
f2 <- parglm(lot1 ~ log(u), data = clotting, family = Gamma,
             control = parglm.control(nthreads = 2L))
all.equal(coef(f1), coef(f2))

# }

Run the code above in your browser using DataLab