dglm: Distributed Generalized Linear Models

Description

The dglm function is intended to be a distributed alternative to the glm function.

Usage

dglm(responses, predictors, family=gaussian, weights=NULL, 
na_action="exclude", start=NULL, etastart=NULL, mustart=NULL,
offset=NULL, control=list(...), method="dglm.fit.Newton",
completeModel=FALSE, ...)

Arguments

responses

the darray that contains the vector of responses.

predictors

the darray that contains the vector of predictors. dglm() cannot accept a predictor with a constant value. Moreover, a categorical predictor should be decoded (converted to several predictors) before calling dglm().
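The two restrictions above can be handled before the data is distributed. The following is a minimal sketch in base R, not part of the package: the toy data frame and its column names are invented for illustration, and model.matrix() is only one possible way to expand a factor. The resulting matrix could then be passed to as.darray() as in the Examples section.

```r
# Toy data: one numeric predictor, one categorical predictor, one constant column.
df <- data.frame(
  x     = c(1.5, 2.0, 3.5, 4.0),
  g     = factor(c("a", "b", "c", "a")),
  const = c(7, 7, 7, 7)
)

# model.matrix() expands the factor g into indicator columns (gb, gc).
mm <- model.matrix(~ x + g + const, data = df)

# Drop every constant column, including the "(Intercept)" column of ones.
is_constant <- apply(mm, 2, function(col) length(unique(col)) == 1)
mm <- mm[, !is_constant, drop = FALSE]

colnames(mm)
```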

family

it specifies the family function for regression. The supported family-links at the time of this writing are gaussian(identity), binomial(logit), and poisson(log). The mentioned links are the default ones for their families; so, specifying them is optional. The default family is Gaussian.
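The default links named above are the same ones that the base R family objects carry, which can be checked without any distributed setup. This is a small sketch using only the stats package; the variable names are illustrative.

```r
# Build the three supported family objects and read their default link names.
fams <- list(gaussian = gaussian(), binomial = binomial(), poisson = poisson())
links <- vapply(fams, function(f) f$link, character(1))
links
```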

weights

it is an optional darray of 'prior weights' to be used in the fitting process. It has a single column. Its number of rows and its number of blocks should be the same as those of responses. The values must be non-negative (greater than or equal to zero). A weight of zero causes the sample to be ignored.
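The effect of a zero weight can be illustrated with the non-distributed glm function, on the assumption that dglm treats zero weights the same way: a fit with some weights set to zero yields the same coefficients as a fit on the remaining samples only. The column name w is invented for this sketch.

```r
# Give zero weight to all 8-cylinder cars, weight one to the rest.
d <- transform(mtcars, w = ifelse(cyl == 8, 0, 1))

# Fit once with the zero weights, once with those rows removed.
fit_weighted <- glm(mpg ~ wt, data = d, weights = w)
fit_subset   <- glm(mpg ~ wt, data = d, subset = w > 0)

# The two sets of coefficients agree up to numerical tolerance.
all.equal(coef(fit_weighted), coef(fit_subset))
```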

na_action

it indicates what should happen when the data contain missing values. Values of NA, NaN, and Inf in samples are treated as missing values. There are two options for this argument: "exclude" and "fail". When "exclude" is selected (the default), the weight of any sample with missing values is set to zero, and that sample is ignored in the fitting process. In the darray created for residuals, the values corresponding to these samples are NA. When "fail" is selected, the function stops if there is any missing value in the dataset.
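The "exclude" behaviour described above can be sketched on ordinary in-memory data. This is only an illustration of the rule, not the package's implementation: it uses is.finite(), which also flags -Inf, and the toy values are invented.

```r
# Toy predictors and responses containing NA, NaN and Inf.
X <- matrix(c(1, 2, NA, 4,
              5, 6, Inf, 8), ncol = 2)
y <- c(1, NaN, 3, 4)
w <- rep(1, length(y))

# A sample is invalid if its response or any of its predictors is not finite.
bad <- !is.finite(y) | rowSums(!is.finite(X)) > 0

# Invalid samples get weight zero, so they are ignored during fitting.
w[bad] <- 0
w
```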

start

starting values for coefficients. It is optional.

etastart

starting values for parameter 'eta' which is used for computing deviance. It should be of type darray. It is optional.

mustart

starting values for parameter 'mu' which is used for computing deviance. It should be of type darray. It is optional.

offset

an optional darray which can be used to specify an _a priori_ known component to be included in the linear predictor during fitting.
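To show what an a priori known component means, the sketch below uses the non-distributed glm function with a Poisson rate model, where log(exposure) enters the linear predictor with its coefficient fixed at 1. The data values and variable names are invented, and it is an assumption that dglm handles its offset darray in the same way.

```r
# Counts observed over different exposure times.
exposure <- c(10, 20, 30, 40, 50, 60)
x <- c(0.1, 0.4, 0.2, 0.8, 0.5, 0.9)
y <- c(2, 5, 6, 15, 13, 25)

# log(exposure) is added to the linear predictor without an estimated coefficient.
fit <- glm(y ~ x, family = poisson, offset = log(exposure))

# Rebuild the linear predictor by hand: intercept + slope * x + offset.
eta_manual <- coef(fit)[1] + coef(fit)[2] * x + log(exposure)
all.equal(unname(fit$linear.predictors), unname(eta_manual))
```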

control

an optional list of controlling arguments. The optional elements of the list and their default values are: epsilon = 1e-8, maxit = 25, trace = FALSE, rigorous = FALSE.
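The defaults for epsilon, maxit and trace are the same as those of glm.control in the stats package, where iterations stop once |dev - dev_old| / (|dev| + 0.1) < epsilon. Whether dglm applies exactly this criterion is an assumption here; the rigorous element is specific to dglm and is not covered. The sketch below uses the non-distributed glm function to show how the control values relate to the iter and converged outputs.

```r
# Same defaults as listed above (without the dglm-specific 'rigorous').
ctrl <- glm.control(epsilon = 1e-8, maxit = 25, trace = FALSE)

# Logistic regression on the data used in the Examples section.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial, control = ctrl)

# Number of IWLS iterations used and whether the criterion was met.
fit$iter
fit$converged
```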

method

this argument is reserved for future improvement. The only fitting method available at the moment is "dglm.fit.Newton". In the future, if new algorithms are developed, this argument can be used to switch between them.

completeModel

when it is FALSE (default), the calculation of several output values that are not required for prediction is skipped, so the function runs faster.

...

arguments to be used to form the default control argument if it is not supplied directly.

Value

coefficients: calculated coefficients
d.residuals: (available only when completeModel=TRUE; otherwise it is NULL) the working residuals, that is the residuals in the final iteration of the IWLS fit. Since cases with zero weights are omitted, their working residuals are NA. It is of type darray.
d.fitted.values: the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function. It is of type darray.
family: the family function used for regression
d.linear.predictors: the linear fit on link scale. It is of type darray.
deviance: up to a constant, minus twice the maximized log-likelihood.
aic: (available only when completeModel=TRUE; otherwise it is NA) a version of Akaike's An Information Criterion, minus twice the maximized log-likelihood plus twice the number of parameters, computed by the aic component of the family.
null.deviance: (available only when completeModel=TRUE; otherwise it is NA) the deviance for the null model, comparable with deviance.
iter: the number of iterations of IWLS used.
prior.weights: the weights initially supplied. All of its values are 1 if no initial weights are used. It is of type darray. The weight becomes 0 for samples with invalid data (NA, NaN, Inf).
weights: the working weights, that is the weights in the final iteration of the IWLS fit. It is of type darray. In order to save memory and execution time, no new darray will be created for weights when the initial weights are all 0 or 1, and it will simply be a reference to prior.weights.
df.residual: the residual degrees of freedom.
df.null: the residual degrees of freedom for the null model.
converged: logical. Was the IWLS algorithm judged to have converged?
boundary: logical. Is the fitted value on the boundary of the attainable values?
responses: the darray of responses.
predictors: the darray of predictors.
na_action: this item exists only when some samples are excluded because of missing data. It is a list containing type "exclude" and the number of excluded samples.
call: the matched call.
offset: the offset darray used.
control: the value of the control argument used.
method: the name of the fitter function used, currently always "dglm.fit.Newton".
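Several of the values above (working residuals, working weights, iter, converged, deviance) refer to the IWLS fit. As a reading aid, here is a minimal single-machine sketch of IWLS for Poisson regression with the log link. The function iwls_poisson is hypothetical and is not the code behind "dglm.fit.Newton"; it solves the normal equations directly and follows the starting values and stopping rule of the non-distributed glm function.

```r
# Hypothetical helper: IWLS for Poisson regression with log link (base R only).
iwls_poisson <- function(X, y, epsilon = 1e-8, maxit = 25) {
  X <- cbind(Intercept = 1, X)        # add the intercept column
  mu <- y + 0.1                       # starting means, as in poisson()$initialize
  eta <- log(mu)                      # starting linear predictor
  dev_old <- Inf
  converged <- FALSE
  for (iter in seq_len(maxit)) {
    w <- mu                           # working weights for the log link
    z <- eta + (y - mu) / mu          # working response
    # Weighted least squares step: solve (X'WX) beta = X'Wz.
    beta <- solve(crossprod(X, w * X), crossprod(X, w * z))
    eta <- drop(X %*% beta)
    mu <- exp(eta)
    dev <- sum(poisson()$dev.resids(y, mu, rep(1, length(y))))
    # Stop when the relative change in deviance is below epsilon.
    if (abs(dev - dev_old) / (abs(dev) + 0.1) < epsilon) {
      converged <- TRUE
      break
    }
    dev_old <- dev
  }
  list(coefficients = drop(beta), fitted.values = mu, linear.predictors = eta,
       residuals = (y - mu) / mu,     # working residuals at the final mu
       weights = mu,                  # working weights at the final mu
       deviance = dev, iter = iter, converged = converged)
}

# Usage on in-memory data, compared with the non-distributed glm().
X <- data.matrix(mtcars[c("wt", "hp")])
y <- mtcars$carb
fit <- iwls_poisson(X, y)
ref <- glm(carb ~ wt + hp, family = poisson, data = mtcars)
all.equal(unname(fit$coefficients), unname(coef(ref)), tolerance = 1e-6)
```

A distributed fitter would compute the crossprod() terms per partition and sum them, which is why predictors, responses and weights need matching partitioning; that design note is an inference, not a statement from the package.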

Details

predictors and responses must align with each other (have the same number of rows and similar partitioning). Models created in either complete or incomplete mode can be used for prediction. The only motivation behind completeModel=FALSE is performance: the calculation of several values that are not required for prediction is skipped.

Examples

 ## Not run: 
#     ## Example for linear regression
#     library(glm.ddR)
# 
#     require(MASS)
#     # creating the darray of response
#     Y <- as.darray(data.matrix(Boston["medv"]))
#     # creating the darray of predictors
#     X <- as.darray(data.matrix(Boston[c("rad","crim","ptratio","dis")]))
#     # building linear regression model
#     reg <- dglm(Y,X, completeModel=TRUE)
#     summary(reg)
# 
#     ## Example for logistic regression
#     Y <- as.darray(data.matrix(mtcars["am"]))
#     X <- as.darray(data.matrix(mtcars[c("wt","hp")]))
# 
#     # building logistic regression model
#     myModel <- dglm(Y, X, binomial, completeModel=TRUE)
#     summary(myModel)
#     
#     ## Example for poisson regression
#     Y <- as.darray(data.matrix(mtcars["carb"]))
#     X <- as.darray(data.matrix(mtcars[-which(colnames(mtcars)=="carb")]))
#     # building poisson regression model
#     reg <- dglm(Y,X, poisson, completeModel=TRUE)
#     summary(reg)
#  ## End(Not run)
