
glm.ddR (version 0.1.0)

dglm: Distributed Generalized Linear Models

Description

The dglm function is intended to be a distributed alternative to the glm function.

Usage

dglm(responses, predictors, family=gaussian, weights=NULL, na_action="exclude", start=NULL, etastart=NULL, mustart=NULL, offset=NULL, control=list(...), method="dglm.fit.Newton", completeModel=FALSE, ...)

Arguments

responses
the darray that contains the vector of responses.
predictors
the darray that contains the predictors. dglm() cannot accept a predictor with a constant value. Moreover, a categorical predictor should be expanded (converted to several predictors) before calling dglm().
family
it specifies the family function for regression. The supported family-link pairs at the time of this writing are gaussian(identity), binomial(logit), and poisson(log). These links are the defaults for their families, so specifying them is optional. The default family is gaussian.
weights
it is an optional darray of 'prior weights' to be used in the fitting process. It has a single column. Its number of rows and its number of blocks should be the same as those of responses. The values must be non-negative (greater than or equal to zero). A sample with weight zero is ignored.
na_action
it indicates what should happen when the data contain missing values. Values of NA, NaN, and Inf in samples are treated as missing values. There are two options for this argument: "exclude" and "fail". When "exclude" is selected (the default), the weight of any sample with missing values becomes zero, and that sample is ignored in the fitting process. In the darray created for residuals, the values corresponding to these samples are NA. When "fail" is selected, the function stops if the dataset contains any missing value.
start
starting values for coefficients. It is optional.
etastart
starting values for parameter 'eta' which is used for computing deviance. It should be of type darray. It is optional.
mustart
starting values for parameter 'mu' which is used for computing deviance. It should be of type darray. It is optional.
offset
an optional darray which can be used to specify an a priori known component to be included in the linear predictor during fitting.
control
an optional list of controlling arguments. The optional elements of the list and their default values are: epsilon = 1e-8, maxit = 25, trace = FALSE, rigorous = FALSE.
method
this argument is reserved for future improvement. The only fitting method available at the moment is "dglm.fit.Newton". If new algorithms are developed in the future, this argument can be used to switch between them.
completeModel
when it is FALSE (default), the calculation of several output values that are not required for prediction is skipped, so the function runs faster.
...
arguments to be used to form the default control argument if it is not supplied directly.
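
The restrictions on predictors and the "exclude" behaviour of na_action can be illustrated on ordinary in-memory data before anything is converted to a darray. The following is a minimal base R sketch, not glm.ddR code: the helpers prepare_predictors() and exclude_weights() are hypothetical names introduced here, and the second one only mimics the documented rule that samples containing NA, NaN, or Inf receive weight zero.

```r
# Hypothetical helper (not part of glm.ddR): expand categorical predictors
# into dummy columns and drop columns that hold a single constant value.
prepare_predictors <- function(df) {
  X <- model.matrix(~ ., data = df)                 # factors become 0/1 columns
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]  # intercept column is constant
  keep <- apply(X, 2, function(col) length(unique(col)) > 1)
  X[, keep, drop = FALSE]
}

# Hypothetical helper: weight 0 for any sample (row) containing NA, NaN,
# or Inf in the response or in a predictor, weight 1 otherwise.
exclude_weights <- function(y, X) {
  ok <- rowSums(!is.finite(cbind(y, X))) == 0
  as.numeric(ok)
}

df <- data.frame(
  wt   = c(2.6, 3.2, 2.3, 3.4),
  gear = factor(c("three", "four", "five", "four")),
  flag = c(1, 1, 1, 1)                              # constant, must be removed
)
X <- prepare_predictors(df)
print(colnames(X))

# Introduce invalid values to see which samples would be ignored
y <- c(21, NA, 19, 30)
X_bad <- X
X_bad[3, "wt"] <- Inf
w <- exclude_weights(y, X_bad)
print(w)
```

The resulting numeric matrix is the kind of input that as.darray(data.matrix(...)) receives in the examples at the end of this page.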

Value

coefficients
calculated coefficients
d.residuals
(available only when completeModel=TRUE; otherwise it is NULL) the working residuals, that is the residuals in the final iteration of the IWLS fit. Since cases with zero weights are omitted, their working residuals are NA. It is of type darray.
d.fitted.values
the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function. It is of type darray.
family
the family function used for regression
d.linear.predictors
the linear fit on link scale. It is of type darray.
deviance
up to a constant, minus twice the maximized log-likelihood.
aic
(available only when completeModel=TRUE; otherwise it is NA) a version of Akaike's An Information Criterion, minus twice the maximized log-likelihood plus twice the number of parameters, computed by the aic component of the family.
null.deviance
(available only when completeModel=TRUE; otherwise it is NA) the deviance for the null model, comparable with deviance.
iter
the number of iterations of IWLS used.
prior.weights
the weights initially supplied. All of its values are 1 if no initial weights were supplied. It is of type darray. The weight becomes 0 for samples with invalid data (NA, NaN, Inf).
weights
the working weights, that is the weights in the final iteration of the IWLS fit. It is of type darray. In order to save memory and execution time, no new darray will be created for weights when the initial weights are all 0 or 1, and it will simply be a reference to prior.weights.
df.residual
the residual degrees of freedom.
df.null
the residual degrees of freedom for the null model.
converged
logical. Was the IWLS algorithm judged to have converged?
boundary
logical. Is the fitted value on the boundary of the attainable values?
responses
the darray of responses.
predictors
the darray of predictors.
na_action
this item exists only when some samples are excluded because of missing data. It is a list containing the type "exclude" and the number of excluded samples.
call
the matched call.
offset
the offset darray used.
control
the value of the control argument used.
method
the name of the fitter function used, currently always "dglm.fit.Newton".
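
Several of the components above (deviance, aic, null.deviance, df.residual, df.null) are described in the same terms as the components of a stats::glm fit. As a hedged illustration, the sketch below checks those definitions on a plain glm() fit of the logistic regression used in the examples. It assumes, but does not verify, that dglm follows the same definitions. No darray is involved.

```r
# Reference fit with stats::glm() on the variables of the logistic
# regression example (am ~ wt + hp).
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

n <- nrow(mtcars)
k <- length(coef(fit))              # number of estimated coefficients
loglik <- as.numeric(logLik(fit))   # maximized log-likelihood

# aic: minus twice the maximized log-likelihood plus twice the number
# of parameters.
aic_by_hand <- -2 * loglik + 2 * k

# deviance: for 0/1 responses the saturated model has log-likelihood 0,
# so the "constant" in the definition vanishes in this particular case.
deviance_by_hand <- -2 * loglik

# null.deviance: deviance of the intercept-only model.
null_fit <- glm(am ~ 1, data = mtcars, family = binomial)

checks <- c(
  aic           = isTRUE(all.equal(fit$aic, aic_by_hand)),
  deviance      = isTRUE(all.equal(fit$deviance, deviance_by_hand)),
  null.deviance = isTRUE(all.equal(fit$null.deviance, deviance(null_fit))),
  df.residual   = fit$df.residual == n - k,
  df.null       = fit$df.null == n - 1
)
print(checks)
```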

Details

predictors and responses must align with each other (have the same number of rows and the same partitioning). Models created in either complete or incomplete mode can be used for prediction. The only motivation behind completeModel=FALSE is performance: the calculation of several values that are not required for prediction is skipped.
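
One way to see why the alignment requirement matters is to look at how a Newton/IWLS step can be assembled from row blocks. The sketch below is an illustration under stated assumptions, not the implementation of dglm.fit.Newton: blockwise_logit() is a hypothetical function, the blocks are ordinary R matrices rather than darray partitions, and only the binomial(logit) case is covered. Each block contributes its own pieces of X'WX and X'Wz, which only works if block b of the predictors holds the same samples as block b of the responses. The defaults epsilon = 1e-8 and maxit = 25 follow the control argument described above.

```r
# Hypothetical sketch: IWLS for logistic regression where every row block
# contributes partial sums that are added up before solving for beta.
blockwise_logit <- function(x_blocks, y_blocks, epsilon = 1e-8, maxit = 25) {
  p <- ncol(x_blocks[[1]]) + 1          # +1 for the intercept
  beta <- rep(0, p)
  dev_old <- Inf
  converged <- FALSE
  for (iter in seq_len(maxit)) {
    xtwx <- matrix(0, p, p)
    xtwz <- rep(0, p)
    dev <- 0
    for (b in seq_along(x_blocks)) {
      X <- cbind(1, x_blocks[[b]])      # block b of predictors ...
      y <- y_blocks[[b]]                # ... paired with block b of responses
      eta <- drop(X %*% beta)           # linear predictor
      mu <- plogis(eta)                 # inverse logit link
      w <- mu * (1 - mu)                # working weights
      z <- eta + (y - mu) / w           # working response
      xtwx <- xtwx + crossprod(X, w * X)
      xtwz <- xtwz + drop(crossprod(X, w * z))
      dev <- dev - 2 * sum(y * log(mu) + (1 - y) * log(1 - mu))
    }
    # Relative change in deviance, the same form of test used by glm.fit
    if (abs(dev - dev_old) / (abs(dev) + 0.1) < epsilon) {
      converged <- TRUE
      break
    }
    dev_old <- dev
    beta <- drop(solve(xtwx, xtwz))     # weighted least-squares update
  }
  list(coefficients = beta, iter = iter, converged = converged)
}

# Simulated data split into four aligned row blocks
set.seed(1)
n <- 200
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y <- rbinom(n, 1, plogis(-0.5 + X[, "x1"] - X[, "x2"]))
idx <- split(seq_len(n), rep(1:4, each = n / 4))

fit <- blockwise_logit(
  lapply(idx, function(i) X[i, , drop = FALSE]),
  lapply(idx, function(i) y[i])
)
ref <- glm(y ~ X, family = binomial)    # single-machine reference fit
print(cbind(blockwise = fit$coefficients, glm = unname(coef(ref))))
```

If the two sets of blocks were partitioned differently, the per-block products would pair predictors with the wrong responses, which is why the same number of rows and the same partitioning are required.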

Examples

## Not run:
## Example for linear regression
library(glm.ddR)
library(MASS)

# creating the darray of responses
Y <- as.darray(data.matrix(Boston["medv"]))
# creating the darray of predictors
X <- as.darray(data.matrix(Boston[c("rad","crim","ptratio","dis")]))
# building linear regression model
reg <- dglm(Y, X, completeModel=TRUE)
summary(reg)

## Example for logistic regression
Y <- as.darray(data.matrix(mtcars["am"]))
X <- as.darray(data.matrix(mtcars[c("wt","hp")]))
# building logistic regression model
myModel <- dglm(Y, X, binomial, completeModel=TRUE)
summary(myModel)

## Example for poisson regression
Y <- as.darray(data.matrix(mtcars["carb"]))
X <- as.darray(data.matrix(mtcars[-which(colnames(mtcars)=="carb")]))
# building poisson regression model
reg <- dglm(Y, X, poisson, completeModel=TRUE)
summary(reg)
## End(Not run)
