glm.fit: Fitting generalized linear models

Description

glm.fit is used to fit generalized linear models specified by a model matrix and response vector. glm_scidb is a simplified interface for scidbdf objects, similar to (but much simpler than) glm.

Usage

## S3 method for class 'scidb':
glm.fit(x,y,weights=NULL,family=gaussian())
glm_scidb(formula, data, family=gaussian(), weights=NULL)
model_scidb(formula, data, factors=NULL)

Arguments

x

a model matrix of dimension 'n * p'.

y

a response vector of length 'n'.

formula

an object of class formula (or one that can be coerced to that class): a symbolic description of the model to be fitted. See details for limitations.

data

an object of class scidbdf.

weights

an optional vector of 'prior weights' to be used in the fitting process. Should be 'NULL' or a numeric or scidb vector.

family

a description of the error distribution and link function to be used in the model, supplied as the result of a call to a family function.

factors

a list of factor encodings to use in the model matrix. See details.

Value

The glm.fit and glm_scidb functions return a list of model output values described below. The glm_scidb function additionally assigns an S3 class to its result so that print and summary methods produce formatted output.

coefficients

model coefficient vector (SciDB array).

stderr

vector of model coefficient standard errors (SciDB array).

tval

vector of model coefficient t ratio values using the estimated dispersion value (SciDB array).

pval

vector of two-tailed p-values corresponding to the t ratio based on a Student t distribution. (It is possible that the dispersion is not known and there are no residual degrees of freedom from which to estimate it. In that case the estimate is 'NaN'.)

aic

a version of Akaike's An Information Criterion value.

null.deviance

the deviance for the null model, comparable with deviance.

res.deviance

up to a constant, minus twice the maximized log-likelihood.

dispersion

For binomial and Poisson families the dispersion is fixed at one and the number of parameters is the number of coefficients. For gaussian, Gamma and inverse gaussian families the dispersion is estimated from the residual deviance, and the number of parameters is the number of coefficients plus one. For a gaussian family the MLE of the dispersion is used so this is a valid value of AIC, but for Gamma and inverse gaussian families it is not. Other families set this value to NA.

df.null

the residual degrees of freedom for the null model.

df.residual

the residual degrees of freedom.

converged

FALSE if the model did not converge.

totalObs

total number of observations in the model.

nOK

total number of observations corresponding to nonzero weights.

loglik

converged model log-likelihood value.

rss

residual sum of squares.

iter

number of model iterations.

weights

vector of weights used in the model (SciDB array).

family

model family function.

y

response vector (SciDB array).

x

model matrix (SciDB array).

factors

a list of factor variable levels (SciDB arrays) or NULL if no factors are present in the data.

The model_scidb function returns a list with the following entries:

formula
model
response
names
intercept
factors
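The tval and pval entries described above can be illustrated with a short base R sketch that does not use SciDB at all. It fits a gaussian model with stats::glm.fit on simulated data and recomputes the t ratios and two-tailed p-values from the coefficients, the estimated dispersion, and the residual degrees of freedom. The data and the names std_err, tval, and pval are illustrative assumptions, and whether the package computes its values in exactly this way is not confirmed by this page.

```r
# Base R sketch (no SciDB): t ratios and two-tailed p-values for a gaussian GLM.
set.seed(1)
n <- 200
x <- cbind(1, matrix(rnorm(n * 2), nrow = n))   # model matrix with intercept column
y <- drop(x %*% c(0.5, 1, -2)) + rnorm(n)       # simulated response

fit <- stats::glm.fit(x, y, family = gaussian())

# Dispersion estimated from the residual deviance (gaussian family)
dispersion <- fit$deviance / fit$df.residual

# Unscaled covariance (X' W X)^{-1}, using the final working weights
xw <- x * sqrt(fit$weights)
cov_unscaled <- solve(crossprod(xw))

std_err <- sqrt(dispersion * diag(cov_unscaled))
tval <- fit$coefficients / std_err

# Two-tailed p-value from a Student t distribution
pval <- 2 * pt(-abs(tval), df = fit$df.residual)
```

The same numbers appear in the "t value" and "Pr(>|t|)" columns of summary(glm(...)) for the equivalent base R model, which makes this a convenient cross-check.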

Details

The glm_scidb function works similarly to a limited version of the usual glm function, but with a scidbdf data.frame-like SciDB array instead of a standard data.frame.

Formulas in the glm_scidb function may only refer to variables explicitly defined in the data scidbdf object, and the response must be a single column of the data (the two-column response form is not accepted). Interaction and transformed terms are therefore not expanded from the formula: bind them to your data as ordinary columns before invoking the function.
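As a rough illustration of binding derived terms before fitting, the following base R sketch materializes an interaction and a log transform as plain columns, so that the formula names only existing variables. The data frame d and the column names x1_x2 and log_x2 are hypothetical. The commented lines at the end show how the same data might then be passed to glm_scidb, assuming an open SciDB connection.

```r
# Base R sketch: precompute interaction and transformed terms as columns.
set.seed(42)
d <- data.frame(x1 = rnorm(100), x2 = runif(100, 1, 5))
d$y <- 1 + 2 * d$x1 - d$x2 + 0.5 * d$x1 * d$x2 + rnorm(100)

# Materialize the derived terms as ordinary columns (hypothetical names)
d$log_x2 <- log(d$x2)       # transformed term
d$x1_x2  <- d$x1 * d$x2     # interaction term

# The formula now lists only plain columns of the data
f <- y ~ x1 + x2 + log_x2 + x1_x2
fit_cols <- glm(f, data = d)

# Reference fit using the usual glm formula shorthand
fit_ref <- glm(y ~ x1 + x2 + log(x2) + x1:x2, data = d)

# With SciDB (assumes an active connection), the prepared data could be used as:
# d_sci <- as.scidb(d)
# glm_scidb(f, data = d_sci)
```

Both fits estimate the same model, so their coefficients agree; only the way the terms are supplied differs.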

Categorical (factor) variables in the data must be represented as strings. They will be encoded as treatment-style contrast variables with the first listed value set to the baseline value. No other automated contrast encodings are available yet (you are free to build your own model matrix and use glm.fit for that). All other variables will be coerced to double-precision values.
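The treatment-style encoding described above can be sketched in base R without SciDB. The example below turns a string variable into indicator columns with the first level as the baseline. This page does not say how SciDB decides which value counts as "first listed"; the sketch assumes order of first appearance, and the variable name color is hypothetical.

```r
# Base R sketch: treatment contrasts with the first level as baseline.
color <- c("red", "green", "blue", "green", "red", "blue")

# Assumption: levels follow order of first appearance, so "red" is the baseline
df <- data.frame(color = factor(color, levels = unique(color)))

# One indicator column per non-baseline level, plus the intercept
X <- model.matrix(~ color, data = df,
                  contrasts.arg = list(color = "contr.treatment"))

print(colnames(X))
print(X[df$color == "red", , drop = FALSE])  # baseline rows: indicators are all 0
```

A matrix built this way (or with any other contrast scheme) is the kind of hand-made model matrix that can be handed to glm.fit directly.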

Use the model_scidb function to build a model matrix from a formula and a scidbdf data frame-like SciDB array. The matrix is returned within an output list as a sparse SciDB matrix of class scidb, with character string variables encoded as treatment contrasts as described above. If you already have a list of factor-level codes for categorical variables (for example, from the output of glm_scidb), you can supply it in the factors argument. See help for predict for an example.
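The reason for reusing saved factor levels can be shown with a small base R sketch: new data encoded with the training levels produces the same model matrix columns even when some levels are absent. Here a plain character vector named saved_levels stands in for the factor-level codes (in the package these are SciDB arrays), and all names are hypothetical.

```r
# Base R sketch: encode new data with factor levels saved from training.
train_group <- c("b", "a", "c", "a")
saved_levels <- unique(train_group)   # stand-in for saved factor-level codes

new_group <- c("c", "c", "b")         # level "a" does not occur here

# Encoding with the saved levels keeps a column for every training level
new_df <- data.frame(g = factor(new_group, levels = saved_levels))
X_new <- model.matrix(~ g, data = new_df)

# Encoding from scratch drops the unseen level and changes the columns
X_fresh <- model.matrix(~ g, data = data.frame(g = factor(new_group)))

print(colnames(X_new))
print(colnames(X_fresh))
```

With matching columns, coefficients estimated on the training data line up with the new model matrix, which is what prediction requires.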

Examples


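library(scidb)
scidbconnect()  # assumes a SciDB server is reachable with the default connection settings

# Using glm.fit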
x <- as.scidb(matrix(rnorm(5000*20),nrow=5000))
y <- as.scidb(rnorm(5000))
M <- glm.fit(x, y)
coef(M)[]

# Using glm_scidb (similar to glm)
# From the 'glm' help:
## Dobson (1990) Page 93: Randomized Controlled Trial :
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
d.AD <- data.frame(treatment, outcome, counts)
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson(),data=d.AD)
summary(glm.D93)

# Compare with:
d.AD_sci <- as.scidb(d.AD)
glm.D93_sci <- glm_scidb(counts ~ outcome + treatment, family = poisson(), data = d.AD_sci)
summary(glm.D93_sci)
