
biganalytics (version 1.1.12)

biglm.big.matrix, bigglm.big.matrix: Use Thomas Lumley's "biglm" package with a "big.matrix"

Description

These functions wrap Thomas Lumley's biglm package so that its biglm() and bigglm() model fitting can be used with massive data stored in big.matrix objects.
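The bounded-memory idea behind biglm can be illustrated in base R: the rows are read in chunks, and between chunks only a summary whose size depends on the number of coefficients (not the number of rows) is kept. The sketch below is an illustration only, not the package's code. biglm updates an incremental QR decomposition (Algorithm AS 274); this sketch swaps in the simpler accumulation of the cross-products X'X and X'y. The function name chunked_lm_coef is hypothetical.

```r
# Hypothetical sketch: least squares fitted one chunk at a time, keeping
# only a p x p matrix and a length-p vector of running totals.
# NOTE: this accumulates normal equations; biglm itself uses the more
# stable incremental QR updates of Algorithm AS 274.
chunked_lm_coef <- function(X, y, chunksize) {
  p <- ncol(X)
  XtX <- matrix(0, p, p)
  Xty <- numeric(p)
  for (s in seq(1, nrow(X), by = chunksize)) {
    rows <- s:min(s + chunksize - 1, nrow(X))
    Xc <- X[rows, , drop = FALSE]
    XtX <- XtX + crossprod(Xc)                   # add this chunk's X'X
    Xty <- Xty + drop(crossprod(Xc, y[rows]))    # add this chunk's X'y
  }
  solve(XtX, Xty)
}

# Compare with lm() on the iris model used in the Examples section.
X <- model.matrix(Sepal.Length ~ Sepal.Width + Species, data = iris)
b_chunked <- chunked_lm_coef(X, iris$Sepal.Length, chunksize = 40)
b_lm <- coef(lm(Sepal.Length ~ Sepal.Width + Species, data = iris))
all.equal(unname(b_chunked), unname(b_lm))  # TRUE
```

Memory use for the running totals does not grow with the number of rows. The trade-off is that forming X'X squares the condition number of the problem, which is one reason a QR-based update such as AS 274 is preferable in practice.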

Usage

biglm.big.matrix( formula, data, chunksize=NULL, ..., fc=NULL,
  getNextChunkFunc=NULL)
bigglm.big.matrix( formula, data, chunksize=NULL, ..., fc=NULL,
  getNextChunkFunc=NULL)

Arguments

formula
a model formula.
data
a big.matrix whose column names match the variables in formula.
chunksize
an integer giving the maximum number of rows processed per chunk; see Details for the default.
fc
column indices or names of the variables that should be treated as factors.
...
additional arguments passed on to the underlying biglm() or bigglm() call.
getNextChunkFunc
an optional function that retrieves the next chunk of data; if NULL, a built-in reader is used.
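As a rough picture of what a chunk-retrieval function and fc have to accomplish, the sketch below builds a reader over an ordinary numeric matrix standing in for a big.matrix. It assumes the reset convention that biglm documents for data functions passed to bigglm() (reset = TRUE rewinds; otherwise the next data frame is returned, or NULL once the data are exhausted). Whether getNextChunkFunc uses exactly this signature is an assumption, not something stated on this page. The helper make_chunk_reader is hypothetical and assumes fc holds column names. Factor levels are taken from the whole column, so every chunk's model matrix has the same dummy columns even when a chunk happens to contain only some of the levels.

```r
# Hypothetical chunk reader; `m` is an ordinary matrix standing in for a
# big.matrix, and `fc` is assumed to contain column names.
make_chunk_reader <- function(m, chunksize, fc = NULL) {
  # Levels come from the full column so all chunks share one level set.
  lev <- lapply(fc, function(j) sort(unique(m[, j])))
  names(lev) <- fc
  pos <- 0
  function(reset = FALSE) {
    if (reset) {                      # rewind to the first row
      pos <<- 0
      return(invisible(NULL))
    }
    if (pos >= nrow(m)) return(NULL)  # no rows left
    rows <- (pos + 1):min(pos + chunksize, nrow(m))
    pos <<- max(rows)
    chunk <- as.data.frame(m[rows, , drop = FALSE])
    for (j in fc) chunk[[j]] <- factor(chunk[[j]], levels = lev[[j]])
    chunk
  }
}

# Usage on the numeric iris matrix built as in the Examples section.
m <- matrix(unlist(iris), ncol = 5, dimnames = list(NULL, names(iris)))
next_chunk <- make_chunk_reader(m, chunksize = 60, fc = "Species")
chunks <- list()
while (!is.null(chunk <- next_chunk())) {
  chunks[[length(chunks) + 1]] <- chunk
}
sapply(chunks, nrow)           # 60 60 30
levels(chunks[[3]]$Species)    # "1" "2" "3"
next_chunk(reset = TRUE)
nrow(next_chunk())             # 60 again after rewinding
```

The last chunk (rows 121 to 150) contains only Species code 3, yet it still carries all three levels.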

Value

  • an object of class biglm (bigglm.big.matrix returns an object of class bigglm, which inherits from biglm).

Details

See the biglm package for more information. If chunksize is NULL, it defaults to max(floor(nrow(data)/ncol(data)^2), 10000), so the chunk size is never smaller than 10000 rows.
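The default rule can be checked by hand. A minimal sketch, in which the hypothetical helper default_chunksize applies the formula above to plain row and column counts:

```r
# Hypothetical helper: the default chunk-size rule quoted in Details.
default_chunksize <- function(n_rows, n_cols) {
  max(floor(n_rows / n_cols^2), 10000)
}

default_chunksize(150, 5)                   # 150 / 25 = 6, so the 10000 floor applies
default_chunksize(1e8, 20)                  # 1e8 / 400 = 250000 rows per chunk
ceiling(1e8 / default_chunksize(1e8, 20))   # 400 chunks in total
```

For the iris data in the Examples section (150 rows, 5 columns) the rule gives 10000, which exceeds the number of rows, so under this rule all 150 rows fit in a single chunk.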

References

Algorithm AS 274. Applied Statistics (1992), Vol. 41, No. 2.

Thomas Lumley (2005). biglm: bounded memory linear and generalized linear models. R package version 0.4.

See Also

biglm, big.matrix

Examples

# This example is quite silly, using the iris
# data.  But it shows that our wrapper to Lumley's biglm() function
# produces the same answer as the plain old lm() function.

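library(bigmemory)
library(biganalytics)  # provides biglm.big.matrix(); needs the biglm package

# unlist() turns the Species factor into its integer codes (1, 2, 3),
# so the whole data set fits in a numeric matrix.
x <- matrix(unlist(iris), ncol=5)
colnames(x) <- names(iris)
x <- as.big.matrix(x)
head(x)

# fc="Species" asks the wrapper to treat the numeric Species codes as a factor.
silly.biglm <- biglm.big.matrix(Sepal.Length ~ Sepal.Width + Species,
                                data=x, fc="Species")
summary(silly.biglm)

# Copy the big.matrix into an ordinary data frame and refit with lm().
y <- data.frame(x[,])
y$Species <- as.factor(y$Species)
head(y)

silly.lm <- lm(Sepal.Length ~ Sepal.Width + Species, data=y)
summary(silly.lm)

# The coefficient estimates should agree up to numerical tolerance.
all.equal(unname(coef(silly.biglm)), unname(coef(silly.lm)))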
