mgcv (version 1.9-4)

XWXd: Internal functions for discretized model matrix handling

Description

Routines for computing with discretized model matrices as described in Wood et al. (2017) and Li and Wood (2019).

Usage

XWXd(X,w,k,ks,ts,dt,v,qc,nthreads=1,drop=NULL,ar.stop=-1,ar.row=-1,ar.w=-1,
     lt=NULL,rt=NULL)
XWyd(X,w,y,k,ks,ts,dt,v,qc,drop=NULL,ar.stop=-1,ar.row=-1,ar.w=-1,lt=NULL)
Xbd(X,beta,k,ks,ts,dt,v,qc,drop=NULL,lt=NULL)
diagXVXd(X,V,k,ks,ts,dt,v,qc,drop=NULL,nthreads=1,lt=NULL,rt=NULL)
ijXVXd(i,j,X,V,k,ks,ts,dt,v,qc,drop=NULL,nthreads=1,lt=NULL,rt=NULL)

Arguments

Details

These functions are really intended to be internal, but are exported so that they can be used in the initialization code of families. They are primarily used by bam to implement the methods given in the references. XWXd produces \(X^TWX\), XWyd produces \(X^TWy\), Xbd produces \(X\beta\) and diagXVXd produces the diagonal of \(XVX^T\), while ijXVXd evaluates the individual elements of \(XVX^T\) selected by the index vectors i and j.
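For orientation only, here is a hedged dense-matrix sketch of the quantities these routines return, using small made-up objects (the discretized routines never form X explicitly, so this is not how they work internally):

  set.seed(1); n <- 10; p <- 4                ## made-up dimensions
  X <- matrix(runif(n*p),n,p); w <- runif(n); y <- runif(n)
  beta <- runif(p); V <- crossprod(matrix(runif(p*p),p,p))
  crossprod(X,w*X)                            ## what XWXd computes: X'WX with W = diag(w)
  crossprod(X,w*y)                            ## what XWyd computes: X'Wy
  X %*% beta                                  ## what Xbd computes
  rowSums((X %*% V)*X)                        ## what diagXVXd computes: diag(XVX')
  i <- c(1,3); j <- c(2,4)
  sapply(1:2,function(l) X[i[l],] %*% V %*% X[j[l],]) ## elements returned by ijXVXd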

The "lpip" attribute of X is a list of the coefficient indices for each term. Required if subsetting via lt and rt.

X can be a list of sparse matrices of class "dgCMatrix", in which case reverse indices are needed, mapping stored matrix rows to rows in the full matrix (that is, the reverse of k, which maps full matrix rows to the stored unique matrix rows). r is the same dimension as k, while off is a list with as many elements as k has columns. r and off are supplied as attributes to X. For simplicity let r and off denote a single column and element corresponding to each other: then r[(off[j]+1):off[j+1]] contains the rows of the full matrix corresponding to row j of the stored matrix. The reverse indices are essential for efficient computation with sparse matrices. See the example code for how to create them efficiently from the forward index matrix, k.
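A tiny worked sketch of how k, r and off relate, for a single made-up index column (see the Examples for the construction used with a real model):

  k <- c(2,1,2,3,1)   ## forward index: full-matrix row i uses stored row k[i]
  nr <- 3             ## number of stored unique rows
  r <- (1:length(k))[order(k)]               ## full rows grouped by stored row
  off <- cumsum(c(1,tabulate(k,nbins=nr)))-1 ## group boundaries
  j <- 2; r[(off[j]+1):off[j+1]]             ## full rows using stored row 2: rows 1 and 3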

References

Wood, S.N., Li, Z., Shaddick, G. & Augustin, N.H. (2017) Generalized additive models for gigadata: modelling the UK black smoke network daily data. Journal of the American Statistical Association 112(519):1199-1210. doi:10.1080/01621459.2016.1195744

Li, Z & S.N. Wood (2019) Faster model matrix crossproducts for large generalized linear models with discretized covariates. Statistics and Computing. tools:::Rd_expr_doi("10.1007/s11222-019-09864-2")

Examples

  library(mgcv);library(Matrix)
  ## simulate some data creating a marginal matrix sequence...
  set.seed(0);n <- 4000
  dat <- gamSim(1,n=n,dist="normal",scale=2)
  dat$x4 <- runif(n)
  dat$y <- dat$y + 3*exp(dat$x4*15-5)/(1+exp(dat$x4*15-5))
  dat$fac <- factor(sample(1:20,n,replace=TRUE))
  G <- gam(y ~ te(x0,x2,k=5,bs="bs",m=1)+s(x1)+s(x4)+s(x3,fac,bs="fs"),
           fit=FALSE,data=dat,discrete=TRUE)
  p <- ncol(G$X)
  ## create a sparse version...
  Xs <- list(); r <- G$kd*0; off <- list()
  for (i in 1:length(G$Xd)) Xs[[i]] <- as(G$Xd[[i]],"dgCMatrix")
  for (j in 1:nrow(G$ks)) { ## create the reverse indices...
    nr <- nrow(Xs[[j]]) ## make sure we always tab to final stored row 
    for (i in G$ks[j,1]:(G$ks[j,2]-1)) {
      r[,i] <- (1:length(G$kd[,i]))[order(G$kd[,i])]
      off[[i]] <- cumsum(c(1,tabulate(G$kd[,i],nbins=nr)))-1
    }
  }
  attr(Xs,"off") <- off;attr(Xs,"r") <- r 

  par(mfrow=c(2,3))

  beta <- runif(p)
  Xb0 <- Xbd(G$Xd,beta,G$kd,G$ks,G$ts,G$dt,G$v,G$qc)
  Xb1 <- Xbd(Xs,beta,G$kd,G$ks,G$ts,G$dt,G$v,G$qc)
  range(Xb0-Xb1);plot(Xb0,Xb1,pch=".")

  bb <- cbind(beta,beta+runif(p)*.3)
  Xb0 <- Xbd(G$Xd,bb,G$kd,G$ks,G$ts,G$dt,G$v,G$qc)
  Xb1 <- Xbd(Xs,bb,G$kd,G$ks,G$ts,G$dt,G$v,G$qc)
  range(Xb0-Xb1);plot(Xb0,Xb1,pch=".")

  p <- length(beta) ## extract full model matrix...
  X <- matrix(Xbd(G$Xd,diag(p),G$kd,G$ks,G$ts,G$dt,G$v,G$qc),ncol=p)
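  ## (each column of X is Xbd applied to the corresponding column of diag(p))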

  w <- runif(n)
  XWy0 <- XWyd(G$Xd,w,y=dat$y,G$kd,G$ks,G$ts,G$dt,G$v,G$qc) 
  XWy1 <- XWyd(Xs,w,y=dat$y,G$kd,G$ks,G$ts,G$dt,G$v,G$qc)
  range(XWy1-XWy0);plot(XWy1,XWy0,pch=".")

  yy <- cbind(dat$y,dat$y+runif(n)-.5)
  XWy0 <- XWyd(G$Xd,w,y=yy,G$kd,G$ks,G$ts,G$dt,G$v,G$qc) 
  XWy1 <- XWyd(Xs,w,y=yy,G$kd,G$ks,G$ts,G$dt,G$v,G$qc)
  range(XWy1-XWy0);plot(XWy1,XWy0,pch=".")

  A <- XWXd(G$Xd,w,G$kd,G$ks,G$ts,G$dt,G$v,G$qc)
  B <- XWXd(Xs,w,G$kd,G$ks,G$ts,G$dt,G$v,G$qc)
  D <- crossprod(X,w*X) ## direct computation
  range(A-D)
  range(A-B);plot(A,B,pch=".")
  ## compute some cross product terms only...
  A <- XWXd(G$Xd,w,G$kd,G$ks,G$ts,G$dt,G$v,G$qc,lt=1:3,rt=4:5)
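  ## A is the crossproduct block with rows for the coefficients of terms 1:3
  ## and columns for those of terms 4:5; compare with the matching block of D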
  range(A-D[1:nrow(A),(nrow(A)+1):ncol(D)])

  V <- crossprod(matrix(runif(p*p),p,p))
  ii <- c(20:30,100:200)
  jj <- c(50:90,150:160)
  V[ii,jj] <- 0;V[jj,ii] <- 0
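  ## zeroing a block of V introduces sparsity before conversion to "dgCMatrix" below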
  d1 <- diagXVXd(G$Xd,V,G$kd,G$ks,G$ts,G$dt,G$v,G$qc)
  Vs <- as(V,"dgCMatrix")
  d2 <- diagXVXd(Xs,Vs,G$kd,G$ks,G$ts,G$dt,G$v,G$qc)
  range(d1-d2);plot(d1,d2,pch=".")
