
face (version 0.1-1)

face.sparse: Fast covariance estimation for sparse functional data

Description

The function estimates the mean and covariance function from a collection of sparsely observed functions/longitudinal observations.

Usage

face.sparse(data, newdata = NULL, center = TRUE, argvals.new = NULL,
  knots = 10, knots.option = "quantile", p = 3, m = 2,
  lambda = NULL, lambda_mean = NULL, search.length = 14,
  lower = -3, upper = 10, lower2 = -1, upper2 = 5,
  calculate.scores = FALSE, pve = 0.99)

Arguments

data
a data frame with three variables: (1) argvals: observation times; (2) subj: subject indices; (3) y: values of observations. Missing values are not allowed. (A minimal construction is sketched after this list.)
newdata
a data frame of the same structure as data; defaults to NULL, in which case no prediction is performed.
center
logical. If TRUE, P-spline smoothing of the population mean is conducted and the smoothed mean is subtracted from the data before covariance smoothing; if FALSE, the population mean is taken to be zero.
argvals.new
a vector of observation time points at which to evaluate the mean function, covariance function, error variance, etc. If NULL, 100 equidistant points in the range of data$argvals are used.
knots
a vector of interior knots or the number of knots for B-spline basis functions to be used; defaults to 10.
knots.option
if knots specifies the number of knots, knots.option determines their placement. Default "quantile": knots are placed at quantiles of the observed time points. Alternative "equally-spaced": equally spaced knots in the range of the observed time points.
p
the degree of the B-splines; defaults to 3 (cubic).
m
the order of differencing penalty; defaults to 2.
lambda
the value of the smoothing parameter for covariance smoothing; defaults to NULL.
lambda_mean
the value of the smoothing parameter for mean smoothing; defaults to NULL.
search.length
the number of equidistant (log scale) smoothing parameters to search; defaults to 14.
lower, upper
bounds for the log smoothing parameter in the first step of estimation; defaults are -3 and 10, respectively.
lower2, upper2
bounds for the log smoothing parameter in the second step of estimation; defaults are -1 and 5, respectively.
calculate.scores
if TRUE, scores will be calculated.
pve
the percentage of variance explained used to select the number of eigenvalues; defaults to 0.99.
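
For orientation, a minimal sketch of the required input format and a default call; the simulated data and variable names below are illustrative, not from the package:

library(face)

set.seed(1)
n <- 50                                  # number of subjects
m_i <- sample(3:8, n, replace = TRUE)    # 3 to 8 sparse observations per subject
subj <- rep(seq_len(n), times = m_i)
argvals <- runif(length(subj))           # random observation times in [0, 1]
y <- sin(2 * pi * argvals) + rnorm(length(subj), sd = 0.2)

dat <- data.frame(argvals = argvals, subj = subj, y = y)
fit <- face.sparse(dat, argvals.new = seq(0, 1, length.out = 100))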

Value

newdata
The input newdata
y.pred,mu.pred,Chat.diag.pred, var.error.pred
Predicted/estimated objects at newdata$argvals
Theta
Estimated parameter matrix
argvals.new
Vector of time points to evaluate population parameters
mu.new, Chat.new,Cor.new,Cor.raw.new,Chat.raw.diag.new, var.error.new
Estimated objects at argvals.new
eigenfunctions, eigenvalues
Estimated eigenfunctions (scaled eigenvectors) and eigenvalues at argvals.new
mu.hat,var.error.hat
Estimated objects at data$argvals
calculate.scores,scores
If calculate.scores is TRUE (defaults to FALSE) and newdata is not NULL, predicted scores are calculated and returned.
...
...
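
Continuing the sketch above, the components listed here can be read directly off the returned object:

plot(fit$argvals.new, fit$mu.new, type = "l")          # estimated mean function
image(fit$argvals.new, fit$argvals.new, fit$Chat.new)  # estimated covariance surface
head(fit$eigenvalues)                                  # leading eigenvalues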

Details

This is a generalized version of bivariate P-splines (Eilers and Marx, 2003) for covariance smoothing of sparse functional or longitudinal data. It uses tensor product B-spline basis functions and employs a differencing penalty on the associated parameter matrix. The only smoothing parameter in the method is selected by leave-one-subject-out cross-validation, implemented with a fast algorithm.
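
As a rough illustration of the penalty construction (a sketch of the generic P-spline ingredients, not the package's internal code), the marginal difference penalty and one common tensor-product extension look like:

K <- 14                               # e.g. 10 interior knots + degree 3 + 1 basis functions
m <- 2                                # order of differencing
D <- diff(diag(K), differences = m)   # (K - m) x K difference operator
P <- crossprod(D)                     # marginal penalty matrix D'D
## a common bivariate form penalizes rows and columns of the vectorized
## K x K coefficient matrix Theta:
P2 <- kronecker(P, diag(K)) + kronecker(diag(K), P)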

Estimation proceeds in two steps. In the first step, the objective function to minimize is the least squares criterion on empirical estimates of the covariance function. In the second step, the covariance between the empirical estimates (which depends on the estimated covariance function) is accounted for, and a generalized least squares criterion is minimized.
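
Schematically (the generic least-squares-to-GLS idea only, not the package's implementation): let z be the vector of empirical covariance estimates (products of centered residuals), B the tensor-product design matrix at the corresponding time pairs, and W an estimate of Cov(z); all three names are placeholders.

ols_fit <- function(B, z) solve(crossprod(B), crossprod(B, z))   # step 1: ordinary least squares
gls_fit <- function(B, z, W) {                                   # step 2: generalized least squares
  Wi <- solve(W)
  solve(t(B) %*% Wi %*% B, t(B) %*% Wi %*% z)
}
## step 1 yields theta0 <- ols_fit(B, z); W is then built from the
## covariance function implied by theta0, and step 2 refits via gls_fit.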

If center is TRUE, a population mean is calculated and smoothed by univariate P-spline smoothing (pspline; Eilers and Marx, 1996). This univariate smoothing uses leave-one-subject-out cross-validation to select the smoothing parameter.
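
The centering step amounts to smoothing the pooled observations and subtracting the fitted mean before covariance smoothing. Schematically, with smooth.spline standing in for the package's own P-spline smoother and dat from the earlier sketch:

mu_fit <- smooth.spline(dat$argvals, dat$y)   # pooled mean smooth (generic stand-in)
dat_centered <- dat
dat_centered$y <- dat$y - predict(mu_fit, dat$argvals)$y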

If the functional data are observed on the same grid for each function/curve and can be organized into a data matrix, then fpca.face in the package refund should be used instead. fpca.face allows a small percentage (less than 30 percent) of missing data in the data matrix.
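
For contrast, a toy dense-design case where fpca.face applies (the simulated matrix is illustrative):

library(refund)
Y <- matrix(rnorm(50 * 61), nrow = 50)   # 50 curves on a common 61-point grid
fit_dense <- fpca.face(Y, pve = 0.99)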

References

Luo Xiao, Cai Li, Will Checkley and Ciprian Crainiceanu, Fast covariance estimation for sparse functional data, manuscript.

Paul Eilers and Brian Marx, Multivariate calibration with temperature interaction using two-dimensional penalized signal regression, Chemometrics and Intelligent Laboratory Systems, 66, 159-174, 2003.

Paul Eilers and Brian Marx, Flexible smoothing with B-splines and penalties, Statistical Science, 11, 89-121, 1996.

See Also

fpca.face and fpca.sc in refund

Examples


## Not run:
##########################
#### CD4 data example
##########################

require(refund)
data(cd4)
n  <- nrow(cd4)    # number of subjects
TT <- ncol(cd4)    # number of observation times

id <- rep(1:n, each = TT)
tt <- rep(-18:42, times = n)   # months since seroconversion
y  <- as.vector(t(cd4))
sel <- which(is.na(y))

## organize data and apply FACEs
data <- data.frame(y = log(y[-sel]),
                   argvals = tt[-sel],
                   subj = id[-sel])
data <- data[data$y > 4.5, ]
fit_face <- face.sparse(data, argvals.new = (-20:40))

data.h <- data
tnew <- fit_face$argvals.new

## scatter plots
Xlab <- "Months since seroconversion"
Ylab <- "log (CD4 count)"
par(mfrow = c(1, 1), mar = c(4.5, 4.5, 3, 2))
id  <- data.h$subj
uid <- unique(id)
plot(data.h$argvals, data.h$y,
     type = "n", ylim = c(4.5, 8),
     xlab = Xlab, ylab = Ylab,
     cex.lab = 1.25, cex.axis = 1.25, cex.main = 1.25)

for (i in 1:10) {
  seq <- which(id == uid[i])
  lines(data.h$argvals[seq], data.h$y[seq], lty = 1, col = "gray", lwd = 1, type = "l")
  # points(data.h$argvals[seq], data.h$y[seq], col = 1, lty = 1, pch = 1)
}

Sample <- seq(10, 50, by = 10)
for (i in Sample) {
  seq <- which(id == uid[i])
  lines(data.h$argvals[seq], data.h$y[seq], lty = 1, col = "black", lwd = 1, type = "l")
}
lines(tnew, fit_face$mu.new, lwd = 2, lty = 2, col = "red")   # estimated mean function

## plots of variance/correlation functions
Cov      <- fit_face$Chat.new
Cov_diag <- diag(Cov)
Cor      <- fit_face$Cor.new

par(mfrow = c(1, 2), mar = c(4.5, 4.1, 3, 4.5))

plot(tnew, Cov_diag, type = "l",
     xlab = Xlab, ylab = "", main = "CD4: variance function",
     # ylim = c(0.8, 1.5),
     cex.axis = 1.25, cex.lab = 1.25, cex.main = 1.25, lwd = 2)

require(fields)
image.plot(tnew, tnew, Cor,
           xlab = Xlab, ylab = Xlab,
           main = "CD4: correlation function",
           cex.axis = 1.25, cex.lab = 1.25, cex.main = 1.25,
           axis.args = list(at = c(0, 0.2, 0.4, 0.6, 0.8, 1.0)),
           legend.shrink = 0.75, legend.line = -1.5)

## prediction of several subjects
par(mfrow = c(2, 2), mar = c(4.5, 4.5, 3, 2))
Sample <- c(30, 40, 50, 60)
for (i in 1:4) {
  sel  <- which(id == uid[Sample[i]])
  dati <- data.h[sel, ]

  ## append a grid of new time points (y = NA) at which to predict
  seq <- -20:40
  k   <- length(seq)
  dati_pred <- data.frame(y = rep(NA, nrow(dati) + k),
                          argvals = c(rep(NA, nrow(dati)), seq),
                          subj = rep(dati$subj[1], nrow(dati) + k))

  dati_pred[1:nrow(dati), ] <- dati
  yhat2 <- predict(fit_face, dati_pred)

  data3 <- dati
  Ylim  <- range(c(data3$y, yhat2$y.pred))

  plot(data3$argvals, data3$y, xlab = Xlab, ylab = Ylab,
       main = paste("Male ", i, sep = ""),
       ylim = c(4, 8.5),
       cex.lab = 1.25, cex.axis = 1.25, cex.main = 1.25, pch = 1, xlim = c(-20, 40))

  ## predicted trajectory with pointwise 95% bands
  Ord <- nrow(dati) + 1:k
  lines(dati_pred$argvals[Ord], yhat2$y.pred[Ord], col = "red", lwd = 2)
  lines(dati_pred$argvals[Ord],
        yhat2$y.pred[Ord] - 1.96 * yhat2$se.pred[Ord], col = "red", lwd = 1, lty = 2)
  lines(dati_pred$argvals[Ord],
        yhat2$y.pred[Ord] + 1.96 * yhat2$se.pred[Ord], col = "red", lwd = 1, lty = 2)

  lines(tnew, fit_face$mu.new, lty = 3, col = "black", lwd = 2)
  legend("bottomleft", c("mean", "prediction"), lty = c(3, 1), col = 1:2, lwd = 2, bty = "n")
}
## End(Not run)
