fpca.lfda: Longitudinal Functional Data Analysis using FPCA

Description

Implements longitudinal functional data analysis (Park and Staicu, 2015). It decomposes longitudinally-observed functional observations in two steps. It first applies FPCA on a properly defined marginal covariance function and obtain estimated scores (mFPCA step). Then it further models the underlying process dynamics by applying another FPCA on a covariance of the estimated scores obtained in the mFPCA step. The function also allows to use a random effects model to study the underlying process dynamics instead of a KL expansion model in the second step. Scores in mFPCA step are estimated using numerical integration. Scores in sFPCA step are estimated under a mixed model framework.

Usage

fpca.lfda(Y, subject.index, visit.index, obsT = NULL, funcArg = NULL,
  numTEvalPoints = 41, newdata = NULL, fbps.knots = c(5, 10),
  fbps.p = 3, fbps.m = 2, mFPCA.pve = 0.95, mFPCA.knots = 35,
  mFPCA.p = 3, mFPCA.m = 2, mFPCA.npc = NULL,
  LongiModel.method = c("fpca.sc", "lme"), sFPCA.pve = 0.95,
  sFPCA.nbasis = 10, sFPCA.npc = NULL, gam.method = "REML", gam.kT = 10)

Arguments

a matrix of which each row corresponds to one curve observed on a regular and dense grid (dimension of N by m; N = total number of observed functions; m = number of grid points)

subject.index

subject id; vector of length N with each element corresponding a row of Y

visit.index

index for visits (repeated measures); vector of length N with each element corresponding a row of Y

obsT

actual time of visits at which a function is observed; vector of length N with each element corresponding a row of Y

funcArg

numeric; function argument

numTEvalPoints

total number of evaluation time points for visits; used for pre-binning in sFPCA step; defaults to 41

newdata

an optional data frame providing predictors (i for subject id / Ltime for visit time) with which prediction is desired; defaults to NULL

fbps.knots

list of two vectors of knots or number of equidistanct knots for all dimensions for a fast bivariate P-spline smoothing (fbps) method used to estimate a bivariate, smooth mean function; defauls to c(5,10); see fbps

fbps.p

integer;degrees of B-spline functions to use for a fbps method; defaults to 3; see fbps

fbps.m

integer;order of differencing penalty to use for a fbps method; defaults to 2; see fbps

mFPCA.pve

proportion of variance explained for a mFPCA step; used to choose the number of principal components (PCs); defaults to 0.95; see fpca.face

mFPCA.knots

number of knots to use or the vectors of knots in a mFPCA step; used for obtain a smooth estimate of a covarance function; defaults to 35; see fpca.face

mFPCA.p

integer; the degree of B-spline functions to use in a mFPCA step; defaults to 3; see fpca.face

mFPCA.m

integer;order of differencing penalty to use in a mFPCA step; defaults to 2; see fpca.face

mFPCA.npc

pre-specified value for the number of principal components; if given, it overrides pve; defaults to NULL; see fpca.face

LongiModel.method

model and estimation method for estimating covariance of estimated scores from a mFPCA step; either KL expansion model or random effects model; defaults to fpca.sc

sFPCA.pve

proportion of variance explained for sFPCA step; used to choose the number of principal components; defaults to 0.95; see fpca.sc

sFPCA.nbasis

number of B-spline basis functions used in sFPCA step for estimation of the mean function and bivariate smoothing of the covariance surface; defaults to 10; see fpca.sc

sFPCA.npc

pre-specified value for the number of principal components; if given, it overrides pve; defaults to NULL; see fpca.sc

gam.method

smoothing parameter estimation method when gam is used for predicting score functions at unobserved visit time, T; defaults to REML; see gam

gam.kT

dimension of basis functions to use; see gam

Value

A list with components
obsDataobserved data (input)
isubject id
funcArgfunction argument
visitTimevisit times
fitted.valuesfitted values (in-sample); of the same dimension as Y
fitted.values.alla list of which each component consists of a subject's fitted values at all pairs of evaluation points (s and T)
predicted.valuespredicted values for variables provided in newdata
bivariateSmoothMeanFuncestimated bivariate smooth mean function
mFPCA.efunctionsestimated eigenfunction in a mFPCA step
mFPCA.evaluesestimated eigenvalues in a mFPCA step
mFPCA.npcnumber of principal components selected with pre-specified pve in a mFPCA step
mFPCA.scree.evalestiamted eigenvalues obtained with pre-specified pve = 0.9999; for scree plot
sFPCA.xiHat.bySubja list of which each component consists of a subject's predicted score functions evaluated at equidistanced grid in direction of visit time, T
sFPCA.npca vector of numbers of principal components selected in a sFPCA step with pre-specified pve; length of mFPCA.npc
mFPCA.covarestimated marginal covariance
sFPCA.longDynCov.ka list of estimated covariance of score function; length of mFPCA.npc

Details

A random effects model is recommended when a set of visit times for all subjects and visits is not dense in its range.

References

Park, S.Y. and Staicu, A.M. (2015). Longitudinal functional data analysis. Stat 4 212-226.

Examples

Run this code

########################################
  ###   Illustration with real data    ###
  ########################################

  data(DTI)
  MS <- subset(DTI, case ==1)  # subset data with multiple sclerosis (MS) case

  index.na <- which(is.na(MS$cca))
  Y <- MS$cca; Y[index.na] <- fpca.sc(Y)$Yhat[index.na]; sum(is.na(Y))
  id <- MS$ID
  visit.index <- MS$visit
  visit.time <- MS$visit.time/max(MS$visit.time)

  lfpca.dti <- fpca.lfda(Y = Y, subject.index = id,
                         visit.index = visit.index, obsT = visit.time,
                         LongiModel.method = 'lme',
                         mFPCA.pve = 0.95)

  TT <- seq(0,1,length.out=41); ss = seq(0,1,length.out=93)

  # estimated mean function
  persp(x = ss, y = TT, z = t(lfpca.dti$bivariateSmoothMeanFunc),
        xlab="s", ylab="visit times", zlab="estimated mean fn", col='light blue')

  # first three estimated marginal eigenfunctions
  matplot(ss, lfpca.dti$mFPCA.efunctions[,1:3], type='l', xlab='s', ylab='estimated eigen fn')

  # predicted scores function corresponding to first two marginal PCs
  matplot(TT, do.call(cbind, lapply(lfpca.dti$sFPCA.xiHat.bySubj, function(a) a[,1])),
          xlab="visit time (T)", ylab="xi_hat(T)", main = "k = 1", type='l')
  matplot(TT, do.call(cbind, lapply(lfpca.dti$sFPCA.xiHat.bySubj, function(a) a[,2])),
          xlab="visit time (T)", ylab="xi_hat(T)", main = "k = 2", type='l')

  # prediction of cca of first two subjects at T = 0, 0.5 and 1 (black, red, green)
  matplot(ss, t(lfpca.dti$fitted.values.all[[1]][c(1,21,41),]),
         type='l', lty = 1, ylab="", xlab="s", main = "Subject = 1")
  matplot(ss, t(lfpca.dti$fitted.values.all[[2]][c(1,21,41),]),
         type='l', lty = 1, ylab="", xlab="s", main = "Subject = 2")

  ########################################
  ### Illustration with simulated data ###
  ########################################

  ###########################################################################################
  # data generation
  ###########################################################################################
  set.seed(1)
  n <- 100 # number of subjects
  ss <- seq(0,1,length.out=101)
  TT <- seq(0, 1, length.out=41)
  mi <- runif(n, min=6, max=15)
  ij <- sapply(mi, function(a) sort(sample(1:41, size=a, replace=FALSE)))

  # error variances
  sigma <- 0.1
  sigma_wn <- 0.2

  lambdaTrue <- c(1,0.5) # True eigenvalues
  eta1True <- c(0.5, 0.5^2, 0.5^3) # True eigenvalues
  eta2True <- c(0.5^2, 0.5^3) # True eigenvalues

  phi <- sqrt(2)*cbind(sin(2*pi*ss),cos(2*pi*ss))
  psi1 <- cbind(rep(1,length(TT)), sqrt(3)*(2*TT-1), sqrt(5)*(6*TT^2-6*TT+1))
  psi2 <- sqrt(2)*cbind(sin(2*pi*TT),cos(2*pi*TT))

  zeta1 <- sapply(eta1True, function(a) rnorm(n = n, mean = 0, sd = a))
  zeta2 <- sapply(eta2True, function(a) rnorm(n = n, mean = 0, sd = a))

  xi1 <- unlist(lapply(1:n, function(a) (zeta1 %*% t(psi1))[a,ij[[a]]] ))
  xi2 <- unlist(lapply(1:n, function(a) (zeta2 %*% t(psi2))[a,ij[[a]]] ))
  xi <- cbind(xi1, xi2)

  Tij <- unlist(lapply(1:n, function(i) TT[ij[[i]]] ))
  i <- unlist(lapply(1:n, function(i) rep(i, length(ij[[i]]))))
  j <- unlist(lapply(1:n, function(i) 1:length(ij[[i]])))

  X <- xi %*% t(phi)
  meanFn <- function(s,t){ 0.5*t + 1.5*s + 1.3*s*t}
  mu <- matrix(meanFn(s = rep(ss, each=length(Tij)), t=rep(Tij, length(ss)) ) , nrow=nrow(X))

  Y <- mu +  X +
     matrix(rnorm(nrow(X)*ncol(phi), 0, sigma), nrow=nrow(X)) %*% t(phi) + #correlated error
     matrix(rnorm(length(X), 0, sigma_wn), nrow=nrow(X)) # white noise

  matplot(ss, t(Y[which(i==2),]), type='l', ylab="", xlab="functional argument",
         main="observations from subject i = 2")
  # END: data generation

  ###########################################################################################
  # Illustration I : when covariance of scores from a mFPCA step is estimated using fpca.sc
  ###########################################################################################
  est <- fpca.lfda(Y = Y,
                   subject.index = i, visit.index = j, obsT = Tij,
                   funcArg = ss, numTEvalPoints = length(TT),
                   newdata = data.frame(i = c(1:3), Ltime = c(Tij[1], 0.2, 0.5)),
                   fbps.knots = 35, fbps.p = 3, fbps.m = 2,
                   LongiModel.method='fpca.sc',
                   mFPCA.pve = 0.95, mFPCA.knots = 35, mFPCA.p = 3, mFPCA.m = 2,
                   sFPCA.pve = 0.95, sFPCA.nbasis = 10, sFPCA.npc = NULL,
                   gam.method = 'REML', gam.kT = 10)


  # mean function (true vs. estimated)
  par(mfrow=c(1,2))
  persp(x=TT, y = ss, z= t(sapply(TT, function(a) meanFn(s=ss, t = a))),
          xlab="visit times", ylab="s", zlab="true mean fn")
  persp(x = TT, y = ss, est$bivariateSmoothMeanFunc,
   xlab="visit times", ylab="s", zlab="estimated mean fn", col='light blue')

  ################   mFPCA step   ################
  par(mfrow=c(1,2))

  # marginal covariance fn (true vs. estimated)
  image(phi%*%diag(lambdaTrue)%*%t(phi))
  image(est$mFPCA.covar)

  # eigenfunctions (true vs. estimated)
  matplot(ss, phi, type='l')
  matlines(ss, cbind(est$mFPCA.efunctions[,1], est$mFPCA.efunctions[,2]), type='l', lwd=2)

  # scree plot
  plot(cumsum(est$mFPCA.scree.eval)/sum(est$mFPCA.scree.eval), type='l',
       ylab = "Percentage of variance explained")
  points(cumsum(est$mFPCA.scree.eval)/sum(est$mFPCA.scree.eval), pch=16)

  ################   sFPCA step   ################
  par(mfrow=c(1,2))
  print(est$mFPCA.npc)  # k = 2

  # covariance of score functions for k = 1 (true vs. estimated)
  image(psi1%*%diag(eta1True)%*%t(psi1), main='TRUE')
  image(est$sFPCA.longDynCov.k[[1]], main='ESTIMATED')

  # covariance of score functions for k = 2 (true vs. estimated)
  image(psi2%*%diag(eta2True)%*%t(psi2))
  image(est$sFPCA.longDynCov.k[[2]])

  # estimated scores functions
  matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,1])),
          xlab="visit time", main="k=1", type='l', ylab="", col=rainbow(100, alpha = 1),
          lwd=1, lty=1)
  matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,2])),
          xlab="visit time", main="k=2",type='l', ylab="", col=rainbow(100, alpha = 1),
          lwd=1, lty=1)

  ################   In-sample and Out-of-sample Prediction   ################
  par(mfrow=c(1,2))
  # fitted
  matplot(ss, t(Y[which(i==1),]), type='l', ylab="", xlab="functional argument")
  matlines(ss, t(est$fitted.values[which(i==1),]), type='l', lwd=2)

 # sanity check : expect fitted and predicted (obtained using info from newdata)
 #                values to be the same

  plot(ss, est$fitted.values[1,], type='p', xlab="", ylab="", pch = 1, cex=1)
  lines(ss, est$predicted.values[1,], type='l', lwd=2, col='blue')
  all.equal(est$predicted.values[1,], est$fitted.values[1,])

  ###########################################################################################
  # Illustration II : when covariance of scores from a mFPCA step is estimated using lmer
  ###########################################################################################
  est.lme <- fpca.lfda(Y = Y,
                       subject.index = i, visit.index = j, obsT = Tij,
                       funcArg = ss, numTEvalPoints = length(TT),
                       newdata = data.frame(i = c(1:3), Ltime = c(Tij[1], 0.2, 0.5)),
                       fbps.knots = 35, fbps.p = 3, fbps.m = 2,
                       LongiModel.method='lme',
                       mFPCA.pve = 0.95, mFPCA.knots = 35, mFPCA.p = 3, mFPCA.m = 2,
                       gam.method = 'REML', gam.kT = 10)

  par(mfrow=c(2,2))

  # fpca.sc vs. lme (assumes linearity)
  matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,1])),
          xlab="visit time", main="k=1", type='l', ylab="", col=rainbow(100, alpha = 1),
          lwd=1, lty=1)
  matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,2])),
          xlab="visit time", main="k=2",type='l', ylab="", col=rainbow(100, alpha = 1),
          lwd=1, lty=1)

  matplot(TT, do.call(cbind,lapply(est.lme$sFPCA.xiHat.bySubj, function(a) a[,1])),
          xlab="visit time", main="k=1", type='l', ylab="", col=rainbow(100, alpha = 1),
          lwd=1, lty=1)
  matplot(TT, do.call(cbind,lapply(est.lme$sFPCA.xiHat.bySubj, function(a) a[,2])),
          xlab="visit time", main="k=2", type='l', ylab="", col=rainbow(100, alpha = 1),
          lwd=1, lty=1)

Run the code above in your browser using DataLab