The function Extract_Eigencomp_fDA()
computes the eigenfunctions and the
covariance of the shrinkage scores required to conduct
the projection-based test of mean function between two groups of longitudinal data
or sparsely observed functional data under a random irregular design, as developed by Wang (2021).
Extract_Eigencomp_fDA(
nobs_per_subj,
obs.design,
mean_diff_fnm,
cov.type = c("ST", "NS"),
cov.par,
sigma2.e,
missing_type = c("nomiss", "constant"),
missing_percent = 0,
eval_SS = 5000,
alloc.ratio = c(1, 1),
fpca_method = c("fpca.sc", "face"),
work.grid = NULL,
nWgrid = ifelse(is.null(work.grid), 101, length(work.grid)),
data.driven.scores = FALSE,
mean_diff_add_args = list(),
fpca_optns = list()
)
A list with the elements listed below.
mean_diff_vec
- The evaluation of the mean function at the working grid.
est_eigenfun
- The evaluation of the estimated eigenfunctions at the working grid.
est_eigenval
- Estimated eigen values.
working.grid
- The grid points at which mean_diff_vec
and
est_eigenfun
are evaluated.
fpcCall
- The exact call of either of the fpca_sc()
or face::face.sparse()
used to compute the eigencomponents.
scores_var1
- Estimated covariance of the shrinkage
scores for the treatment group.
scores_var2
- Estimated covariance of the shrinkage
scores for the placebo group.
pooled_var
- Pooled covariance of the scores combining both the groups. This is required
if the user wants to compute the power of Hotelling T statistic under equal variance assumption.
If data.driven.scores == TRUE
additional components are returned
scores_1
- Estimated shrinkage
scores for all the subjects in treatment group.
scores_2
- Estimated shrinkage
scores for all the subjects in placebo group.
The output of this function is designed such a way the
user can directly input the output obtained from this function into the arguments of
Power_Proj_Test_ufDA()
function to obtain the power and the sample size right away. The function
PASS_Proj_Test_ufDA does the same, it is essentially a wrapper ofExtract_Eigencomp_fDA()
and Power_Proj_Test_ufDA()
together.
The number of observations per subject. Each element of it must be greater than 3. It could also be a vector to indicate that the number of observation for each is randomly varying between the elements of the vector, or a scalar to ensure that the number of observations are same for each subject. See examples.
The sampling design of the observations. Must be provided as
a list with the following elements. If the design is longitudinal (e.g. a clinical trial
where there is pre-specified schedule of visit for the participants) it must be
a named list with elements design
, visit.schedule
and visit.window
, where
obs.design$design
must be specified as 'longitudinal'
, visit.schedule
specifying schedule of visits (in months or days or any unit of time), other than the baseline visit
and visit.window
denoting the maximum time window for every visit.
For functional design (where the observation points are either densely observed within a
compact interval or under a sparse random design), the argument must be provided
as a named list with elements design
and fun.domain
, where
obs.design$design
must be specified as 'functional'
and obs.design$fun.domain
must be specified as a two length vector indicating the domain of the function.
See Details on the specification of arguments section below more details.
The name of the function that output of the difference of the mean between the
two groups at any given time. It must be supplied as character, so that match.fun(mean_diff_fnm)
returns a valid function, that takes a vector input, and returns a vector of the same length of the input.
The type of the covariance structure of the data, must be either of 'ST' (stationary) or
'NS' (non-stationary). This argument along with the cov.par
argument must be
specified compatibly to ensure that the function does not return an error. See the details
of cov.par
argument.
The covariance structure of the latent response trajectory.
If cov.type == 'ST'
then, cov.par
must be specified a named list of two elements, var
and cor
,
where var
is the common variance of the observations, which must be a
positive number; and cor
specifies the correlation structure between
the observations. cov.par$cor
must be specified in the form of the
nlme::corClasses specified in R package nlme.
Check the package documentation for more details for each of the correlation classes.
The cov.par$cor
must be a corStruct
class so it can be
passed onto the nlme::corMatrix()
to extract the subject-specific covariance matrix.
If cov.type='NS'
then, cov.par
must be a named list of two elements, cov.obj
and eigen.comp
,
where only one of the cov.par$cov.obj
or cov.par$eigen.comp
must be non-null. This is to specify that the covariance structure of the
latent trajectory can be either provided in the form of covariance function or
in the form of eigenfunction and eigenvalues (Spectral decomposition).
If the cov.par$cov.obj
is specified, then it must be a bivariate function,
with two arguments. Alternatively, if the true eigenfunctions are known,
then the user can specify that by specifying cov.par$eigen.comp
.
In this case, the cov.par$eigen.comp
must be a named list with two elements,
eig.obj
and eig.val
, where cov.par$eigen.comp$eig.val
must be positive vector and cov.par$eigen.comp$eig.obj
must be a vectorized function so that its evaluation at a vector of time points
returns a matrix of dimension r by length(cov.par$eigen.comp$eig.val)
,
with r being the length of time points.
Measurement error variance, should be set as zero or a very small number if the measurement error is not significant.
The type of missing in the number of observations of the subjects. Can be one of
'nomiss'
for no missing observations
or 'constant'
for constant
missing percentage at every time point. The current version of package only supports
missing_type = 'constant'
.
The percentage of missing at each observation points for each subject.
Must be supplied as number between [0, 0.8], as missing percentage more than 80% is not practical.
If nobs_per_subj
is supplied as vector, then missing_type
is forced to set as 'nomiss'
and missing_percent = 0
, because
the missing_type = 'constant'
has no meaning if the number of observations are
varying between the subject at the first, typically considered in
the case of sparse random functional design.
The sample size based on which the eigencomponents will be estimated from data. To compute the theoretical power of the test we must make sure that we use a large enough sample size to generate the data such that the estimated eigenfunctions are very close to the true eigenfunctions and that the sampling design will not have much effect on the loss of precision. Default value 5000.
The allocation ratio of samples in the each group. Note that the eigenfunctions
will still be estimated based on the total sample_size, however, the variance
of the shrinkage
scores (which is required to compute the power function) will be
estimated based on the allocation of the samples in each group. Must be given as vector of
length 2. Default value is set at c(1, 1)
, indicating equal sample size.
The method by which the FPCA is computed. Must be one of
'fpca.sc' and 'face'. If fpca_method == 'fpca.sc'
then the eigencomponents
are estimated using the function refund::fpca.sc()
. However, since the refund::fpca.sc()
function fails to estimate the correct shrinkage
scores, and throws NA
values
when the measurement errors is estimated to be zero, we wrote out a similar function
where we corrected those error in current version of refund::fpca.sc()
. Check out
the fpca_sc()
function for details. If fpca_method == 'face'
, then
the eigencomponents are estimated using face::face.sparse()
function.
The working grid in the domain of the functions, where the eigenfunctions
and other covariance components will be estimated. Default is NULL, then, a equidistant
grid points of length nWgrid
will be internally created to as the default work.grid
.
The length of the work.grid
in the domain of the function based on which
the eigenfunctions will be estimated. Default value is 101. If work.grid
is specified, then nWgrid
must be null, and vice-versa.
Indicates whether the scores are estimated from the full data, WITHOUT
assuming the mean function is unknown, rather the mean function is estimated using
mgcv::gam()
function.
Additional arguments to be passed to group difference
function specified in the argument mean_diff_fnm
.
Additional options to be passed onto either of fpca_sc()
or face::face.sparse()
function in order
to estimate the eigencomponents. It must be a named list with elements
to be passed onto the respective function, depending on the fpca_method
.
The names of the list must not match either of
c('data', 'newdata', 'argvals.new')
for fpca_method == 'face'
and must not match either of
c('ydata', 'Y.pred')
for fpca_method == 'fpca.sc'
.
If obs.design$design == 'functional'
then a dense grid of length,
specified by ngrid (typically 101/201) is internally created, and
the observation points will be randomly chosen from them.
The time points could also randomly chosen between
any number between the interval, but then for large number of subject,
fpca_sc()
function will take huge
time to estimate the eigenfunction. For dense design, the user must set
a large value of the argument nobs_per_subj
and for sparse (random) design,
nobs_per_subj
should be set small (and varying).
On the other hand, typical to longitudinal data, if the measurements are
taken at fixed time points (from baseline)
for each subject, then the user must set obs.design$design == 'longitudinal'
and
the time points must be accordingly specified
in the argument obs.design$visit.schedule
. The length of obs.design$visit.schedule
must match length(nobs_per_subj)-1
. Internally, when
obs.design$design == 'longitudinal'
, the function scale the visit times
so that it lies between [0, 1], so the user should not
specify any element named fun.domain
in the
list for obs.design$design == 'longitudinal'
. Make sure that
the mean function and the covariance function specified
in the cov.par
and mean_diff_fnm
parameter also scaled to
take argument between [0, 1]. Also, it is imperative to say that nobs_per_subj
must
be of a scalar positive integer for design == 'longitudinal'
.
Salil Koner
Maintainer: Salil Koner
salil.koner@duke.edu
The function can handle data from wide variety of covariance structure, can be parametric,
or non-parametric. Additional with traditional stationary structures assumed for longitudinal
data (see nlme::corClasses), the user can specify any other non-stationary covariance
function in the form of either a covariance function or in terms of eigenfunctions and
eigenvalues. The user have a lot of flexibility into tweaking the arguments nobs_per_subject
,
obs.design
, and cov.par
to compute the eigencomponents
under different sampling design and covariance process of the response trajectory, and
for any arbitrary mean difference function. Internally, using the sampling
design and the covariance structure specified, we generate a large data with
large number of subjects, and estimate the eigenfunctions and the covariance of the estimated
shrinkage
scores by means of functional principal component analysis (fPCA). We put the option of using
two most commonly used softwares for fPCA in the functional data literature, refund::fpca.sc()
and face::face.sparse()
. However, since the refund::fpca.sc()
do not compute the shrinkage
scores correctly, especially when the measurement error variance is estimated to be zero,
we made a duplicate version of that function in our package, where we write out
the scoring part on our own. The new function is named as fpca_sc()
, please check it out.
Wang, Qiyao (2021)
Two-sample inference for sparse functional data, Electronic Journal of Statistics,
Vol. 15, 1395-1423
tools:::Rd_expr_doi("https://doi.org/10.1214/21-EJS1802").
See Power_Proj_Test_ufDA()
, refund::fpca.sc()
and face::face.sparse()
.
# Example 1: Extract eigencomponents from stationary covariance.
set.seed(12345)
mean.diff <- function(t) {t};
obs.design <- list("design" = "longitudinal",
"visit.schedule" = seq(0.1, 0.9, length.out=7),
"visit.window" = 0.05)
cor.str <- nlme::corExp(1, form = ~ time | Subject);
sigma2 <- 1; sigma2.e <- 0.25; nobs_per_subj <- 8;
missing_type <- "constant"; missing_percent <- 0.01;
eigencomp <- Extract_Eigencomp_fDA(obs.design = obs.design,
mean_diff_fnm = "mean.diff", cov.type = "ST",
cov.par = list("var" = sigma2, "cor" = cor.str),
sigma2.e = sigma2.e, nobs_per_subj = nobs_per_subj,
missing_type = missing_type,
missing_percent = missing_percent, eval_SS = 1000,
alloc.ratio = c(1,1), nWgrid = 201,
fpca_method = "fpca.sc", data.driven.scores = FALSE,
mean_diff_add_args = list(), fpca_optns = list(pve = 0.95))
# Example 2: Extract eigencomponents from non-stationary covariance.
alloc.ratio <- c(1,1)
mean.diff <- function(t) {1 * (t^3)};
eig.fun <- function(t, k) { if (k==1) {
ef <- sqrt(2)*sin(2*pi*t)
} else if (k==2) {ef <- sqrt(2)*cos(2*pi*t)}
return(ef)}
eig.fun.vec <- function(t){cbind(eig.fun(t, 1),eig.fun(t, 2))}
eigen.comp <- list("eig.val" = c(1, 0.5), "eig.obj" = eig.fun.vec)
obs.design <- list(design = "functional", fun.domain = c(0,1))
cov.par <- list("cov.obj" = NULL, "eigen.comp" = eigen.comp)
sigma2.e <- 0.001; nobs_per_subj <- 4:7;
missing_type <- "nomiss"; missing_percent <- 0;
fpca_method <- "fpca.sc"
eigencomp <- Extract_Eigencomp_fDA(obs.design = obs.design,
mean_diff_fnm = "mean.diff",
cov.type = "NS", cov.par = cov.par,
sigma2.e = sigma2.e, nobs_per_subj = nobs_per_subj,
missing_type = missing_type,
missing_percent = missing_percent, eval_SS = 1000,
alloc.ratio = alloc.ratio, nWgrid = 201,
fpca_method = "fpca.sc", data.driven.scores = FALSE,
mean_diff_add_args = list(), fpca_optns = list(pve = 0.95))
Run the code above in your browser using DataLab