cpi_covmat: Estimate multiple change points among high-dimensional covariance matrices

Description

This function estimates multiple change points among covariance matrices for high-dimensional longitudinal/functional data. The change points are identified using the testing procedure and binary segmentation approach proposed in Zhong, Li, and Santo (2019), and Santo and Zhong (2020).

Usage

cpi_covmat(y, n, p, TT, alpha = 0.01, threads = 1)

Arguments

y

A high-dimensional longitudinal data set stored as a three-dimensional array, where the first coordinate indexes features, the second indexes sample subjects, and the third indexes time repetitions. Thus, the dimension of y is $p \times n \times TT$, where $p$ is the dimension of the feature variables (data dimension), $n$ is the number of individuals (sample size), and $TT$ is the number of repetition times.

n

The number of individuals (sample size).

p

The dimension of the feature variables (data dimension).

TT

The number of repetition times.

alpha

The type I error of the homogeneity test. Suggested values for alpha include 0.01 (default) and 0.05.

threads

The number of threads for computing. The default value is 1. Change the number of threads to allow parallel computing.
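
As an illustration of the array layout described for y, the following minimal sketch assembles a $p \times n \times TT$ array from one matrix per time point and then reads n, p, and TT back from the array. The list `mats` and its simulated contents are assumptions made for this illustration only; they are not part of the package.

```r
# Hypothetical input: one p x n matrix per time point (simulated here).
p <- 4; n <- 3; TT <- 5
set.seed(1)
mats <- lapply(seq_len(TT), function(t) matrix(rnorm(p * n), p, n))

# Stack the matrices along a third dimension: features x subjects x time.
y <- array(unlist(mats), dim = c(p, n, TT))

# Read the size arguments back from the array so they cannot disagree with y.
dims <- dim(y)
p <- dims[1]; n <- dims[2]; TT <- dims[3]

# Each slice y[, , t] should reproduce the matrix for time point t.
stopifnot(identical(y[, , 2], mats[[2]]))
```

Deriving n, p, and TT from dim(y) rather than typing them by hand is one way to avoid passing sizes that do not match the array.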

Value

The function returns a list containing the estimated change point(s), the corresponding test statistic value(s), the corresponding p-value(s), and a table that identifies which time points share a homogeneous covariance structure.

$change_points: The estimated change points. Order is based on the algorithm's binary segmentation approach.
$teststats: The test statistic(s) corresponding to the estimated change point(s).
$pvalues: The p-value(s) corresponding to the estimated change point(s).
$covariance_id: A table that indicates which covariance matrices are homogeneous given the estimated change point(s). For example, when TT = 5, a single change point identified at time 3 implies that the covariance matrices at times 1, 2, and 3 are equal to one another, that the covariance matrices at times 4 and 5 are equal to one another, and that the two groups differ.
$note: A comment that explains $covariance_id.
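
The grouping described for $covariance_id can be sketched in a few lines of base R. The helper `segment_labels` is hypothetical and is not part of the package; it only assumes the convention stated above, namely that a change point at time k means times up to and including k share one covariance matrix. The actual layout of $covariance_id may differ.

```r
# Hypothetical helper: map change points to a group label per time point.
segment_labels <- function(change_points, TT) {
  # A new segment starts at the time point right after each change point.
  breaks <- sort(change_points) + 1
  findInterval(seq_len(TT), breaks) + 1L
}

# Single change point at time 3 with TT = 5: times 1-3 vs. times 4-5.
segment_labels(3, 5)

# Change points in binary-segmentation order (4 found before 2) are sorted first.
segment_labels(c(4, 2), 5)
```

Sorting inside the helper matters because $change_points is ordered by the binary segmentation steps rather than by time.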

Details

The methodology and procedure are designed to estimate the locations of multiple change points among covariance matrices in high-dimensional longitudinal/functional data. The method allows the data dimension to be much larger than both the sample size and the number of repeated measurements. It can also accommodate general spatial and temporal dependence. For details about the proposed procedures, see Zhong, Li, and Santo (2019), and Santo and Zhong (2020).

References

Zhong, Li, and Santo (2019). Homogeneity tests of covariance matrices with high-dimensional longitudinal data. Biometrika, 106, 619-634.

Santo and Zhong (2020). Homogeneity tests of covariance and change-points identification for high-dimensional functional data. arXiv:2005.01895.

Examples

# NOT RUN {
# A change point identification example with change points at times 2 and 4

# Set parameters
p <- 30; n <- 10; TT <- 5
delta <- 0.85
m <- p+20; L <- 3; k0 <- 2; k1 <- 4; w <- 0.2

# Generate data
Gamma1 <- Gamma2 <- Gamma3 <- matrix(0, p, m * L)
y <- array(0, c(p, n, TT))
set.seed(928)

for (i in 1:p){
  for (j in 1:p){
    dij <- abs(i - j)

    if (dij < (p * w)){
      Gamma1[i, j] <- (dij + 1) ^ (-2)
      Gamma2[i, j] <- (dij + 1 + delta) ^ (-2)
      Gamma3[i, j] <- (dij + 1 + 2 * delta) ^ (-2)
    }
  }
}

Z <- matrix(rnorm(m * (TT + L - 1) * n), m * (TT + L - 1), n)

for (t in 1:k0){
  y[, , t] <- Gamma1 %*% Z[((t - 1) * m + 1):((t + L - 1) * m), ]
}
for (t in (k0 + 1):k1){
  y[, , t] <- Gamma2 %*% Z[((t - 1) * m + 1):((t + L - 1) * m), ]
}
for (t in (k1 + 1):TT){
  y[, , t] <- Gamma3 %*% Z[((t - 1) * m + 1):((t + L - 1) * m), ]
}

cpi_covmat(y, n, p, TT, alpha = 0.01)
# }
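
As a sanity check on the simulation design above, the sketch below computes the population covariance implied by each loading matrix. Because the entries of Z are independent standard normal, each column of y[, , t] has covariance Gamma %*% t(Gamma), so the three regimes can be compared directly. The helper `make_gamma` is hypothetical and simply rebuilds Gamma1, Gamma2, and Gamma3 with the same parameters as the example.

```r
# Same parameters as in the example above.
p <- 30; delta <- 0.85; w <- 0.2
m <- p + 20; L <- 3

# Hypothetical helper: banded loading matrix with shift 0, delta, or 2 * delta.
make_gamma <- function(shift) {
  G <- matrix(0, p, m * L)
  d <- abs(outer(1:p, 1:p, "-"))      # |i - j| for the first p columns
  band <- d < p * w                   # entries inside the band are non-zero
  block <- matrix(0, p, p)
  block[band] <- (d[band] + 1 + shift)^(-2)
  G[, 1:p] <- block
  G
}

# Population covariance for each regime: Gamma %*% t(Gamma).
Sigma <- lapply(c(0, delta, 2 * delta), function(s) tcrossprod(make_gamma(s)))

# Frobenius norms differ across regimes, reflecting the changes after times 2 and 4.
frob <- sapply(Sigma, norm, type = "F")
print(frob)
```

The three norms should differ, which is consistent with the example placing change points at k0 = 2 and k1 = 4.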
