cpss.mean: Detecting changes in mean

Description

Detecting changes in mean

Usage

cpss.mean(
  dataset,
  algorithm = "BS",
  dist_min = floor(log(n)),
  ncps_max = ceiling(n^0.4),
  pelt_pen_val = NULL,
  pelt_K = 0,
  wbs_nintervals = 500,
  criterion = "CV",
  times = 2,
  Sigma = NULL
)
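
The defaults for dist_min and ncps_max are expressions in n, the number of observations (rows of dataset). As a rough illustration, the base-R sketch below evaluates both expressions for a sample size chosen purely for illustration; the variable names are hypothetical and not part of the package.

```r
# Evaluate the default expressions from the signature for a given sample size.
# n = 2048 is an illustrative choice; the variable names are hypothetical.
n <- 2048
dist_min_default <- floor(log(n))    # log() is the natural logarithm
ncps_max_default <- ceiling(n^0.4)
c(dist_min = dist_min_default, ncps_max = ncps_max_default)
# dist_min = 7, ncps_max = 22
```

Both defaults grow slowly with n, so for long series it may be worth setting them explicitly.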

Value

cpss.mean returns an object of an S4 class, called "cpss", which collects data and information required for further change-point analyses and summaries. See cpss.custom.

Arguments

dataset: a numeric matrix of dimension $n\times d$, where each row represents an observation and each column a variable. A numeric vector is also accepted for univariate observations.
algorithm: a character string specifying the change-point search algorithm, one of "SN" (segment neighborhood), "BS" (binary segmentation), "WBS" (wild binary segmentation), and "PELT" (pruned exact linear time).
dist_min: an integer indicating the minimum distance between two successive candidate change-points, with a default value $\lfloor \log(n) \rfloor$.
ncps_max: an integer indicating the maximum number of change-points searched for, with a default value $\lceil n^{0.4} \rceil$.
pelt_pen_val: a numeric vector specifying the collection of candidate values of the penalty if the "PELT" algorithm is used.
pelt_K: a numeric value that adjusts the pruning tactic, usually taken to be 0 if the negative log-likelihood is used as the cost; more details can be found in Killick et al. (2012).
wbs_nintervals: an integer indicating the number of random intervals drawn in the "WBS" algorithm, with a default value of 500.
criterion: a character string indicating which model selection criterion is used, "cross-validation" ("CV") or "multiple-splitting" ("MS").
times: an integer indicating how many times sample-splitting should be performed; if the "CV" criterion is used, it should be set to 2.
Sigma: if a numeric matrix (or constant) is supplied, it would be taken as the value of known overall covariance (or variance). By default it is set as NULL, and the common covariance of the data is estimated based on the difference method, i.e., $$\widehat{\Sigma} = \frac{1}{2(n-1)}\sum_{i=1}^{n-1} (Y_i-Y_{i+1})(Y_i-Y_{i+1})';$$
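
The difference-based estimator in the Sigma entry can be written in a few lines of base R. The sketch below is only an illustration of the formula, not the package's internal code; diff_cov is a hypothetical helper name, and the simulated data are made up for the demonstration.

```r
# Difference-based estimate of a common covariance matrix, following
# Sigma_hat = 1 / (2 (n - 1)) * sum_i (Y_i - Y_{i+1}) (Y_i - Y_{i+1})'.
# `diff_cov` is a hypothetical helper, not a function exported by cpss.
diff_cov <- function(Y) {
  Y <- as.matrix(Y)                                   # a vector becomes n x 1
  n <- nrow(Y)
  D <- Y[-n, , drop = FALSE] - Y[-1, , drop = FALSE]  # row i holds Y_i - Y_{i+1}
  crossprod(D) / (2 * (n - 1))                        # sum of outer products, scaled
}

# Illustrative data: variance 4 throughout, with one mean shift at i = 500.
set.seed(1)
y <- c(rnorm(500, mean = 0, sd = 2), rnorm(500, mean = 10, sd = 2))
diff_cov(y)  # stays near 4, since only one difference straddles the shift
var(y)       # much larger, because the mean shift inflates the sample variance
```

Because successive differences cancel the mean within a segment, the estimate is affected only by the few differences that straddle a change-point, which is why it suits data with an unknown piecewise-constant mean.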

References

Killick, R., Fearnhead, P., and Eckley, I. A. (2012). Optimal Detection of Changepoints With a Linear Computational Cost. Journal of the American Statistical Association, 107(500):1590–1598.

Examples

library("cpss")
set.seed(666)
n <- 2048
tau <- c(205, 267, 308, 472, 512, 820, 902, 1332, 1557, 1598, 1659)
seg_len <- diff(c(0, tau, n))
mu <- rep(c(0, 14.64, -3.66, 7.32, -7.32, 10.98, -4.39, 3.29, 19.03, 7.68, 15.37, 0), seg_len)
ep <- 7 * rnorm(n)
y <- mu + ep
res <- cpss.mean(y, algorithm = "SN", dist_min = 10, ncps_max = 20)
summary(res)
# 205  267  307  471  512  820  897  1332  1557  1601  1659
plot(res, type = "scatter")
plot(res, type = "path")
out <- update(res, dim_update = 12)
out$cps_update
# 205  267  307  471  512  820  897 1332 1557 1601 1659 1769
out$params_update