cpss.custom: Detecting changes in user-customized models

Description

Detecting changes in user-customized models

Usage

cpss.custom(
  dataset,
  n,
  g_subdat,
  g_param,
  g_cost,
  algorithm = "BS",
  dist_min = floor(log(n)),
  ncps_max = ceiling(n^0.4),
  pelt_pen_val = NULL,
  pelt_K = 0,
  wbs_nintervals = 500,
  criterion = "CV",
  times = 2,
  model = NULL,
  g_smry = NULL,
  easy_cost = NULL,
  param.opt = NULL
)

Value

cpss.custom returns an object of the S4 class "cpss", which collects the data and information required for further change-point analyses and summaries. Its slots are:

dat: an ANY object inheriting from the type of the user-input data
mdl: a character string describing the considered change-point model
algo: a character string indicating the user-specified change-point searching algorithm
algo_param_dim: an integer giving the user-specified maximum number of change-points searched for if the algorithm is "SN", "BS" or "WBS", or a numeric vector collecting the user-specified candidate penalty values if the algorithm is "PELT"
SC: a character string indicating the model selection criterion
ncps: an integer giving the estimated number of change-points based on the entire data
pelt_pen: a numeric value giving the selected penalty value if the "PELT" algorithm is performed on the entire data
cps: a numeric vector of change-points detected based on the entire data
params: a list, each of whose members contains the estimated parameters in the corresponding segment
S_vals: a numeric vector of candidate model dimensions, given as a sequence of numbers of change-points or of penalty values
SC_vals: a numeric matrix whose columns record the criterion values on the validation data under the corresponding model dimensions in S_vals, and whose rows correspond to the individual sample-splittings
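To make the layout of S_vals and SC_vals concrete, the sketch below builds a hypothetical SC_vals matrix (one row per splitting, one column per entry of S_vals) and picks the dimension whose criterion, averaged over splittings, is smallest. The numbers are made up for illustration, and the average-then-minimize rule is an assumption about how such a matrix is used, not a description of the cpss internals.

```r
# Hypothetical candidate model dimensions (numbers of change-points)
S_vals <- 0:4
# Hypothetical criterion values: SC_vals[i, j] is the criterion for
# splitting i under dimension S_vals[j]
SC_vals <- rbind(
  c(310.2, 250.7, 231.4, 229.9, 233.1),
  c(305.8, 248.3, 230.1, 232.6, 235.0)
)
# Assumed selection rule: average over splittings, then minimize
avg_sc <- colMeans(SC_vals)
ncps_hat <- S_vals[which.min(avg_sc)]
ncps_hat
# [1] 2
```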

Arguments

dataset: an ANY object that could be of any form such as a vector, matrix, tensor, list, etc.
n: an integer indicating the sample size of the dataset.
g_subdat: a customized R function of two arguments, dat and indices, that returns the subset of dat (inheriting the class of dataset) selected by indices along the observed time order. The argument indices is a logical vector with TRUE indicating the selected time points.
g_param: a customized R function of two arguments, dat and param.opt, that returns estimates of the parameters of interest minimizing the user-specified cost for a data set dat. The returned object can be of any class, such as a numeric value, vector, matrix or list. The argument param.opt may be used in the estimation procedure.
g_cost: a customized R function of two arguments, dat and param, that returns a numeric value of the associated cost for a data set dat, given that the parameters of interest equal param. The argument param inherits from the class of the object returned by g_param. If param.opt is needed to evaluate the cost, it should be packed into param when defining the function g_param.
algorithm: a character string specifying the change-point searching algorithm, one of four state-of-the-art candidates: "SN" (segment neighborhood), "BS" (binary segmentation), "WBS" (wild binary segmentation) and "PELT" (pruned exact linear time).
dist_min: an integer indicating the minimum distance between two successive candidate change-points, with default value floor(log(n)).
ncps_max: an integer indicating the maximum number of change-points searched for, with default value ceiling(n^0.4).
pelt_pen_val: a numeric vector specifying the collection of candidate values of the penalty if the "PELT" algorithm is used.
pelt_K: a numeric value adjusting the pruning tactic, usually taken to be 0 if a negative log-likelihood is used as the cost; more details can be found in Killick et al. (2012).
wbs_nintervals: an integer indicating the number of random intervals drawn in the "WBS" algorithm, with default value 500.
criterion: a character string indicating which model selection criterion is used, "cross-validation" ("CV") or "multiple-splitting" ("MS").
times: an integer indicating how many times sample-splitting is performed; if the "CV" criterion is used, it should be set to 2.
model: a character string indicating the considered change-point model; it is set to "custom" if not provided.
g_smry: a customized R function of two arguments, dataset and param.opt, that calculates the summary statistics needed to evaluate the cost. For convenience, the returned object is a list.
easy_cost: a customized R function of three arguments, data_smry, s and e, that evaluates the cost for the data segment from observed time point s to e. The argument data_smry inherits from the list returned by g_smry.
param.opt: an ANY object of any form, specifying additional global constant parameters beyond the parameters of interest.
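For a sense of scale, the defaults of dist_min and ncps_max can be evaluated directly. With n = 1000, the sample size used in the Examples section, they give a minimum spacing of 6 and at most 16 change-points (log is the natural logarithm in R); the example itself overrides both with 10.

```r
n <- 1000
dist_min <- floor(log(n))   # log(1000) is about 6.91, so 6
ncps_max <- ceiling(n^0.4)  # 1000^0.4 is about 15.85, so 16
c(dist_min = dist_min, ncps_max = ncps_max)
```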
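The three required functions fit together as follows: g_subdat extracts the observations flagged by a logical indices vector, g_param fits the parameters on one part of the data, and g_cost scores a data set under given parameters. The sketch below defines hypothetical versions (g_subdat_mean, g_param_mean, g_cost_mean) for a univariate mean-change model, with a known standard deviation supplied through param.opt and packed into param as the description of g_cost requires. Fitting on odd-indexed and scoring on even-indexed observations only illustrates the mechanics; it is an assumption, not a statement of the splitting rule cpss uses.

```r
# Hypothetical g_subdat for a numeric vector: keep time points flagged TRUE
g_subdat_mean <- function(dat, indices) {
  dat[indices]
}

# Hypothetical g_param: the sample mean minimizes the cost below in mu;
# the known sd from param.opt is packed into the returned list
g_param_mean <- function(dat, param.opt = list(sd = 1)) {
  list(mu = mean(dat), sd = param.opt$sd)
}

# Hypothetical g_cost: Gaussian negative log-likelihood with known sd,
# dropping terms that do not depend on mu
g_cost_mean <- function(dat, param) {
  sum((dat - param$mu)^2) / (2 * param$sd^2)
}

y <- c(1, 2, 3, 6, 5, 7)
odd <- seq_along(y) %% 2 == 1   # logical indices: TRUE at times 1, 3, 5
fit <- g_param_mean(g_subdat_mean(y, odd), param.opt = list(sd = 2))
fit$mu                                    # mean of c(1, 3, 5), i.e. 3
g_cost_mean(g_subdat_mean(y, !odd), fit)  # cost of c(2, 6, 7) under mu = 3, i.e. 3.25
```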
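g_smry and easy_cost let the cost of any segment be computed from precomputed summaries instead of re-subsetting the data. For a hypothetical mean-change model with a squared-error cost, cumulative sums of y and y^2 suffice, so the cost of any segment from s to e takes constant time. The names g_smry_mean and easy_cost_mean, the list fields cs and cs2, and ignoring param.opt are assumptions for illustration; the last line checks the result against direct computation.

```r
# Hypothetical g_smry: cumulative sums with a leading 0, so that
# sum(y[s:e]) equals cs[e + 1] - cs[s]
g_smry_mean <- function(dataset, param.opt = NULL) {
  list(cs = c(0, cumsum(dataset)), cs2 = c(0, cumsum(dataset^2)))
}

# Hypothetical easy_cost: residual sum of squares of y[s:e] around its mean
easy_cost_mean <- function(data_smry, s, e) {
  m <- e - s + 1
  sum_y <- data_smry$cs[e + 1] - data_smry$cs[s]
  sum_y2 <- data_smry$cs2[e + 1] - data_smry$cs2[s]
  sum_y2 - sum_y^2 / m
}

y <- c(1, 2, 3, 6, 5, 7)
smry <- g_smry_mean(y)
easy_cost_mean(smry, 2, 4)         # segment y[2:4] = c(2, 3, 6), i.e. 26/3
sum((y[2:4] - mean(y[2:4]))^2)     # direct computation for comparison
```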
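pelt_K enters the pruning step of PELT. As a rough illustration, the sketch below implements the PELT recursion of Killick et al. (2012) in base R for a generic segment cost cost(s, e): a candidate last change-point tau is discarded once F(tau) + C(y[(tau+1):t]) + K > F(t). The function pelt_sketch and its arguments are hypothetical; it is not the cpss implementation, and it ignores dist_min and uses a single penalty rather than the candidate set pelt_pen_val.

```r
# Hypothetical PELT sketch: cost(s, e) is the cost of y[s:e], pen the penalty
# per change-point, K the pruning constant (0 for negative log-likelihoods)
pelt_sketch <- function(n, cost, pen, K = 0) {
  Fopt <- c(-pen, rep(NA_real_, n))  # Fopt[t + 1]: optimal penalized cost of y[1:t]
  last_cp <- integer(n + 1)          # last_cp[t + 1]: last change-point before t
  R <- 0L                            # candidate positions of the last change-point
  for (tt in seq_len(n)) {
    seg_cost <- vapply(R, function(tau) Fopt[tau + 1] + cost(tau + 1, tt), numeric(1))
    Fopt[tt + 1] <- min(seg_cost) + pen
    last_cp[tt + 1] <- R[which.min(seg_cost)]
    # Pruning: keep tau only if F(tau) + C(y[(tau+1):t]) + K <= F(t)
    R <- c(R[seg_cost + K <= Fopt[tt + 1]], tt)
  }
  # Backtrack from n to recover the change-points
  cps <- integer(0)
  tt <- n
  while (last_cp[tt + 1] > 0) {
    tt <- last_cp[tt + 1]
    cps <- c(tt, cps)
  }
  cps
}

# Squared-error cost via cumulative sums, on simulated data with one mean shift
set.seed(1)
y <- c(rnorm(100, mean = 0), rnorm(100, mean = 3))
cs <- c(0, cumsum(y))
cs2 <- c(0, cumsum(y^2))
cost_sq <- function(s, e) (cs2[e + 1] - cs2[s]) - (cs[e + 1] - cs[s])^2 / (e - s + 1)
cps <- pelt_sketch(length(y), cost_sq, pen = 2 * log(length(y)))
cps
```

Pruning never changes the minimizer when the cost satisfies the condition in Killick et al. (2012); it only shrinks the candidate set R, which is what keeps the recursion close to linear time.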

References

Killick, R., Fearnhead, P., and Eckley, I. A. (2012). Optimal Detection of Changepoints With a Linear Computational Cost. Journal of the American Statistical Association, 107(500):1590–1598.

Examples


library("cpss")
if (!requireNamespace("L1pack", quietly = TRUE)) {
  stop("Please install the package \"L1pack\".")
}
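set.seed(666)
# Simulate a simple linear regression whose intercept and slope change at tau
n <- 1000
tau <- c(250, 500, 750)    # true change-points
tau_ext <- c(0, tau, n)    # segment boundaries, including both ends
be0 <- c(1, 1, 0, -1)      # segment-specific intercepts
be <- c(1, -1, -1, 1)      # segment-specific slopes
seg_len <- diff(c(0, tau, n))
x <- rnorm(n)
eta <- unlist(lapply(seq(1, length(tau) + 1), function(k) {
  be0[k] + be[k] * x[(tau_ext[k] + 1):tau_ext[k + 1]]
}))
ep <- L1pack::rlaplace(n)  # heavy-tailed Laplace errors
y <- eta + ep
# Rows of the data matrix flagged TRUE in the logical vector indices
g_subdat_l1 <- function(dat, indices) {
  matrix(dat[indices, ], sum(indices), ncol(dat))
}
# Least absolute deviations (L1) fit of y on x; l1fit adds the intercept
g_param_l1 <- function(dat, param.opt = NULL) {
  y <- dat[, 1]
  x <- dat[, -1]
  return(L1pack::l1fit(x, y)$coefficients)
}
# Sum of absolute residuals under param = (intercept, slope)
g_cost_l1 <- function(dat, param) {
  y <- dat[, 1]
  x <- dat[, -1]
  return(sum(abs(y - cbind(1, x) %*% as.matrix(param))))
}
# Binary segmentation with at most 10 change-points spaced at least 10 apart;
# the default criterion "CV" selects the number of change-points
res <- cpss.custom(
  dataset = cbind(y, x), n = n,
  g_subdat = g_subdat_l1, g_param = g_param_l1, g_cost = g_cost_l1,
  algorithm = "BS", dist_min = 10, ncps_max = 10,
  g_smry = NULL, easy_cost = NULL
)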
summary(res)
# 250  500  744
do.call(rbind,res@params)
#       Intercept          X
# [1,]  0.9327557  0.9558247
# [2,]  0.9868086 -1.0254999
# [3,] -0.0464067 -0.9076744
# [4,] -0.9746133  0.9671701
