fit_blockcpd: Fits a blockcpd model

Description

Fits a blockcpd model to find the best segmentation of the data into blocks. Variables in each block have the same distribution and parameter, and consecutive blocks have different parameters.

Usage

fit_blockcpd(
  data_matrix,
  method = "hierseg",
  family = "bernoulli",
  lambda = 1,
  pen_func = bic_loss,
  min_block_size = 1L,
  max_blocks = NULL,
  bootstrap = FALSE,
  bootstrap_samples = 100L,
  bootstrap_progress = FALSE,
  skip_input_check = FALSE
)

Value

The function returns an S3 object of class blockcpd with the following components:

"changepoints" a list containing the set of estimated change points;
"parameters" a list containing the estimated parameters for each block. For families with multiple parameters, it is a list of lists, where each sublist is named after the parameter it holds;
"loss" the final loss evaluated on the entire data set for the returned model;
"neg_loglike" the negative log-likelihood of the model;
"ncp" the number of change points estimated;
"metadata" the arguments passed to fit the model;
"bootstrap_info" if the bootstrap argument is TRUE, a list of the metrics for each bootstrap sample, together with the estimated probability of each index being detected as a change point.

Arguments

data_matrix

Data frame or matrix containing the data set to be segmented. The function does not verify that the entries are valid for the model specified by the "family" argument; for example, entries other than 0, 1 or NA are not flagged for the bernoulli family.

method

The method used to fit the model. The currently implemented methods are:

[hierseg] Hierarchical segmentation, also known as binary segmentation;
[dynseg] Dynamic programming segmentation.

family

The name of the distribution family whose parameters are checked for changes. Should be passed as a string. The families currently implemented are:

"bernoulli": The model assumes that data comes from a Bernoulli distribution. For each block, the algorithm estimates the probability parameter. Each entry should be binary.
"normal": The model assumes that data comes from a Normal distribution with unknown mean and variance. For each block, the algorithm estimates the mean and variance parameters. Each entry should be numeric.
"binaryMarkov": The model assumes that data comes from a two-state (0, 1) Markov chain. For each block, the algorithm estimates the 2x2 transition matrix. Each entry should be binary. At the boundary between blocks, the transition is defined using the parameters of the next (new) block. For instance, consider a block from a to c, followed by a block from c + 1 to b (endpoints included). By definition, c is a change point, and the transition from X_c to X_{c+1} is governed by the parameters of the block from c + 1 to b.
"exponential": The model assumes that data comes from an Exponential distribution. For each block, the algorithm estimates the scale parameter, that is, the inverse of the rate. Each entry should be numeric and positive.
"poisson": The model assumes that data comes from a Poisson distribution. For each block, the algorithm estimates the rate parameter. Each entry should be a positive integer.

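The boundary convention of the "binaryMarkov" family can be illustrated with a short base R sketch. It is not the package's internal code: the helper count_transitions, the toy series x and the change point cp are hypothetical names chosen for illustration. It only shows which transitions are attributed to which block when the transition from X_c to X_{c+1} belongs to the new block.

```r
# Count 0/1 transitions over the pairs (x[t], x[t + 1]) for t in t_from..t_to.
# Hypothetical helper for illustration; not part of blockcpd.
count_transitions <- function(x, t_from, t_to) {
  counts <- matrix(0L, nrow = 2, ncol = 2,
                   dimnames = list(from = c("0", "1"), to = c("0", "1")))
  if (t_from > t_to) return(counts)
  for (t in t_from:t_to) {
    i <- x[t] + 1      # row index: state at time t
    j <- x[t + 1] + 1  # column index: state at time t + 1
    counts[i, j] <- counts[i, j] + 1L
  }
  counts
}

x <- c(0, 0, 0, 1, 1, 0, 1, 0)  # one binary series of length 8
cp <- 4                          # assumed change point c: blocks are 1..4 and 5..8

# First block: only transitions that stay inside 1..c, i.e. t = 1..(c - 1)
block1 <- count_transitions(x, 1, cp - 1)
# Second block: includes the boundary transition X_c -> X_{c+1}, i.e. t = c..(n - 1)
block2 <- count_transitions(x, cp, length(x) - 1)

# Row-normalised counts give empirical 2x2 transition matrices
# (a row with no observed transitions yields NaN)
block1 / rowSums(block1)
block2 / rowSums(block2)
```

Every transition is counted exactly once, so the two count matrices together account for all length(x) - 1 transitions of the series.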
lambda

The penalization constant. Must be a single non-negative numeric value.

pen_func

Regularization function used for fitting; the default is the BIC (bic_loss). For user-specified functions, see the template in the regularization.rd file.
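
How lambda and the regularization function interact can be pictured with a small base R sketch. It assumes, as is common for penalized change point estimators, an objective of the form negative log-likelihood plus lambda times a penalty that grows with the number of blocks. The helpers bernoulli_nll and penalized_loss, and the specific penalty, are hypothetical and are not the package's bic_loss or its internal loss.

```r
# Negative log-likelihood of a 0/1 matrix under a block-constant Bernoulli model.
# `cps` holds the last column index of every block except the final one.
bernoulli_nll <- function(x, cps) {
  starts <- c(1, cps + 1)
  ends <- c(cps, ncol(x))
  nll <- 0
  for (b in seq_along(starts)) {
    block <- x[, starts[b]:ends[b], drop = FALSE]
    p <- mean(block, na.rm = TRUE)  # per-block maximum likelihood estimate
    ones <- sum(block == 1, na.rm = TRUE)
    zeros <- sum(block == 0, na.rm = TRUE)
    # Skip empty terms to avoid 0 * log(0) when a block is all zeros or all ones
    if (ones > 0) nll <- nll - ones * log(p)
    if (zeros > 0) nll <- nll - zeros * log(1 - p)
  }
  nll
}

# Assumed penalty for illustration only: number of blocks times log(number of series)
penalized_loss <- function(x, cps, lambda) {
  n_blocks <- length(cps) + 1
  bernoulli_nll(x, cps) + lambda * n_blocks * log(nrow(x))
}

# Two binary series (rows) of length 6 (columns)
x <- rbind(c(0, 0, 0, 1, 1, 1),
           c(0, 0, 1, 1, 1, 1))

penalized_loss(x, cps = integer(0), lambda = 1)  # no change point
penalized_loss(x, cps = 3, lambda = 1)           # one change point after column 3
penalized_loss(x, cps = 3, lambda = 10)          # same split, heavier penalty
```

On this toy data the split after column 3 has the lower penalized loss at lambda = 1, while at lambda = 10 the model without change points is preferred. This is the general trade-off a larger penalization constant is meant to control.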

min_block_size

Minimum block size allowed. The default is 1, and the value must be less than or equal to the number of columns (ncol) of data_matrix.

max_blocks

An integer greater than 0 that specifies the maximum number of blocks fitted by the algorithm. It is only used when method = "dynseg".

bootstrap

A flag indicating whether to run bootstrap computations to estimate the probability of each index being detected as a change point. It also provides a sample of all the implemented metrics, computed with respect to the final estimated change point set.

bootstrap_samples

Number of bootstrap samples.

bootstrap_progress

Flag for bootstrap progress printing.
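
The three bootstrap arguments are typically used together. The following is a minimal sketch, assuming the blockcpd package is installed; the simulated matrix x is illustrative data, and the internal layout of bootstrap_info is not assumed, so it is only inspected with str().

```r
set.seed(1)
# Illustrative data: 30 binary series (rows) of length 40 (columns),
# with the success probability changing after column 20
x <- cbind(
  matrix(rbinom(30 * 20, 1, 0.1), nrow = 30),
  matrix(rbinom(30 * 20, 1, 0.9), nrow = 30)
)

# Run only if the package is available
if (requireNamespace("blockcpd", quietly = TRUE)) {
  fit <- blockcpd::fit_blockcpd(
    x,
    family = "bernoulli",
    bootstrap = TRUE,           # turn on the bootstrap computations
    bootstrap_samples = 50L,    # fewer samples than the default of 100
    bootstrap_progress = FALSE  # no progress printing
  )
  # Inspect whatever structure the package returns for the bootstrap results
  str(fit$bootstrap_info)
}
```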

skip_input_check

Flag indicating if input checking should be skipped.

Examples


fit_blockcpd(c(0, 1, 2, 10, 11), family = "normal", lambda = 1) # single series
fit_blockcpd(matrix(c(0, 1, 0, 0, 0, 0, 1, 1), nrow = 2)) # 2 binary series
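
The components listed under Value can be read from the fitted object with the usual `$` accessor. The sketch below assumes the blockcpd package is installed and uses simulated data (the matrix x is illustrative); the exact printed output depends on the package version and is not shown.

```r
set.seed(42)
# Illustrative data: 20 binary series (rows) of length 60 (columns),
# with the success probability changing after column 30
x <- cbind(
  matrix(rbinom(20 * 30, 1, 0.2), nrow = 20),
  matrix(rbinom(20 * 30, 1, 0.8), nrow = 20)
)

# Run only if the package is available
if (requireNamespace("blockcpd", quietly = TRUE)) {
  fit <- blockcpd::fit_blockcpd(x, family = "bernoulli", lambda = 1)

  fit$changepoints  # estimated change points
  fit$ncp           # number of estimated change points
  fit$parameters    # estimated parameters for each block
  fit$loss          # final loss of the returned model
  fit$neg_loglike   # negative log-likelihood of the model
}
```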
