fastcpd: Fast Change Point Detection in R
Overview
fastcpd (fast change point detection) is a fast implementation of change point detection methods in R. It is easy to install and, beyond the built-in cost functions, extends to a wide range of change point problems through user-specified cost functions.
To learn more about the algorithms behind the package, see:
- fastcpd: Fast Change Point Detection in R
- Sequential Gradient Descent and Quasi-Newton’s Method for Change-Point Analysis
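To make the idea of a user-specified cost function concrete, here is a minimal base R sketch. It does not call fastcpd and is not the package's algorithm: it defines a cost for a segment (half the residual sum of squares around the segment mean, which assumes unit-variance Gaussian noise) and finds a single change point by trying every split. The names segment_cost and best_single_split are hypothetical and exist only for this illustration.

```r
# Cost of one segment under a unit-variance Gaussian mean model:
# half the residual sum of squares around the segment mean.
segment_cost <- function(segment) {
  sum((segment - mean(segment))^2) / 2
}

# Exhaustive search for a single change point: evaluate every split
# and keep the one with the smallest total cost.
best_single_split <- function(x, cost) {
  n <- length(x)
  candidates <- 2:(n - 2)  # keep at least two points in each segment
  total <- vapply(
    candidates,
    function(k) cost(x[1:k]) + cost(x[(k + 1):n]),
    numeric(1)
  )
  candidates[which.min(total)]
}

set.seed(1)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 5))
estimated_cp <- best_single_split(x, segment_cost)
print(estimated_cp)
```

Swapping segment_cost for another function of the segment changes the kind of change being detected, which is the sense in which a cost function makes the problem extensible. The exhaustive search here is quadratic in the data length and is only meant for small examples.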
Installation
# Install from r-universe with CRAN version as a fallback
install.packages(
"fastcpd",
repos = c("https://doccstat.r-universe.dev", "https://cloud.r-project.org")
)
## install.packages("pak")
pak::pak("doccstat/fastcpd")
## install.packages("devtools")
devtools::install_github("doccstat/fastcpd")
# conda-forge is a fork from CRAN and may not be up-to-date
# Use mamba
mamba install r-fastcpd
# Use conda
conda install -c conda-forge r-fastcpd
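After installing by any of the routes above, a quick check from an R session can confirm that the package is visible to R. This sketch uses only base R functions and makes no assumption about which route was used.

```r
# Check whether fastcpd can be found by the current R library paths.
fastcpd_installed <- requireNamespace("fastcpd", quietly = TRUE)

if (fastcpd_installed) {
  # Report the installed version.
  print(utils::packageVersion("fastcpd"))
} else {
  message("fastcpd is not installed in the current library paths.")
}
```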
FAQ
The suggested packages are not required for the main functionality of the package. They are only required for the vignettes. To learn more about the package comparison and the other vignettes, see the vignettes on CRAN or the pkgdown-generated documentation.
The package should install on Mac and any Linux distribution without problems if all the dependencies are installed. However, if you encounter problems related to gfortran, it might be because RcppArmadillo was not installed beforehand. If you have trouble installing RcppArmadillo, try the Mac OSX Stack Overflow solution or the Linux Stack Overflow solution.
Cheatsheet
R Shiny App
Available soon: rshiny.fastcpd.xingchi.li
Usage
set.seed(1)
n <- 1000
x <- rep(0, n + 3)
for (i in 1:600) {
x[i + 3] <- 0.6 * x[i + 2] - 0.2 * x[i + 1] + 0.1 * x[i] + rnorm(1, 0, 3)
}
for (i in 601:1000) {
x[i + 3] <- 0.3 * x[i + 2] + 0.4 * x[i + 1] + 0.2 * x[i] + rnorm(1, 0, 3)
}
result <- fastcpd::fastcpd.ar(x[3 + seq_len(n)], 3, r.progress = FALSE)
summary(result)
#>
#> Call:
#> fastcpd::fastcpd.ar(data = x[3 + seq_len(n)], order = 3, r.progress = FALSE)
#>
#> Change points:
#> 614
#>
#> Cost values:
#> 2754.116 2038.945
#>
#> Parameters:
#> segment 1 segment 2
#> 1 0.57120256 0.2371809
#> 2 -0.20985108 0.4031244
#> 3 0.08221978 0.2290323
plot(result)
It is hard to demonstrate all the features of fastcpd in a single example due to the flexibility of the package. For more examples, please refer to the function reference.
r.progress = FALSE is used to suppress the progress bar. By default, a progress bar is shown while the code runs.
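The Parameters table in the summary above lists one column of AR coefficients per segment. As a rough cross-check, the sketch below regenerates the same series, splits it at the reported change point 614, and fits an AR(3) model to each segment by ordinary least squares in base R. The helper fit_ar is hypothetical and is not part of fastcpd; the package's own estimates may differ somewhat because it uses its own estimation procedure.

```r
# Regenerate the AR(3) series from the usage example (same seed, same code).
set.seed(1)
n <- 1000
x <- rep(0, n + 3)
for (i in 1:600) {
  x[i + 3] <- 0.6 * x[i + 2] - 0.2 * x[i + 1] + 0.1 * x[i] + rnorm(1, 0, 3)
}
for (i in 601:1000) {
  x[i + 3] <- 0.3 * x[i + 2] + 0.4 * x[i + 1] + 0.2 * x[i] + rnorm(1, 0, 3)
}
series <- x[3 + seq_len(n)]

# Least-squares AR(p) fit without intercept:
# regress x[t] on x[t - 1], ..., x[t - p].
fit_ar <- function(series, p) {
  lagged <- embed(series, p + 1)  # column 1 is x[t], columns 2..(p + 1) are lags 1..p
  fit <- lm.fit(x = lagged[, -1, drop = FALSE], y = lagged[, 1])
  unname(fit$coefficients)
}

cp <- 614  # change point reported in the summary above
coef_segment_1 <- fit_ar(series[1:cp], 3)
coef_segment_2 <- fit_ar(series[(cp + 1):n], 3)
print(round(cbind(segment_1 = coef_segment_1, segment_2 = coef_segment_2), 3))
```

The rows are ordered by lag, so the first row should be compared with the lag-1 coefficients used in the simulation (0.6 before the change and 0.3 after it).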
library(microbenchmark)
set.seed(1)
n <- 5 * 10^6
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
ggplot2::autoplot(microbenchmark(
fastcpd = fastcpd::fastcpd.mean(mean_data, r.progress = FALSE, cp_only = TRUE, variance_estimation = 1),
changepoint = changepoint::cpt.mean(mean_data, method = "PELT"),
fpop = fpop::Fpop(mean_data, 2 * log(n)),
gfpop = gfpop::gfpop(
data = mean_data,
mygraph = gfpop::graph(
penalty = 2 * log(length(mean_data)) * gfpop::sdDiff(mean_data) ^ 2,
type = "updown"
),
type = "mean"
),
jointseg = jointseg::jointSeg(mean_data, K = 12),
mosum = mosum::mosum(c(mean_data), G = 40),
not = not::not(mean_data, contrast = "pcwsConstMean"),
wbs = wbs::wbs(mean_data)
))
#> Warning in microbenchmark(fastcpd = fastcpd::fastcpd.mean(mean_data, r.progress
#> = FALSE, : less accurate nanosecond times to avoid potential integer overflows
Examples
Main function
Wrapper functions
Time series
- AR(p): fastcpd_ar
- ARIMA(p, d, q): fastcpd_arima
- ARMA(p, q): fastcpd_arma
- GARCH(p, q): fastcpd_garch
- VAR(p): fastcpd_var
- General time series: fastcpd_ts
Unlabeled data
- Mean change: fastcpd_mean
- Variance change: fastcpd_variance
- Mean and/or variance change: fastcpd_meanvariance
Regression data
- Logistic regression: fastcpd_binomial
- Penalized linear regression: fastcpd_lasso
- Linear regression: fastcpd_lm
- Poisson regression: fastcpd_poisson
Utility functions
Variance estimation
- Variance estimation in ARMA models: variance_arma
- Variance estimation in linear models: variance_lm
- Variance estimation in mean change models: variance_mean
- Variance estimation in median change models: variance_median
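Variance estimation matters in mean change models because the sample variance of the whole series is inflated by the mean shifts themselves. One common remedy is a difference-based estimator built from successive differences, sketched below in base R. Whether variance_mean uses exactly this formula is an assumption on our part; the function diff_variance is hypothetical and only illustrates the idea.

```r
# Difference-based variance estimate: successive differences cancel a
# piecewise-constant mean everywhere except at the change points.
diff_variance <- function(x) {
  sum(diff(x)^2) / (2 * (length(x) - 1))
}

set.seed(1)
# Unit-variance noise with a single mean shift from 0 to 5.
x <- c(rnorm(500, mean = 0, sd = 1), rnorm(500, mean = 5, sd = 1))

estimates <- c(difference_based = diff_variance(x), naive = var(x))
print(round(estimates, 3))
```

The naive estimate absorbs the mean shift and lands far above the true variance of 1, while the difference-based estimate is affected only by the single difference that straddles the change point.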
Class methods
Data
- Bitcoin Market Price (USD): bitcoin
- Occupancy Detection Data Set: occupancy
- Transcription Profiling of 57 Human Bladder Carcinoma Samples: transcriptome
- UK Seatbelts Data: uk_seatbelts
- Well-log Dataset from Numerical Bayesian Methods Applied to Signal Processing: well_log
Main class
Make contributions
Fork the repo.
Create a new branch from the main branch.
Make changes and commit them.
- Please follow Google's R style guide for naming variables and functions.
- If you are adding a new family of models with new cost functions and the corresponding gradient and Hessian, please add them to src/fastcpd_class_cost.cc, with proper examples and tests in vignettes/gallery.Rmd and tests/testthat/test-gallery.R.
- Add the family name to src/fastcpd_constants.h.
- [Recommended] Add a new wrapper function in R/fastcpd_wrappers.R for the new family of models and move the examples to the new wrapper function as roxygen examples.
- Add the new wrapper function to the corresponding section in _pkgdown.yml.
Push the changes to your fork.
Create a pull request.
Make sure the pull request does not create new warnings or errors in devtools::check().
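The step about adding a new family mentions a cost function together with its gradient and Hessian. In the package these are added in C++ (src/fastcpd_class_cost.cc); the base R sketch below only illustrates the mathematics for one familiar case, the negative log-likelihood of logistic regression, and checks the analytic gradient against finite differences before any porting. All function names here are hypothetical.

```r
# Negative log-likelihood of logistic regression on one segment.
logistic_cost <- function(theta, x, y) {
  u <- drop(x %*% theta)
  sum(log1p(exp(u)) - y * u)
}

# Analytic gradient: X' (p - y), with p the fitted probabilities.
logistic_gradient <- function(theta, x, y) {
  prob <- plogis(drop(x %*% theta))
  drop(crossprod(x, prob - y))
}

# Analytic Hessian: X' W X, with W = diag(p * (1 - p)).
logistic_hessian <- function(theta, x, y) {
  prob <- plogis(drop(x %*% theta))
  crossprod(x * (prob * (1 - prob)), x)
}

# Central finite differences, used only to validate the analytic gradient.
numeric_gradient <- function(f, theta, h = 1e-6) {
  vapply(seq_along(theta), function(j) {
    step <- replace(numeric(length(theta)), j, h)
    (f(theta + step) - f(theta - step)) / (2 * h)
  }, numeric(1))
}

set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3)
y <- rbinom(200, 1, plogis(drop(x %*% c(1, -0.5, 0.25))))
theta <- c(0.2, 0.1, -0.3)

gradient_error <- max(abs(
  logistic_gradient(theta, x, y) -
    numeric_gradient(function(t) logistic_cost(t, x, y), theta)
))
hessian <- logistic_hessian(theta, x, y)
print(gradient_error)
print(isSymmetric(hessian))
```

A small gradient error and a symmetric, positive definite Hessian are cheap sanity checks that tend to catch sign and indexing mistakes early.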
Contact us
- File a ticket at GitHub Issues.
- Contact the authors specified in DESCRIPTION.