For a given lag of the time series, performs nonparametric change point detection on a possibly multivariate
time series. If lag \(\ell = 0\), then only marginal changes are detected.
If lag \(\ell \neq 0\), then changes in the pairwise distribution of \((X_t , X_{t+\ell})\) are detected.
np.mojo(
x,
G,
lag = 0,
kernel.f = c("quad.exp", "gauss", "euclidean", "laplace", "sine")[1],
kern.par = 1,
data.driven.kern.par = TRUE,
alpha = 0.1,
threshold = c("bootstrap", "manual")[1],
threshold.val = NULL,
reps = 200,
boot.dep = 1.5 * (nrow(as.matrix(x))^(1/3)),
parallel = FALSE,
boot.method = c("mean.subtract", "no.mean.subtract")[1],
criterion = c("eta", "epsilon", "eta.and.epsilon")[3],
eta = 0.4,
epsilon = 0.02,
use.mean = FALSE,
scale.data = TRUE
)

A list object that contains the following fields:
Input data
Moving window bandwidth
Lag used to detect changes
Input parameters
The value of the kernel tuning parameter
Input parameters
Threshold value for declaring change points
Input parameters
A vector containing the NP-MOJO detector statistics computed from the input data
A vector containing the estimated change point locations
The corresponding importance scores of the estimated change points. The larger the score, the more likely it is that a change point exists close to the estimated location. If the bootstrap method is used, this is a value between 0 and 1, corresponding to the proportion of times the observed detector statistic was larger than the bootstrapped detector statistics. Otherwise, the importance score is simply the value of the detector statistic at the estimated change point location (which is not necessarily less than 1).
x: Input data (a numeric vector or an object of classes ts and timeSeries,
or a numeric matrix with rows representing observations and columns representing variables).
G: An integer value for the moving sum bandwidth;
G should be less than half the length of the time series.
lag: The lag of the time series used to detect changes. If lag \(\ell = 0\), then only marginal changes are detected.
If lag \(\ell \neq 0\), then changes in the pairwise distribution of \((X_t , X_{t+\ell})\) are detected.
kernel.f: String indicating which kernel function to use when calculating the NP-MOJO detector statistics; with kern.par \(= a\), possible values are
"quad.exp": kernel \(h_2\) in McGonigle and Cho (2025), kernel 5 in Fan et al. (2017):
$$h (x,y) = \prod_{i=1}^{2p} \frac{ (2a - (x_i - y_i)^2) \exp (-\frac{1}{4a} (x_i - y_i)^2 )}{2a} .$$
"gauss": kernel \(h_1\) in McGonigle and Cho (2025), the standard Gaussian kernel:
$$h (x,y) = \exp ( - \frac{a^2}{2} \Vert x - y \Vert^2) .$$
"euclidean": kernel \(h_3\) in McGonigle and Cho (2025), the Euclidean distance-based kernel:
$$h (x, y ) = \Vert x - y \Vert^a .$$
"laplace": kernel 2 in Fan et al. (2017), based on a Laplace weight function:
$$h (x, y ) = \prod_{i=1}^{2p} \left( 1+ a^2 (x_i - y_i)^2 \right)^{-1}. $$
"sine": kernel 4 in Fan et al. (2017), based on a sinusoidal weight function:
$$h (x, y ) = \prod_{i=1}^{2p} \frac{-2 | x_i - y_i | + | x_i - y_i - 2a| + | x_i - y_i +2a| }{4a} .$$
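For concreteness, these kernel expressions translate directly into R. The sketch below is illustrative only (a direct transcription of the formulas above, not the package's internal implementation); it evaluates the "gauss" and "quad.exp" kernels for numeric vectors x and y with tuning parameter a.

# Illustrative transcription of two of the kernels above (not the
# package's internal code). x and y are numeric vectors of equal
# length and a > 0 plays the role of kern.par.
kernel.gauss <- function(x, y, a) {
  exp(-a^2 / 2 * sum((x - y)^2))
}
kernel.quad.exp <- function(x, y, a) {
  d <- x - y
  prod((2 * a - d^2) * exp(-d^2 / (4 * a)) / (2 * a))
}

kernel.gauss(c(0, 1), c(1, 0), a = 1)
kernel.quad.exp(c(0, 1), c(1, 0), a = 1)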
kern.par: The tuning parameter that appears in the expression for the kernel function, which acts as a scaling parameter;
only used if data.driven.kern.par = FALSE. If kernel.f = "euclidean", then kern.par \(\in (0,2)\);
otherwise kern.par \(> 0\).
data.driven.kern.par: A logical variable; if set to TRUE, the kernel tuning parameter is calculated
using the median heuristic, and if FALSE it is given by kern.par.
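As a rough illustration of the median heuristic (a generic sketch only; the exact data-driven rule used by the package, and how the resulting distance summary maps to kern.par, is not specified here), the tuning parameter is derived from the pairwise distances between observations. With use.mean = TRUE (see below), the mean of the pairwise distances would replace the median.

# Generic median-heuristic sketch (hypothetical helper, not the
# package's rule): summarise pairwise distances between observations.
median.heuristic <- function(x, use.mean = FALSE) {
  x <- as.matrix(x)
  d <- as.vector(stats::dist(x))   # pairwise Euclidean distances
  if (use.mean) mean(d) else stats::median(d)
}

set.seed(1)
median.heuristic(matrix(rnorm(200), ncol = 2))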
alpha: A numeric value for the significance level, with
0 <= alpha <= 1; used only if threshold = "bootstrap".
threshold: String indicating how the threshold is computed. Possible values are
"bootstrap": the threshold is calculated using the bootstrap method
with significance level alpha.
"manual": the threshold is set by the user and must be
specified using the threshold.val parameter.
threshold.val: The value of the threshold used to declare change points, only to be used if threshold = "manual".
reps: An integer value for the number of bootstrap replications performed, if threshold = "bootstrap".
boot.dep: A positive value for the strength of dependence in the multiplier bootstrap sequence, if threshold = "bootstrap".
parallel: A logical variable; if set to TRUE, parallel computing is used in the bootstrapping procedure,
if bootstrapping is performed.
boot.method: A string indicating the method for creating bootstrap replications. It is not recommended to change this. Possible choices are
"mean.subtract": the default choice, as described in McGonigle and Cho (2025).
Empirical mean subtraction is applied to the bootstrapped replicates, improving power.
"no.mean.subtract": empirical mean subtraction is not performed, improving size control.
criterion: String indicating how to determine whether each point k at which the NP-MOJO statistic
exceeds the threshold is a change point; possible values are
"epsilon": k is the maximum of its local exceeding environment,
which has size at least epsilon*G.
"eta": there is no larger exceedance within an eta*G environment of k.
"eta.and.epsilon": the recommended default option; k must satisfy both
the eta and the epsilon criteria.
Recommended to be used with the standard value of eta that would be used if criterion = "eta" (e.g. 0.4),
but a much smaller value of epsilon than would be used if criterion = "epsilon" (e.g. 0.02).
eta: A positive numeric value for the minimal mutual distance of
changes, relative to the moving sum bandwidth (if criterion = "eta" or criterion = "eta.and.epsilon").
epsilon: A numeric value in (0,1] for the minimal size of exceeding
environments, relative to the moving sum bandwidth (if criterion = "epsilon" or criterion = "eta.and.epsilon").
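The eta criterion can be made concrete with a short sketch, shown below. It is a generic implementation of the definition above (keep an exceedance k only if no larger exceedance lies within eta*G positions of it), not the package's own routine, and the detector statistic used is a toy stand-in.

# Generic sketch of the "eta" location criterion: among points where
# the detector statistic exceeds the threshold, keep k if no larger
# exceedance lies within floor(eta * G) positions of k.
eta.change.points <- function(stat, threshold, G, eta = 0.4) {
  exceed <- which(stat > threshold)
  keep <- vapply(exceed, function(k) {
    window <- max(1, k - floor(eta * G)):min(length(stat), k + floor(eta * G))
    stat[k] >= max(stat[window])
  }, logical(1))
  exceed[keep]
}

stat <- dnorm(1:500, mean = 100, sd = 20)   # toy statistic peaking at t = 100
eta.change.points(stat, threshold = 0.005, G = 83, eta = 0.4)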
use.mean: Logical variable, only to be used if data.driven.kern.par = TRUE. If set to TRUE, the mean
of pairwise distances is used to set the kernel function tuning parameter, instead of the median. May be useful for binary data;
not recommended otherwise.
scale.data: Logical variable indicating whether to scale the data in each dimension before performing change point detection.
Performance is generally improved by scaling the data.
The single-lag NP-MOJO algorithm for nonparametric change point detection is described in McGonigle, E. T. and Cho, H. (2025) Nonparametric data segmentation in multivariate time series via joint characteristic functions. Biometrika (to appear).
McGonigle, E. T. and Cho, H. (2025). Nonparametric data segmentation in multivariate time series via joint characteristic functions. Biometrika (to appear).
Fan, Y., de Micheaux, P. L., Penev, S. and Salopek, D. (2017). Multivariate nonparametric test of independence. Journal of Multivariate Analysis, 153, pp. 189-210.
See also: np.mojo.multilag.
set.seed(1)
n <- 500
# AR(1) noise whose scale drops from 1 to 0.4 at t = 300
noise <- c(rep(1, 300), rep(0.4, 200)) * stats::arima.sim(model = list(ar = 0.3), n = n)
# piecewise-constant signal with a mean shift at t = 100
signal <- c(rep(0, 100), rep(2, 400))
x <- signal + noise
# detect marginal (lag 0) changes with moving sum bandwidth G = 83
x.c <- np.mojo(x, G = 83, lag = 0)
x.c$cpts
x.c$scores
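The same data can also be scanned for changes beyond the marginal distribution by supplying a non-zero lag; for example, the call below (output not shown) targets changes in the pairwise distribution of \((X_t , X_{t+1})\). To combine evidence across several lags, see np.mojo.multilag.

# lag-1 changes, i.e. changes in the distribution of (X_t, X_{t+1})
x.c1 <- np.mojo(x, G = 83, lag = 1)
x.c1$cpts
x.c1$scores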