stepFit: Piecewise constant multiscale inference

Description

Computes the multiscale regression estimator, see (3.1) in the vignette, and allows for confidence statements, see section 3 in the vignette. It implements the estimators SMUCE and HSMUCE as well as their confidence intervals and bands. If q == NULL a Monte-Carlo simulation is required for computing critical values. Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, this package saves them by default in the workspace and on the file system such that a second call requiring the same Monte-Carlo simulation will be much faster. For more details, in particular to which arguments the Monte-Carlo simulations are specific, see Section Storing of Monte-Carlo simulations below. Progress of a Monte-Carlo simulation can be reported by the argument messages and the saving can be controlled by the argument option, both can be specified in … and are explained in monteCarloSimulation and critVal, respectively.

Usage

stepFit(y, q = NULL, alpha = NULL, x = 1:length(y), x0 = 2 * x[1] - x[2],
        family = NULL, intervalSystem = NULL, lengths = NULL, confband = FALSE,
        jumpint = confband, ...)

Arguments

a numeric vector containing the observations

either NULL, then the vector of critical values at level alpha will be computed from a Monte-Carlo simulation, or a numeric giving the global quantile or a numeric vector giving the vector of critical values. Either q or alpha must be given. Otherwise, alpha == 0.5 is chosen with a warning. This argument will be passed to critVal to obtain the needed critical values. Additional parameters for the computation of q can be specified in …, for more details see the documentation of critVal. Please note that by default the Monte-Carlo simulation will be saved in the workspace and on the file system, for more details see Section Storing of Monte-Carlo simulations below

alpha

a probability, i.e. a single numeric between 0 and 1, giving the significance level. Its choice is a trade-off between data fit and parsimony of the estimator. In other words, this argument balances the risks of missing change-points and detecting additional artefacts. For more details on this choice see (Frick et al., 2014, section 4) and (Pein et al., 2017, section 3.4). Either q or alpha must be given. Otherwise, alpha == 0.5 is chosen with a warning

a numeric vector of the same length as y containing the corresponding sample points

a single numeric giving the last unobserved sample point directly before sampling started

family

a string specifying the assumed parametric family, for more details see parametricFamily, currently "gauss", "hsmuce" and "mDependentPS" are supported. By default (NULL) "gauss" is assumed

intervalSystem

a string giving the used interval system, either "all" for all intervals, "dyaLen" for all intervals of dyadic length or "dyaPar" for the dyadic partition, for more details see intervalSystem. By default (NULL) the default interval system of the specified parametric family will be used, which one this will be is described in parametricFamily

lengths

an integer vector giving the set of lengths, i.e. only intervals of these lengths will be considered. Note that not all lengths are possible for all interval systems and for all parametric families, see intervalSystem and parametricFamily, respectively, to see which ones are allowed. By default (NULL) all lengths that are possible for the specified intervalSystem and for the specified parametric family will be used

confband

single logical, indicates if a confidence band for the piecewise-continuous function should be computed

jumpint

single logical, indicates if confidence sets for change-points should be computed

...

there are two groups of further arguments:

further parameters of the parametric family. Depending on argument family some might be required, but others might be optional, please see parametricFamily for more details,
further parameters that will be passed to critVal. critVal will be called automatically with the number of observations n = length(y), the arguments family, intervalSystem, lengths, q and output set. For these arguments no user interaction is required and possible, all other arguments of critVal can be passed additionally

Value

An object of class stepfit that contains the fit. If jumpint == TRUE function jumpint allows to extract the 1 - alpha confidence interval for the jumps. If confband == TRUE function confband allows to extract the 1 - alpha confidence band.

Storing of Monte-Carlo simulations

If q == NULL a Monte-Carlo simulation is required for computing critical values. Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, this package offers multiple possibilities for saving and loading the simulations. Progress of a simulation can be reported by the argument messages which can be specified in … and is explained in the documentation of monteCarloSimulation. Each Monte-Carlo simulation is specific to the number of observations, the parametric family (including certain parameters, see parametricFamily) and the interval system, and for simulations of class "MCSimulationMaximum", additionally, to the set of lengths and the used penalty. Monte-Carlo simulations can also be performed for a (slightly) larger number of observations \(n_q\) given in the argument nq in … and explained in the documentation of critVal, which avoids extensive resimulations for only a little bit varying number of observations. Simulations can either be saved in the workspace in the variable critValStepRTab or persistently on the file system for which the package R.cache is used. Moreover, storing in and loading from variables and RDS files is supported. Finally, a pre-simulated collection of simulations can be accessed by installing the package stepRdata available from http://www.stochastik.math.uni-goettingen.de/stepRdata_1.0-0.tar.gz. The simulation, saving and loading can be controlled by the argument option which can be specified in … and is explained in the documentation of critVal. By default simulations will be saved in the workspace and on the file system. For more details and for how simulation can be removed see Section Simulating, saving and loading of Monte-Carlo simulations in critVal.

References

Frick, K., Munk, A., Sieling, H. (2014) Multiscale change-point inference. With discussion and rejoinder by the authors. Journal of the Royal Statistical Society, Series B 76(3), 495--580.

Pein, F., Sieling, H., Munk, A. (2017) Heterogeneous change point inference. Journal of the Royal Statistical Society, Series B, 79(4), 1207--1227.

Examples

Run this code

# NOT RUN {
# generate random observations
y <- c(rnorm(50), rnorm(50, 1))
x <- seq(0.01, 1, 0.01)
plot(x, y, pch = 16, col = "grey30", ylim = c(-3, 4))

# computation of SMUCE and its confidence statements
fit <- stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE)
lines(fit, lwd = 3, col = "red", lty = "22")

# confidence intervals for the change-point locations
points(jumpint(fit), col = "red")
# confidence band
lines(confband(fit), lty = "22", col = "darkred", lwd = 2)

# higher significance level for larger detection power, but less confidence
stepFit(y, x = x, alpha = 0.99, jumpint = TRUE, confband = TRUE)

# smaller significance level for the small risk that the number of
# change-points is overestimated with probability not more than 5%,
# but smaller detection power
stepFit(y, x = x, alpha = 0.05, jumpint = TRUE, confband = TRUE)
# }
# NOT RUN {
# different interval system, lengths, penalty and given parameter sd
stepFit(y, x = x, alpha = 0.5, intervalSystem = "dyaLen",
        lengths = c(1L, 2L, 4L, 8L), penalty = "weights",
        weights = c(0.4, 0.3, 0.2, 0.1), sd = 0.5,
        jumpint = TRUE, confband = TRUE)
# }
# NOT RUN {
# with given q
identical(stepFit(y, x = x, q = critVal(100L, alpha = 0.5),
                  jumpint = TRUE, confband = TRUE), fit)
identical(stepFit(y, x = x, q = critVal(100L, alpha = 0.5, output = "value"),
                  jumpint = TRUE, confband = TRUE), fit)
# }
# NOT RUN {
# the above calls saved and (attempted to) load Monte-Carlo simulations and
# simulated them for nq = 128 observations
# in the following call no saving, no loading and simulation for n = 100
# observations is required, progress of the simulation will be reported
stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE,
        messages = 1000L, options = list(simulation = "vector",
                                         load = list(), save = list()))

# with given stat to compute q
stat <- monteCarloSimulation(n = 128L)
identical(stepFit(y, x = x, alpha = 0.5, stat = stat,
                  jumpint = TRUE, confband = TRUE),
          stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE,
                  options = list(load = list())))
# }

Run the code above in your browser using DataLab