fiaparch: FIAPARCH Model Fitting

Description

Fit a fractionally integrated APARCH (FIAPARCH) model to observed data by quasi maximum-likelihood estimation under one of several conditional distributions (see the argument cond_dist).

Usage

fiaparch(
  rt,
  orders = c(1, 1),
  cond_dist = c("norm", "std", "ged", "ald", "snorm", "sstd", "sged", "sald"),
  drange = c(0, 1),
  meanspec = mean_spec(),
  Drange = c(0, 1),
  nonparspec = locpol_spec(),
  use_nonpar = FALSE,
  n_test = 0,
  start_pars = NULL,
  LB = NULL,
  UB = NULL,
  control = list(),
  control_nonpar = list(),
  mean_after_nonpar = FALSE,
  parallel = TRUE,
  ncores = max(1, future::availableCores() - 1),
  trunc = "none",
  presample = 50,
  Prange = c(1, 5),
  fix_delta = c(NA, 1, 2)
)

Value

An object of S4-class "fEGarch_fit_fiaparch" is returned. It contains the following elements.

pars:: a named numeric vector with the parameter estimates.
se:: a named numeric vector with the obtained standard errors in accordance with the parameter estimates.
vcov_mat:: the variance-covariance matrix of the parameter estimates with named columns and rows.
rt:: the input object rt (or at least the training data, if n_test is greater than zero); if rt was a "zoo" or "ts" object, the formatting is kept.
cmeans:: the estimated conditional means; if rt was a "zoo" or "ts" object, the formatting is also applied to cmeans.
sigt:: the estimated conditional standard deviations (or for use_nonpar = TRUE the estimated total volatilities, i.e. scale function value times conditional standard deviation); if rt was a "zoo" or "ts" object, the formatting is also applied to sigt.
etat:: the obtained residuals; if rt was a "zoo" or "ts" object, the formatting is also applied to etat.
orders:: a two-element numeric vector stating the considered model orders.
cond_dist:: a character value stating the conditional distribution considered in the model fitting.
long_memo:: a logical value stating whether or not long memory was considered in the model fitting.
llhood:: the log-likelihood value obtained at the optimal parameter combination.
inf_criteria:: a named two-element numeric vector with the corresponding AIC (first element) and BIC (second element) of the fitted parametric model part; for purely parametric models, these criteria are valid for the entire model; for semiparametric models, they are only valid for the parametric step and are not valid for the entire model.
meanspec:: the settings for the model in the conditional mean; is an object of class "mean_spec" that is identical to the object passed to the input argument meanspec.
test_obs:: the observations at the end of the input rt reserved for testing following n_test.
scale_fun:: the estimated scale function values, if use_nonpar = TRUE, otherwise NULL; formatting of rt is reused.
nonpar_model:: the estimation object returned by tsmoothlm for use_nonpar = TRUE.
trunc:: the input argument trunc.

Arguments

rt: the observed series ordered from past to present; can be a numeric vector, a "zoo" class time series object, or a "ts" class time series object.
orders: a two-element numeric vector containing the two model orders $p$ and $q$ (see Details for more information); currently, only the default orders = c(1, 1) is supported; other specifications of a two-element numeric vector will lead to orders = c(1, 1) being run and a warning message being returned.
cond_dist: the conditional distribution to consider as a character object; the default is a conditional normal distribution ("norm"); also available are a $t$-distribution ("std"), a generalized error distribution ("ged"), an average Laplace distribution ("ald"), and their four skewed variants ("snorm", "sstd", "sged", "sald").
drange: a two-element numeric vector that gives the boundaries of the search interval for the fractional differencing parameter $d$ in the conditional volatility model part; is overwritten by the settings of the arguments LB and UB.
meanspec: an object of class "mean_spec"; indicates the specifications for the model in the conditional mean.
Drange: a two-element numeric vector that indicates the boundaries of the interval over which to search for the fractional differencing parameter $D$ in a long-memory ARMA-type model in the conditional mean model part; by default, $D$ is searched for on the interval from 0 to $0.5 - 1\times 10^{-6}$; note that specific settings in the arguments LB and UB overwrite this argument.
nonparspec: an object of class "locpol_spec" returned by locpol_spec; defines the settings of the nonparametric smoothing technique for use_nonpar = TRUE.
use_nonpar: a logical indicating whether or not to implement a semiparametric extension of the volatility model defined through the remaining arguments; see "Details" for more information.
n_test: a single numerical value indicating how many observations at the end of rt not to include in the fitting process and to reserve for backtesting.
start_pars: the starting parameters for the numerical optimization routine; should be of the same length as the parameter output vector within the output object (also keeping the same order); for NULL, an internally saved default set of values is used; see "Details" for the order of elements; elements should be set with respect to a series rescaled to have sample variance one.
LB: the lower boundaries of the parameters in the numerical optimization routine; should be of the same length as the parameter output vector within the output object (also keeping the same order); for NULL, an internally saved default set of values is used; see "Details" for the order of elements; elements should be set with respect to a series rescaled to have sample variance one.
UB: the upper boundaries of the parameters in the numerical optimization routine; should be of the same length as the parameter output vector within the output object (also keeping the same order); for NULL, an internally saved default set of values is used; see "Details" for the order of elements; elements should be set with respect to a series rescaled to have sample variance one.
control: a list that is passed to control of the function solnp of the package Rsolnp.
control_nonpar: a list containing changes to the arguments for the hyperparameter estimation algorithm in the nonparametric scale function estimation for use_nonpar = TRUE; see "Details" for more information.
mean_after_nonpar: only for use_nonpar = TRUE; considers the unconditional mean of the parametric model part in the QMLE step in a semiparametric model; by default, a zero-mean model is considered for the parametric part in a semiparametric model.
parallel: only relevant for a (skewed) average Laplace (AL) distribution, i.e. if cond_dist is set to "ald" or "sald"; a logical value indicating whether or not the slices for the positive integer-valued parameter of the AL distribution should be fitted in parallel for a speed boost.
ncores: only relevant for a (skewed) average Laplace (AL) distribution, i.e. if cond_dist is set to "ald" or "sald", and if simultaneously parallel = TRUE; a single numeric value indicating the number of cores to use for parallel computations.
trunc: a positive integer indicating the finite truncation length of the infinite-order polynomials of the infinite-order representations of the long-memory model parts; the character "none" is an optional input that specifies that truncation should always be applied back to the first (presample) observation time point, i.e. that maximum length filters should be applied at all times.
presample: the presample length for initialization (for extended EGARCH- / Log-GARCH-type models only relevant for the FARIMA-part, as series in log-transformed conditional variance are initialized by zero).
Prange: a two-element vector that indicates the search boundaries for the parameter $P$ in a (skewed) average Laplace distribution.
fix_delta: let the parameter $\delta$ be either free (fix_delta = NA), fix it to 1 (fix_delta = 1) or to 2 (fix_delta = 2); the latter two are specific submodels of a FIAPARCH.

Details

Consider a FIAPARCH($p, d, q$) with constant asymmetry term $\gamma$ regardless of the lag. Let $\left\{r_t\right\}$, with $t \in \mathbb{Z}$ as the time index, be a theoretical time series that follows $$r_t=\mu+\varepsilon_t \text{ with } \varepsilon_t=\sigma_t \eta_t \text{ and } \eta_t \sim \text{IID}(0,1), \text{ where}$$ $$\sigma_t^{\delta}=\omega+\left[1-\beta^{-1}(B)\phi(B)(1-B)^{d}\right]\left(\left|\varepsilon_t\right|-\gamma\varepsilon_t\right)^{\delta}.$$ Here, $\eta_t\sim\text{IID}(0,1)$ means that the innovations $\eta_t$ are independent and identically distributed (iid) with mean zero and variance one, whereas $\sigma_t > 0$ are the conditional standard deviations in $r_t$. Moreover, $B$ is the backshift operator and $\beta(B) = 1 - \sum_{j=1}^{q}\beta_j B^{j}$, where $\beta_j$, $j=1,2,\dots, q$, are real-valued coefficients. Furthermore, $\phi(B) = 1 - \sum_{i=1}^{p}\phi_i B^{i}$, where $\phi_i$, $i=1,2,\dots, p$, are real-valued coefficients. $p$ and $q$ are the model orders definable through the argument orders, where $p$ is the first element and $q$ is the second element in the argument. In addition, we have $\mu = E\left(r_t\right)$ as a real-valued parameter and $\gamma \in (-1,1)$ and $d \in [0,1]$ as the parameter for the level of integration. With $d = 0$ the model reduces to a short-memory APARCH, for $d=1$ we have a full integration, and for $d\in(0, 1)$, we have fractional integration, where $d\in(0, 0.5)$ is considered to describe a long-memory process. $\omega > 0$ is the intercept. It is assumed that all $\beta_j$ and $\phi_i$ are non-negative.

The pre-sample values of $\left(\left|\varepsilon_t\right|-\gamma\varepsilon_t\right)^{\delta}$ are replaced by the in-sample arithmetic mean of $$\left(\left|r_t^{*} - \bar{r}_t^{*}\right|- \gamma_0\left(r_t^{*}-\bar{r}_t^{*}\right)\right)^{\delta_0},$$ where $r_t^{*}$ are considered as the observations and $\bar{r}_t^{*}$ is the sample mean of $r_t^{*}$. Initial values for $\gamma_0$ and $\delta_0$ as initial guesses for $\gamma$ and $\delta$ are obtained from a previous fitting of a short-memory APARCH.
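The presample value described above can be sketched in a few lines of base R. This is only an illustration: presample_value is a hypothetical helper, not a function of the package, and gamma0 and delta0 are supplied by hand here, whereas the package obtains them from a preliminary short-memory APARCH fit.

```r
# Hypothetical helper (not part of the package): in-sample arithmetic mean of
# (|r - mean(r)| - gamma0 * (r - mean(r)))^delta0, used to fill the presample.
presample_value <- function(r, gamma0, delta0) {
  r_c <- r - mean(r)                       # centre the observations
  mean((abs(r_c) - gamma0 * r_c)^delta0)   # mean of the transformed series
}

# Toy input with made-up initial guesses for gamma and delta
presample_value(c(-1, 0, 1), gamma0 = 0.5, delta0 = 2)
```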

Currently, only a model of orders $p=1$ with $q=1$ can be fitted; to ensure the non-negativity of all of the infinite-order coefficient series $\left[1-\beta^{-1}(B)\phi(B)(1-B)^{d}\right]$, which in combination with $\omega>0$ ensures that all the conditional volatilities are greater than zero, we employ inequality constraints ensuring that the first 50 coefficients of the infinite-order ARCH-representation are non-negative as an approximation to ensuring that all of the coefficients are non-negative. To ensure that they are non-negative, one may in theory consider the sufficient conditions mentioned in Bollerslev and Mikkelsen (1996) or Tse (1998), which are however sometimes restrictive, or the simultaneously necessary and sufficient conditions by Conrad and Haag (2006), which are however complex to implement properly.
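As a rough illustration of the non-negativity check for $p = q = 1$, the coefficients of $1-\beta^{-1}(B)\phi(B)(1-B)^{d}$ can be obtained recursively in base R. The function name arch_inf_coefs and the parameter values are assumptions made for this sketch; the package's internal implementation may differ.

```r
# Hypothetical helper: coefficients lambda_1, ..., lambda_K of
# 1 - beta(B)^{-1} phi(B) (1 - B)^d for p = q = 1.
arch_inf_coefs <- function(d, phi, beta, K = 50) {
  # pi_k: coefficients of (1 - B)^d, starting with pi_0 = 1
  pi_k <- numeric(K + 1)
  pi_k[1] <- 1
  for (k in 1:K) pi_k[k + 1] <- pi_k[k] * (k - 1 - d) / k
  # c_k: coefficients of (1 - phi * B) (1 - B)^d
  c_k <- pi_k - phi * c(0, pi_k[-(K + 1)])
  # a_k: coefficients after dividing by (1 - beta * B)
  a_k <- numeric(K + 1)
  a_k[1] <- c_k[1]
  for (k in 1:K) a_k[k + 1] <- c_k[k + 1] + beta * a_k[k]
  # lambda_0 = 1 - a_0 = 0 is dropped; lambda_k = -a_k for k >= 1
  -a_k[-1]
}

# Made-up parameter values; check the first 50 coefficients
lambda <- arch_inf_coefs(d = 0.4, phi = 0.2, beta = 0.5)
all(lambda >= 0)
```

The first coefficient equals $d + \phi - \beta$, which gives a quick sanity check on the recursion.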

See also the reference section for sources on the APARCH (Ding et al., 1993) and the FIAPARCH (Tse, 1998) models.

The truncated infinite-order polynomial and the series of conditional variances are computed following the idea of Nielsen and Noel (2021) for computational efficiency. To stabilize the first fitted in-sample conditional standard deviations, a short presample is used; its length is adjustable (including to zero), and it may introduce bias into the parameter estimators.
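To make the roles of truncation and presample concrete, the following base R sketch filters conditional standard deviations from given shocks and given ARCH($\infty$) coefficients. It is a plain loop, not the efficient computation of Nielsen and Noel (2021), and fiaparch_sigma is a hypothetical name. As a simplification, the presample is filled with the in-sample mean computed from the supplied $\gamma$ and $\delta$ rather than from preliminary estimates.

```r
# Hypothetical helper: sigma_t from
# sigma_t^delta = omega + sum_k lambda_k * (|eps_{t-k}| - gamma * eps_{t-k})^delta,
# with the filter truncated at length(lambda) lags.
fiaparch_sigma <- function(eps, lambda, omega, gamma, delta, presample = 50) {
  g <- (abs(eps) - gamma * eps)^delta       # transformed shocks
  g_ext <- c(rep(mean(g), presample), g)    # presample filled with the in-sample mean
  n <- length(eps)
  sig_delta <- numeric(n)
  for (t in seq_len(n)) {
    # values before time t, most recent first
    past <- rev(g_ext[seq_len(presample + t - 1)])
    k <- min(length(past), length(lambda))  # usable number of lags
    sig_delta[t] <- omega + sum(lambda[seq_len(k)] * past[seq_len(k)])
  }
  sig_delta^(1 / delta)
}

# Toy call with made-up values
fiaparch_sigma(c(1, -1), lambda = 0.5, omega = 0.1, gamma = 0, delta = 2, presample = 1)
```

With presample = 0, the first value reduces to $\omega^{1/\delta}$, which shows why a short presample helps stabilize the first fitted values.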

In the current package version, standard errors of parameter estimates are computed from the Hessian at the optimum of the log-likelihood using hessian. To ensure numerical stability and applicability to a huge variety of differently scaled data, parametric models are first fitted to data that is scaled to have sample variance 1. Parameter estimates and other quantities are then either retransformed or recalculated afterwards for the original data.
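The general idea (fit on rescaled data, take standard errors from the Hessian at the optimum, then retransform) can be sketched with a deliberately simple normal model in base R. This is an assumption-laden toy example: it uses stats::optim and stats::optimHess as stand-ins, not the optimizer or Hessian routine of the package, and the model is not a FIAPARCH.

```r
set.seed(1)
x <- rnorm(500, mean = 0.5, sd = 3)   # toy data on an arbitrary scale
s <- sd(x)
z <- x / s                            # rescaled to sample variance one

# Negative log-likelihood of a normal model; par = (mean, log standard deviation)
negll <- function(par) -sum(dnorm(z, mean = par[1], sd = exp(par[2]), log = TRUE))

fit <- optim(c(0, 0), negll, method = "BFGS")
H <- optimHess(fit$par, negll)        # Hessian of the negative log-likelihood
se <- sqrt(diag(solve(H)))            # standard errors on the rescaled data

mu_hat <- fit$par[1] * s              # retransform the mean to the original scale
se_mu <- se[1] * s                    # and its standard error accordingly
c(mu_hat = mu_hat, se_mu = se_mu)
```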

For a conditional average Laplace distribution, an optimal model for each distribution parameter $P$ from 1 to 5 is estimated (assuming that $P$ is then fixed to the corresponding value). Afterwards, $P$ is then estimated by selecting the estimated model among the five fitted models that has the largest log-likelihood. The five models are, by default, fitted simultaneously using parallel programming techniques (see also the arguments parallel and ncores, which are only relevant for a conditional average Laplace distribution). After the optimal model (including the estimate of $P$ called $\hat{P}$) has been determined, $P=\hat{P}$ is seen as fixed to obtain the standard errors via the Hessian matrix for the estimates of the continuous parameters. A standard error for $\hat{P}$ is therefore not obtained and the ones obtained for the remaining estimates do not account for $\hat{P}$.

An ARMA-FIAPARCH or a FARIMA-FIAPARCH can be fitted by adjusting the argument meanspec correspondingly.

As an alternative, a semiparametric extension of the pure models in the conditional variance can be implemented. If use_nonpar = TRUE, meanspec is ignored and, before a zero-mean model in the conditional volatility is fitted following the remaining function arguments, a smooth scale function, i.e. a function representing the unconditional standard deviation over time, is estimated following the specifications in nonparspec and control_nonpar. This preliminary step stabilizes the input series rt, as long-term changes in the unconditional variance are estimated using tsmoothlm and removed before the parametric step. control_nonpar can be adjusted to change the arguments of tsmoothlm for long-memory specifications. These arguments specify settings for the automated bandwidth selection algorithms implemented by this function. By default, we use the settings pmin = 0, pmax = 1, qmin = 0, qmax = 1, InfR = "Nai", bStart = 0.15, cb = 0.05, and method = "lpr" for tsmoothlm. locpol_spec passed to nonparspec handles more direct settings of the local polynomial smoother itself. See the documentation for these functions to get a detailed overview of these settings.

Assume $\{r_t\}$ to be the observed series, where $t = 1, 2, \dots, n$. Then $r_t^{*} = r_t - \bar{r}$, with $\bar{r}$ being the arithmetic mean over the observed $r_t$, is computed and subsequently $y_t = \ln\left[\left(r_t^{*}\right)^2\right]$. The subtraction of $\bar{r}$ is necessary so that $r_t^{*}$ are all different from zero almost surely. Once the $y_t$ are available, their trend $m(x_t)$, with $x_t$ as the rescaled time on the interval $[0, 1]$, is estimated using tsmoothlm and denoted here by $\hat{m}(x_t)$. Then, from $\hat{\xi}_t = y_t - \hat{m}(x_t)$, obtain $\hat{C} = -\ln\left\{\sum_{t=1}^{n}\exp\left(\hat{\xi}_t\right)\right\}$, and obtain the estimated scale function as $\hat{s}(x_t)=\exp\left[\left(\hat{m}(x_t) - \hat{C}\right) / 2\right]$.

The stabilized / standardized version of the series $\left\{r_t\right\}$ is then $\tilde{r}_t = r_t^{*} / \hat{s}(x_t)$, to which a purely parametric volatility model following the remaining function arguments is then fitted. The estimated volatility at a given time point is then the product of the estimate of the corresponding scale function value and of the estimated conditional standard deviation (following the parametric model part) for that same time point. See for example Feng et al. (2022) or Letmathe et al. (2023) for more information on the semiparametric extension of volatility models.

The order for manual settings of start_pars, LB and UB is crucial. The correct order is: $\mu$, $\text{ar}_1,\dots,\text{ar}_{p^{*}}$, $\text{ma}_1,\dots,\text{ma}_{q^{*}}$, $D$, $\omega$, $\phi$, $\beta$, $\gamma$, $\delta$, $d$, shape parameter, skewness parameter. Depending on the exact model specification, parameters irrelevant for the specification at hand should be dropped in start_pars, LB and UB.

References

Bollerslev, T., & Mikkelsen, H. O. (1996). Modeling and pricing long memory in stock market volatility. Journal of Econometrics, 73(1), 151-184. DOI: 10.1016/0304-4076(95)01736-4.
Conrad, C., & Haag, B. R. (2006). Inequality constraints in the fractionally integrated GARCH model. Journal of Financial Econometrics, 4(3), 413-449. DOI: 10.1093/jjfinec/nbj015.
Ding, Z., Granger, C. W. J., & Engle, R. F. (1993). A long memory property of stock market returns and a new model. Journal of Empirical Finance, 1(1), 83-106. DOI: 10.1016/0927-5398(93)90006-D.
Feng, Y., Gries, T., Letmathe, S., & Schulz, D. (2022). The smoots Package in R for Semiparametric Modeling of Trend Stationary Time Series. The R Journal, 14(1), 182-195. URL: https://journal.r-project.org/articles/RJ-2022-017/.
Letmathe, S., Beran, J., & Feng, Y. (2023). An extended exponential SEMIFAR model with application in R. Communications in Statistics - Theory and Methods, 53(22), 7914-7926. DOI: 10.1080/03610926.2023.2276049.
Nielsen, M. O., & Noel, A. L. (2021). To infinity and beyond: Efficient computation of ARCH($\infty$) models. Journal of Time Series Analysis, 42(3), 338-354. DOI: 10.1111/jtsa.12570.
Tse, Y. K. (1998). The conditional heteroskedasticity of the yen-dollar exchange rate. Journal of Applied Econometrics, 13(1), 49-55. DOI: 10.1002/(SICI)1099-1255(199801/02)13:1<49::AID-JAE459>3.0.CO;2-O.

Examples

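# Load the package (assumed here to be fEGarch, which provides fiaparch and the SP500 series)
library(fEGarch)
# Access the zoo method for window() directly from the zoo namespace
window.zoo <- get("window.zoo", envir = asNamespace("zoo"))
# Keep the observations up to the end of 2002
rt <- window.zoo(SP500, end = "2002-12-31")
# Fit a FIAPARCH model with the default settings and print the result
model <- fiaparch(rt)
model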
