hdMTD_BIC: The Bayesian Information Criterion (BIC) method for inference in MTD models

Description

A function for estimating the relevant lag set $\Lambda$ of a Markov chain using Bayesian Information Criterion (BIC). This means that this method selects the set of lags that minimizes a penalized log likelihood for a given sample, see References below for details on the method.

Usage

hdMTD_BIC(
  X,
  d,
  S = seq_len(d),
  minl = 1,
  maxl = length(S),
  xi = 1/2,
  A = NULL,
  byl = FALSE,
  BICvalue = FALSE,
  single_matrix = FALSE,
  indep_part = TRUE,
  zeta = maxl,
  warn = FALSE,
  ...
)

Value

Returns a vector with the estimated relevant lag set using BIC. It might return more than one set if minl < maxl and byl = TRUE. Additionally, it can return the value of the penalized likelihood for the outputted lag sets if BICvalue = TRUE.

Arguments

X: A vector or single-column data frame containing a chain sample (X[1] is the most recent).
d: A positive integer representing an upper bound for the chain order.
S: A numeric vector of positive integers from which this function will select a set of relevant lags. Typically, S is a subset of 1:d. If S is not provided, by default S=1:d.
minl: A positive integer. minl represents the smallest length of any relevant lag set this function might return. If minl == maxl, this function will return the subset of S of length minl with the lowest BIC. If minl < maxl, the function will consider subsets ranging from length minl to length maxl when searching for the subset of S with the smallest BIC.
maxl: A positive integer equal to or greater than minl but less than the number of elements in S (maxl = length(S) is accepted but in this case the output will always be S). maxl represents the largest length of any relevant lag set this function might return.
xi: The BIC penalization term constant. Defaulted to 1/2. A smaller xi (near 0) reduces the impact of overparameterization.
A: A vector with positive integers representing the state space. If not informed, this function will set A=sort(unique(X)).
byl: Logical. If TRUE, the function will look for the set with smallest BIC by each length (from minl to maxl), and return the set with smallest BIC for each length. If minl==maxl setting byl=TRUE or FALSE makes no difference, since the function will only calculate the BIC for sets with maxl elements in the relevant lag set.
BICvalue: Logical. If TRUE, the function will also return the calculated values of the BIC for the estimated relevant lag sets.
single_matrix: Logical. If TRUE, the chain sample is thought to come from an MTD model where the stochastic matrices $p_j$ are constant across all lags $j\in \Lambda$. In practice, this means the user believes the stochastic matrices for every lag in S are the same, which reduces the number of parameters in the penalization term.
indep_part: Logical. If FALSE there is no independent distribution and $\lambda_0=0$ which reduces the number of parameters in the penalization term.
zeta: A positive integer representing the number of distinct matrices $p_j$ in the MTD, which affects the number of parameters in the penalization term. Defaulted to maxl. See more in Details.
warn: Logical. If TRUE, the function warns the user when A is set automatically.
...: Additional arguments (not used in this function, but maintained for compatibility with hdMTD().

Details

Criterion. For each candidate lag set $T$ contained in $S$ with size $l = |T|$ where minl <= l <= maxl, hdMTD_BIC() evaluates $$BIC(T) = - L_T + xi * df(T) * log(N),$$ where $N = length(X)$ and $$L_T = \sum_{x_T \in A^T} \sum_{a \in A} N(a x_T) * log( \hat{p}(a | x_T) ).$$ The empirical conditionals are $$\hat{p}(a | x_T) = N(a x_T) / N(x_T),$$ computed from the sample counts (same quantities returned by freqTab and empirical_probs).

Degrees of freedom. The parameter count $df(T)$ is the number of free parameters of an MTD model with lag set $T$ and state space $A$, honoring the constraints single_matrix, indep_part, and zeta: $$df(T) = w_{df} + p0_{df} + |A| * (|A| - 1) * zeta.$$ Here $zeta$ is the number of distinct $p_j$ matrices allowed across lags (by default $zeta = l$; setting single_matrix = TRUE forces $zeta = 1$). The weight and independent-part contributions are: $w_{df} = l$ if indep_part is TRUE, otherwise $w_{df} = l - 1$; $p0_{df} = |A| - 1$ if indep_part is TRUE, otherwise $p0_{df} = 0$.

Scale. With xi = 1/2 (the default), $BIC$ equals one half of the classical Schwarz BIC $-2 * L_T + df(T) * log(N)$; minimizing either criterion selects the same lag set.

Note. The likelihood term $L_T$ sums over the $N - max(T)$ effective transitions, while the penalty uses $log(N)$ (this matches the implementation).

Note that the upper bound for the order of the chain (d) affects the estimation of the transition probabilities. If we run the function with a certain order parameter d, only the sequences of length d that appeared in the sample will be counted. Therefore, all transition probabilities, and hence all BIC values, will be calculated with respect to that d. If we use another value for d to run the function, even if the output agrees with that of the previous run, its BIC value might change a little.

The parameter zeta indicates the the number of distinct matrices pj in the MTD. If zeta = 1, all matrices $p_j$ are identical; if zeta = 2 there exists two groups of distinct matrices and so on. The largest value for zeta is maxl since this is the largest number of matrices $p_j$. When minl<maxl, for each minl $\leq$ l $\leq$ maxl, zeta = min(zeta,l). If single_matrix = TRUE then zeta is set to 1.

References

Imre Csiszár, Paul C. Shields. The consistency of the BIC Markov order estimator. The Annals of Statistics, 28(6), 1601-1619. tools:::Rd_expr_doi("10.1214/aos/1015957472")

Examples

Run this code

# Simulate a chain from an MTD model
set.seed(1)
M <- MTDmodel(Lambda = c(1, 3), A = c(1, 2), lam0 = 0.05)
X <- perfectSample(M, N = 400)

# Fit using BIC with a single lag
hdMTD_BIC(X, d = 6, minl = 1, maxl = 1)

# Fit using BIC with lag selection and extract BIC value
hdMTD_BIC(X, d = 3, minl = 1, maxl = 2, BICvalue = TRUE)

Run the code above in your browser using DataLab