A function for estimating the relevant lag set \(\Lambda\) of a Markov chain using Bayesian Information Criterion (BIC). This means that this method selects the set of lags that minimizes a penalized log likelihood for a given sample, see References below for details on the method.
hdMTD_BIC(
X,
d,
S = seq_len(d),
minl = 1,
maxl = length(S),
xi = 1/2,
A = NULL,
byl = FALSE,
BICvalue = FALSE,
single_matrix = FALSE,
indep_part = TRUE,
zeta = maxl,
warn = FALSE,
...
)Returns a vector with the estimated relevant lag set using BIC. It might return more
than one set if minl < maxl and byl = TRUE. Additionally, it can return the value
of the penalized likelihood for the outputted lag sets if BICvalue = TRUE.
A vector or single-column data frame containing a chain sample (X[1] is the most recent).
A positive integer representing an upper bound for the chain order.
A numeric vector of positive integers from which this function will select
a set of relevant lags. Typically, S is a subset of 1:d. If S
is not provided, by default S=1:d.
A positive integer. minl represents the smallest length of any relevant lag
set this function might return. If minl == maxl, this function will return the
subset of S of length minl with the lowest BIC. If minl < maxl, the function
will consider subsets ranging from length minl to length maxl when searching for
the subset of S with the smallest BIC.
A positive integer equal to or greater than minl but less than the number
of elements in S (maxl = length(S) is accepted but in this case the output will
always be S). maxl represents the largest length of any relevant lag set this
function might return.
The BIC penalization term constant. Defaulted to 1/2. A smaller xi (near 0)
reduces the impact of overparameterization.
A vector with positive integers representing the state space. If not informed,
this function will set A=sort(unique(X)).
Logical. If TRUE, the function will look for the set with smallest BIC by each
length (from minl to maxl), and return the set with smallest BIC for each length.
If minl==maxl setting byl=TRUE or FALSE makes no difference, since the
function will only calculate the BIC for sets with maxl elements in the relevant lag set.
Logical. If TRUE, the function will also return the calculated values of
the BIC for the estimated relevant lag sets.
Logical. If TRUE, the chain sample is thought to come from an MTD model
where the stochastic matrices \(p_j\) are constant across all lags \(j\in \Lambda\). In practice,
this means the user believes the stochastic matrices for every lag in S are the same, which reduces
the number of parameters in the penalization term.
Logical. If FALSE there is no independent distribution and \(\lambda_0=0\) which
reduces the number of parameters in the penalization term.
A positive integer representing the number of distinct matrices \(p_j\)
in the MTD, which affects the number of parameters in the penalization term. Defaulted
to maxl. See more in Details.
Logical. If TRUE, the function warns the user when A is set automatically.
Additional arguments (not used in this function, but maintained for compatibility with hdMTD().
Criterion. For each candidate lag set \(T\) contained in \(S\) with
size \(l = |T|\) where minl <= l <= maxl, hdMTD_BIC() evaluates
$$BIC(T) = - L_T + xi * df(T) * log(N),$$
where \(N = length(X)\) and
$$L_T = \sum_{x_T \in A^T} \sum_{a \in A} N(a x_T) * log( \hat{p}(a | x_T) ).$$
The empirical conditionals are
$$\hat{p}(a | x_T) = N(a x_T) / N(x_T),$$
computed from the sample counts (same quantities returned by
freqTab and empirical_probs).
Degrees of freedom. The parameter count \(df(T)\) is the number of free
parameters of an MTD model with lag set \(T\) and state space \(A\), honoring the
constraints single_matrix, indep_part, and zeta:
$$df(T) = w_{df} + p0_{df} + |A| * (|A| - 1) * zeta.$$
Here \(zeta\) is the number of distinct \(p_j\) matrices allowed across lags
(by default \(zeta = l\); setting single_matrix = TRUE forces \(zeta = 1\)).
The weight and independent-part contributions are:
\(w_{df} = l\) if indep_part is TRUE, otherwise \(w_{df} = l - 1\);
\(p0_{df} = |A| - 1\) if indep_part is TRUE, otherwise \(p0_{df} = 0\).
Scale. With xi = 1/2 (the default), \(BIC\) equals one half of the
classical Schwarz BIC \(-2 * L_T + df(T) * log(N)\); minimizing either criterion selects
the same lag set.
Note. The likelihood term \(L_T\) sums over the \(N - max(T)\) effective transitions, while the penalty uses \(log(N)\) (this matches the implementation).
Note that the upper bound for the order of the chain (d) affects the estimation
of the transition probabilities. If we run the function with a certain order parameter d,
only the sequences of length d that appeared in the sample will be counted. Therefore,
all transition probabilities, and hence all BIC values, will be calculated with respect to
that d. If we use another value for d to run the function, even if the output
agrees with that of the previous run, its BIC value might change a little.
The parameter zeta indicates the the number of distinct matrices pj in the MTD.
If zeta = 1, all matrices \(p_j\) are identical; if zeta = 2 there exists
two groups of distinct matrices and so on. The largest value for zeta is maxl
since this is the largest number of matrices \(p_j\). When minl<maxl,
for each minl \(\leq\) l \(\leq\) maxl, zeta = min(zeta,l).
If single_matrix = TRUE then zeta is set to 1.
Imre Csiszár, Paul C. Shields. The consistency of the BIC Markov order estimator. The Annals of Statistics, 28(6), 1601-1619. tools:::Rd_expr_doi("10.1214/aos/1015957472")
# Simulate a chain from an MTD model
set.seed(1)
M <- MTDmodel(Lambda = c(1, 3), A = c(1, 2), lam0 = 0.05)
X <- perfectSample(M, N = 400)
# Fit using BIC with a single lag
hdMTD_BIC(X, d = 6, minl = 1, maxl = 1)
# Fit using BIC with lag selection and extract BIC value
hdMTD_BIC(X, d = 3, minl = 1, maxl = 2, BICvalue = TRUE)
Run the code above in your browser using DataLab