A function for estimating the relevant lag set \(\Lambda\) of a Markov chain using Bayesian Information Criterion (BIC). This means that this method selects the set of lags that minimizes a penalized log likelihood for a given sample, see References below for details on the method.
hdMTD_BIC(
X,
d,
S = seq_len(d),
minl = 1,
maxl = length(S),
xi = 1/2,
A = NULL,
byl = FALSE,
BICvalue = FALSE,
single_matrix = FALSE,
indep_part = TRUE,
zeta = maxl,
warning = FALSE,
...
)
Returns a vector with the estimated relevant lag set using BIC. It might return more
than one set if minl < maxl
and byl = TRUE
. Additionally, it can return the value
of the penalized likelihood for the outputted lag sets if BICvalue = TRUE
.
A vector or single-column data frame containing a chain sample (X[1]
is the most recent).
A positive integer representing an upper bound for the chain order.
A numeric vector of positive integers from which this function will select
a set of relevant lags. Typically, S
is a subset of 1:d
. If S
is not provided, by default S=1:d
.
A positive integer. minl
represents the smallest length of any relevant lag
set this function might return. If minl == maxl
, this function will return the
subset of S
of length minl
with the lowest BIC. If minl < maxl
, the function
will consider subsets ranging from length minl
to length maxl
when searching for
the subset of S
with the smallest BIC.
A positive integer equal to or greater than minl
but less than the number
of elements in S
(maxl = length(S)
is accepted but in this case the output will
always be S
). maxl
represents the largest length of any relevant lag set this
function might return.
The BIC penalization term constant. Defaulted to 1/2. A smaller xi
(near 0)
reduces the impact of overparameterization.
A vector with positive integers representing the state space. If not informed,
this function will set A=sort(unique(X))
.
Logical. If TRUE
, the function will look for the set with smallest BIC by each
length (from minl
to maxl
), and return the set with smallest BIC for each length.
If minl==maxl
setting byl=TRUE
or FALSE
makes no difference, since the
function will only calculate the BIC for sets with maxl
elements in the relevant lag set.
Logical. If TRUE
, the function will also return the calculated values of
the BIC for the estimated relevant lag sets.
Logical. If TRUE
, the chain sample is thought to come from an MTD model
where the stochastic matrices \(p_j\) are constant across all lags \(j\in \Lambda\). In practice,
this means the user believes the stochastic matrices for every lag in S
are the same, which reduces
the number of parameters in the penalization term.
Logical. If FALSE
there is no independent distribution and \(\lambda_0=0\) which
reduces the number of parameters in the penalization term.
A positive integer representing the number of distinct matrices \(p_j\)
in the MTD, which affects the number of parameters in the penalization term. Defaulted
to maxl
. See more in Details.
Logical. If TRUE
, the function warns the user when A
is set automatically.
Additional arguments (not used in this function, but maintained for compatibility with hdMTD()
.
Note that the upper bound for the order of the chain (d
) affects the estimation
of the transition probabilities. If we run the function with a certain order parameter d
,
only the sequences of length d
that appeared in the sample will be counted. Therefore,
all transition probabilities, and hence all BIC values, will be calculated with respect to
that d
. If we use another value for d
to run the function, even if the output
agrees with that of the previous run, its BIC value might change a little.
The parameter zeta
indicates the the number of distinct matrices pj in the MTD.
If zeta = 1
, all matrices \(p_j\) are identical; if zeta = 2
there exists
two groups of distinct matrices and so on. The largest value for zeta
is maxl
since this is the largest number of matrices \(p_j\). When minl<maxl
,
for each minl
\(\leq\) l
\(\leq\) maxl
, zeta = min(zeta,l)
.
If single_matrix = TRUE
then zeta
is set to 1.
Imre Csiszár, Paul C. Shields. The consistency of the BIC Markov order estimator. The Annals of Statistics, 28(6), 1601-1619. tools:::Rd_expr_doi("10.1214/aos/1015957472")
X <- testChains[, 1]
hdMTD_BIC (X, d = 6, minl = 1, maxl = 1)
hdMTD_BIC (X,d = 3,minl = 1, maxl = 2, BICvalue = TRUE)
Run the code above in your browser using DataLab