Takes a specification of a hidden Markov generalised linear model and
simulates data from that model. The model may be specified in terms of
its individual components (the default method). These components
include a data frame that provides the predictor variables, and various
parameters of the model. For the "eglhmm" method the model is specified
as a fitted model, i.e. an object of class "eglhmm".
reglhmm(x,...)
# S3 method for default
reglhmm(x, formula, response, cells=NULL, data=NULL, nobs=NULL,
distr=c("Gaussian","Poisson","Binomial","Dbd","Multinom"),
phi, Rho, sigma, size, ispd=NULL, ntop=NULL, zeta=NULL,
missFrac = 0, fep=NULL,
contrast=c("treatment","sum","helmert"),...)
# S3 method for eglhmm
reglhmm(x, missFrac = NULL, ...)
Returns a data frame with the same columns as those of data and an
added column, whose name is determined from formula, containing the
simulated response.
x: For the default method, the transition probability matrix of
the hidden Markov chain. For the "eglhmm" method,
an object of class "eglhmm" as returned by the function
eglhmm().
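The following minimal base-R sketch makes the role of x concrete. It draws a hidden state sequence from a transition probability matrix and an initial state distribution. The helper sim_states() is hypothetical and is not the code used by reglhmm(). The matrix is the Tpm from the examples, and c(0.8, 0.2) is its stationary distribution.

```r
# Hypothetical helper (not part of eglhmm): simulate n hidden states from a
# transition probability matrix `tpm`, starting from the distribution `ispd`.
sim_states <- function(tpm, ispd, n) {
  K <- nrow(tpm)
  s <- integer(n)
  s[1] <- sample.int(K, 1, prob = ispd)
  for (i in seq_len(n)[-1]) {
    # Row s[i - 1] of tpm holds the probabilities of moving to each state.
    s[i] <- sample.int(K, 1, prob = tpm[s[i - 1], ])
  }
  s
}

Tpm <- matrix(c(0.91, 0.09, 0.36, 0.64), byrow = TRUE, ncol = 2)
set.seed(42)
st <- sim_states(Tpm, ispd = c(0.8, 0.2), n = 500)
table(st)  # state 1 should dominate, since state 2 is left quickly
```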
formula: The formula specifying the generalised linear model from
which data are to be simulated. Note that the predictor variables in
this formula must include a factor named state, which specifies
the state of the hidden Markov chain. Note also that this formula
must determine a design matrix having a number of columns equal to
the length of the vector phi of model coefficients
(and to the length of psi in the case of the Gaussian distribution).
If this condition is not satisfied, an error is thrown.
It is advisable to use a formula of the form
y ~ 0 + state + ..., where ... represents the predictors
in the model other than state. Of course phi must
be supplied in a manner that is consistent with this structure.
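One quick way to see how many coefficients a formula requires is to build its design matrix with model.matrix(). In this sketch, the toy data frame and the helper check_phi() are hypothetical. check_phi() only mimics the documented requirement that length(phi) equal the number of design-matrix columns. Because the formula has no intercept, state gets one indicator column per level, so the first two entries of phi act as per-state intercepts.

```r
# Toy predictors (hypothetical): hidden-state factor plus a location factor.
dat <- data.frame(
  state = factor(c(1, 2, 1, 2, 1, 2)),
  locn  = factor(c("A", "A", "B", "B", "C", "C"))
)
X <- model.matrix(~ 0 + state + locn, data = dat)
colnames(X)  # "state1" "state2" "locnB" "locnC"

# Hypothetical check mirroring the documented requirement: phi must have
# exactly one entry per design-matrix column.
check_phi <- function(X, phi) {
  if (length(phi) != ncol(X))
    stop("length(phi) = ", length(phi), " but the design matrix has ",
         ncol(X), " columns")
  invisible(TRUE)
}
check_phi(X, c(0, log(5), -0.3, 0.1))  # passes: 4 coefficients, 4 columns
```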
response: A character vector of length 2 specifying the names of the
responses. Ignored unless distr is "Multinom". If distr is
"Multinom" and response is provided appropriately, then the
simulated data are bivariate multinomial.
cells: A character vector specifying the names of the factors which
determine the “cells” of the model. These factors must be
columns of the data frame data. (See below.) Each cell
corresponds to a time series of (simulated) observations.
If cells is not supplied (left equal to NULL)
then the model is taken to have a single cell, i.e. data from a
“simple” hidden Markov model are generated. The parameters
of that model may be time-varying, and may still depend on the
predictors specified by formula.
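To illustrate the idea that each combination of the cell factors defines its own time series, base R's split() can partition a data frame by those factors. The toy data are hypothetical, and reglhmm() may organise cells differently internally.

```r
# Toy data (hypothetical) with two cell-defining factors.
dat <- data.frame(
  locn  = factor(rep(c("A", "B"), each = 4)),
  depth = factor(rep(c("shallow", "deep"), times = 4)),
  x     = 1:8
)
cells <- c("locn", "depth")

# One data frame (i.e. one series) per observed combination of cell factors.
series <- split(dat, dat[cells], drop = TRUE)
names(series)
sapply(series, nrow)  # each of the 4 cells holds 2 observations
```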
data: A data frame containing the predictor variables referred to by
formula, i.e. the predictors for the model from which
data are to be simulated. If data is not specified, then
nobs (see below) must be. If data is not
specified then formula must have the structure y ~ state
or, preferably, y ~ 0 + state. Of course phi
must be specified in a consistent manner.
nobs: Integer scalar. The number of observations to be generated in
the setting in which the generalised linear model in question is
vacuous, i.e. formula is y ~ 0 + state (or y ~ state).
Ignored if data is supplied.
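In the vacuous case the design matrix contains only the state indicators, so each state's linear predictor is just its entry of phi. This base-R sketch reproduces the arithmetic behind the last example (phi = log(c(2, 7)) with the Poisson log link).

```r
# The vacuous model y ~ 0 + state: the design matrix holds only the state
# indicator columns, so the linear predictor in state k is simply phi[k].
state <- factor(1:2)
X <- model.matrix(~ 0 + state)
phi <- log(c(2, 7))
eta <- drop(X %*% phi)
exp(eta)  # Poisson means under the log link: 2 and 7
```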
distr: Character string specifying the distribution of the “emissions” from the model, i.e. of the observations; one of "Gaussian", "Poisson", "Binomial", "Dbd" or "Multinom". This distribution determines the “emission probabilities”.
phi: A numeric vector specifying the coefficients of the linear
predictor of the generalised linear model. The length of
phi must be equal to the number of columns of the
design matrix determined by formula and data.
The entries of phi must match up appropriately with
the columns of the design matrix.
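One way to check that the entries of phi line up with the design-matrix columns is to name the coefficients before computing the linear predictor. This is only a suggestion; reglhmm() does not require named coefficients. The toy data are hypothetical.

```r
# Toy predictors (hypothetical).
dat <- data.frame(state = factor(c(1, 2, 1, 2)),
                  locn  = factor(c("A", "A", "B", "B")))
X <- model.matrix(~ 0 + state + locn, data = dat)

# Naming the coefficients makes their correspondence with the columns of
# the design matrix explicit and checkable.
phi <- c(state1 = 0, state2 = log(5), locnB = -0.3)
stopifnot(identical(names(phi), colnames(X)))

eta <- drop(X %*% phi)  # linear predictor, one value per observation
```

If the names are correct but in the wrong order, phi[colnames(X)] puts them in column order.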
Rho: A matrix, a list of two matrices, or a three-dimensional
array specifying the emission probabilities for a multinomial
distribution. Ignored unless distr is "Multinom".
sigma: A numeric vector of length equal to the number of states.
Its \(i\)th entry is the standard deviation of the (Gaussian)
distribution corresponding to the \(i\)th state. Ignored unless
distr is "Gaussian".
size: Integer scalar. The number of trials (sample size) from which
the number of “successes” is counted, in the context of
the binomial distribution (i.e. the size argument of
rbinom()). Ignored unless distr is "Binomial".
ispd: An optional numeric vector specifying the initial state probability
distribution of the model. If ispd is not provided then it
is taken to be the stationary (steady state) distribution determined
by the transition probability matrix x. If specified,
ispd must be a probability vector of length equal
to the number of rows (equivalently the number of columns)
of x.
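The default ispd, the stationary distribution of x, is the probability vector \(\pi\) satisfying \(\pi^\top P = \pi^\top\). The minimal sketch below computes it as the left eigenvector of \(P\) for eigenvalue 1, rescaled to sum to 1. The helper stationary() is hypothetical, and the package may use a different method. For the Tpm of the examples the result is (0.8, 0.2).

```r
# Hypothetical helper: stationary distribution of a transition matrix, found
# as the eigenvector of t(tpm) whose eigenvalue is (closest to) 1.
stationary <- function(tpm) {
  e <- eigen(t(tpm))
  v <- Re(e$vectors[, which.min(abs(e$values - 1))])
  v / sum(v)  # rescale (this also fixes an arbitrary sign) to sum to 1
}

Tpm <- matrix(c(0.91, 0.09, 0.36, 0.64), byrow = TRUE, ncol = 2)
stationary(Tpm)  # approximately 0.8 0.2
```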
ntop: Integer scalar, strictly greater than 1. The maximum possible
value of the db distribution. See db().
Used only if distr is "Dbd".
zeta: Logical scalar. Should zero origin indexing be used?
I.e. should the range of values of the db distribution be taken to
be {0,1,2,...,ntop} rather than {1,2,...,ntop}?
Used only if distr is "Dbd".
missFrac: A non-negative scalar, less than 1. Data will be randomly set
equal to NA with probability missFrac. Note that
for the "eglhmm" method, if missFrac is not
supplied then it is extracted from x.
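As described, the missingness mechanism amounts to an independent Bernoulli(missFrac) draw for each observation. The helper make_missing() below is a hypothetical base-R sketch of that idea, not the package's code.

```r
# Hypothetical helper: set each value to NA independently with probability
# miss_frac (the role played by missFrac).
make_missing <- function(y, miss_frac) {
  stopifnot(miss_frac >= 0, miss_frac < 1)
  y[runif(length(y)) < miss_frac] <- NA
  y
}

set.seed(1)
y <- rpois(10000, lambda = 3)
ym <- make_missing(y, 0.75)
mean(is.na(ym))  # close to 0.75
```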
fep: A list of length 1 or 2. The first entry of this
list is a logical scalar. If this is TRUE, then the first
entry of the simulated emissions (or at least one entry of the first
pair of simulated emissions) is forced to be “present”,
i.e. non-missing. The second entry of fep, if present, is
a numeric scalar between 0 and 1 (i.e. a probability). It
is equal to the probability that both entries of the first
pair of emissions are present. It is ignored if the emissions
are univariate. If the emissions are bivariate but the second
entry of fep is not provided, then this second entry defaults
to the “overall” probability that both entries of a pair of
emissions are present, given that at least one is present.
This probability is calculated from nafrac.
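The following illustrates the kind of conditional probability involved. Suppose each entry of a bivariate pair were missing independently with probability \(p\). This independence is an assumption made here; the text above does not say exactly how the default is computed. Then \(P(\text{both present} \mid \text{at least one present}) = (1-p)^2/(1-p^2) = (1-p)/(1+p)\), which equals 1/7 for \(p = 0.75\). The helper name is hypothetical.

```r
# Illustration only: assumes the two entries of each pair are missing
# independently, each with probability p.
p_both_given_one <- function(p) {
  p_both <- (1 - p)^2          # both entries present
  p_at_least_one <- 1 - p^2    # not both missing
  p_both / p_at_least_one      # simplifies to (1 - p) / (1 + p)
}

p_both_given_one(0.75)  # 1/7, about 0.143
```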
contrast: A character string, one of "treatment", "helmert" or "sum",
specifying which contrast (for unordered factors) to use in
constructing the design matrix. (The contrast for ordered factors,
which has no relevance in this context, is left at its default
value of "contr.poly".) Note that the meaning of the
coefficient vector phi depends on the contrast specified,
so make sure that the contrast is the one you had in
mind when you specified phi! Note also that for the "eglhmm"
method, contrast is extracted from x.
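The effect of contrast on the meaning of phi shows up directly in the design matrix. This base-R sketch, using hypothetical toy data, builds the same design under treatment and sum contrasts via the contrasts.arg argument of model.matrix(). The columns differ, and so do the coefficients that multiply them.

```r
# Toy predictors (hypothetical).
dat <- data.frame(
  state = factor(c(1, 2, 1, 2, 1, 2)),
  locn  = factor(c("A", "A", "B", "B", "C", "C"))
)
X_trt <- model.matrix(~ 0 + state + locn, data = dat,
                      contrasts.arg = list(locn = "contr.treatment"))
X_sum <- model.matrix(~ 0 + state + locn, data = dat,
                      contrasts.arg = list(locn = "contr.sum"))

colnames(X_trt)   # "state1" "state2" "locnB" "locnC"
colnames(X_sum)   # "state1" "state2" "locn1" "locn2"
X_sum[, "locn1"]  # level C is coded -1 under sum contrasts
```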
...: Not used.
Although this documentation refers to “generalised linear models”, the only such models currently available are the Gaussian model with the identity link, the Poisson model with the log link, and the Binomial model with the logit link. The Multinomial model, which is also available, is not exactly a generalised linear model; it might be thought of as an “extended” generalised linear model. Other models may be added at a future date.
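The three links named above can be sketched as follows. The linear predictor is mapped to the mean by the inverse link, and one emission is drawn per observation. sigma and size play the roles described for those arguments. draw_emissions() is a hypothetical helper that covers only these three distributions; it is not the package's implementation.

```r
# Hypothetical helper: map the linear predictor eta to the emission mean via
# the inverse link, then draw one observation per entry of eta.
draw_emissions <- function(eta, distr, state, sigma = NULL, size = NULL) {
  n <- length(eta)
  switch(distr,
    Gaussian = rnorm(n, mean = eta, sd = sigma[state]),        # identity link
    Poisson  = rpois(n, lambda = exp(eta)),                     # log link
    Binomial = rbinom(n, size = size, prob = plogis(eta)),      # logit link
    stop("distribution not covered by this sketch: ", distr)
  )
}

set.seed(3)
state <- c(1, 2, 1, 2)
eta <- c(0, log(5), 0, log(5))
draw_emissions(eta, "Poisson", state)
draw_emissions(eta, "Binomial", state, size = 10)
draw_emissions(eta, "Gaussian", state, sigma = c(1, 0.5))
```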
Rolf Turner rolfturner@posteo.net
T. Rolf Turner, Murray A. Cameron, and Peter J. Thomson (1998). Hidden Markov chains in generalized linear models. Canadian Journal of Statistics 26, pp. 107–125. DOI: https://doi.org/10.2307/3315677.
Rolf Turner (2008). Direct maximization of the likelihood of a hidden Markov model. Computational Statistics and Data Analysis 52, pp. 4147–4160. DOI: https://doi.org/10.1016/j.csda.2008.01.029.
fitted.eglhmm()
bcov()
library(eglhmm)
loc4 <- c("LngRf","BondiE","BondiOff","MlbrOff")
SCC4 <- SydColCount[SydColCount$locn %in% loc4,]
SCC4$locn <- factor(SCC4$locn) # Get rid of unused levels.
rownames(SCC4) <- 1:nrow(SCC4)
Tpm <- matrix(c(0.91,0.09,0.36,0.64),byrow=TRUE,ncol=2)
Phi <- c(0,log(5),-0.34,0.03,-0.32,0.14,-0.05,-0.14)
# State coefficients 0 and log(5): under the log link the "state
# effects" on the mean are the multiplicative factors 1 and 5.
Dat <- SCC4[,1:3]
fmla <- y~0+state+locn+depth
cells <- c("locn","depth")
# The default method.
X <- reglhmm(Tpm, formula=fmla, cells=cells, data=Dat, distr="P", phi=Phi,
             missFrac=0.75, contrast="sum")
# The "eglhmm" method.
fit <- eglhmm(y~locn+depth,data=SCC4,cells=cells,K=2,
verb=TRUE,distr="P")
Y <- reglhmm(fit)
# Vacuous generalised linear model.
Z <- reglhmm(Tpm,formula=y~0+state,nobs=300,distr="P",phi=log(c(2,7)))
# The "state effects" are 2 and 7.