reglhmm: Simulate data from a hidden generalised linear Markov model.

Description

Takes a specification of a model and simulates data from that model. For the default method the model is specified in terms of its individual components: a data frame that provides the predictor variables, and the various parameters of the model. For the "eglhmm" method the model is specified as a fitted model, an object of class "eglhmm".

Usage

reglhmm(x,...)
# S3 method for default
reglhmm(x, formula, response, cells=NULL, data=NULL, nobs=NULL,
                         distr=c("Gaussian","Poisson","Binomial","Dbd","Multinom"),
                         phi, Rho, sigma, size, ispd=NULL, ntop=NULL, zeta=NULL,
                         missFrac = 0, fep=NULL,
                         contrast=c("treatment","sum","helmert"),...)
# S3 method for eglhmm
reglhmm(x, missFrac = NULL, ...)

Value

A data frame with the same columns as those of data and an added column, whose name is determined from formula, containing the simulated response.

Arguments

x

For the default method, the transition probability matrix of the hidden Markov chain. For the "eglhmm" method, an object of class "eglhmm" as returned by the function eglhmm().

formula

The formula specifying the generalised linear model from which data are to be simulated. Note that the predictor variables in this formula must include a factor state, which specifies the state of the hidden Markov chain. Note also that this formula must determine a design matrix having a number of columns equal to the length of the vector phi of model coefficients (and to the length of psi in the case of the Gaussian distribution). If this condition is not satisfied, an error is thrown.

It is advisable to use a formula specified in the manner y~0+state+... where ... represents the predictors in the model other than state. Of course phi must be supplied in a manner that is consistent with this structure.
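The length condition can be checked in advance with base R's model.matrix(). The sketch below uses a made-up data frame (toy, with hypothetical factors locn and depth); it only illustrates how the number and order of design matrix columns arise, and is not taken from the package.

```r
# Hypothetical toy predictors: 2 states, 3 locations, 2 depths.
toy <- expand.grid(state = factor(1:2),
                   locn  = factor(c("A", "B", "C")),
                   depth = factor(c("shallow", "deep")))
# Design matrix for the recommended structure y ~ 0 + state + ...
# (the response is not needed to build the design matrix).
X <- model.matrix(~ 0 + state + locn + depth, data = toy)
n_coef <- ncol(X)   # the length that phi must have
colnames(X)         # the order in which the entries of phi must be supplied
```

With 0 + state leading the formula, each state gets its own column, so the first entries of phi can be read directly as "state effects".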

response

A character vector of length 2, specifying the names of the responses. Ignored unless distr is "Multinom". If distr is "Multinom" and if response is provided appropriately, then the simulated data are bivariate multinomial.

cells

A character vector specifying the names of the factors which determine the “cells” of the model. These factors must be columns of the data frame data. (See below.) Each cell corresponds to a time series of (simulated) observations. If cells is not supplied (left equal to NULL) then the model is taken to have a single cell, i.e. data from a “simple” hidden Markov model are generated. The parameters of that model may be time-varying, and still depend on the predictors specified by formula.

data

A data frame containing the predictor variables referred to by formula, i.e. the predictors for the model from which data are to be simulated. If data is not specified, then nobs (see below) must be. If data is not specified then formula must have the structure y ~ state or preferably y ~ 0 + state. Of course phi must be specified in a consistent manner.

nobs

Integer scalar. The number of observations to be generated in the setting in which the generalised linear model in question is vacuous. Ignored if data is supplied.
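When the generalised linear model is vacuous, the simulation amounts to that of an ordinary hidden Markov model. The base R sketch below illustrates the idea for Poisson emissions with the log link. It is a hand-written illustration under stated assumptions, not the code used by reglhmm(); the object names (state, sim) are hypothetical, and the initial distribution is simply taken to be the stationary distribution of Tpm.

```r
set.seed(42)
Tpm  <- matrix(c(0.91, 0.09, 0.36, 0.64), byrow = TRUE, ncol = 2)
phi  <- log(c(2, 7))   # state effects on the log scale
nobs <- 300
ispd <- c(0.8, 0.2)    # stationary distribution of Tpm (assumed initial law)

# Simulate the hidden Markov chain.
state    <- integer(nobs)
state[1] <- sample(1:2, 1, prob = ispd)
for (i in 2:nobs) {
  state[i] <- sample(1:2, 1, prob = Tpm[state[i - 1], ])
}

# Emissions: Poisson with mean exp(phi[state]), i.e. means 2 and 7.
y   <- rpois(nobs, lambda = exp(phi[state]))
sim <- data.frame(state = factor(state), y = y)
```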

distr

Character string specifying the distribution of the “emissions” from the model, i.e., of the observations. This distribution determines “emission probabilities”.

phi

A numeric vector specifying the coefficients of the linear predictor of the generalised linear model. The length of phi must be equal to the number of columns of the design matrix determined by formula and data. The entries of phi must match up appropriately with the columns of the design matrix.

Rho

A matrix, a list of two matrices, or a three dimensional array specifying the emission probabilities for a multinomial distribution. Ignored unless distr is "Multinom".

sigma

A numeric vector of length equal to the number of states. Its \(i\)th entry is the standard deviation of the (Gaussian) distribution corresponding to the \(i\)th state. Ignored unless distr is "Gaussian".

size

Integer scalar. The number of trials (sample size) from which the number of “successes” is counted, in the context of the binomial distribution. (I.e. the size parameter of rbinom().) Ignored unless distr is "Binomial".

ispd

An optional numeric vector specifying the initial state probability distribution of the model. If ispd is not provided then it is taken to be the stationary/steady state distribution determined by the transition probability matrix x. If specified, ispd must be a probability vector of length equal to the number of rows (equivalently the number of columns) of x.
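For reference, the stationary distribution of a transition probability matrix can be computed in base R by solving a small linear system. This is a generic sketch of the standard calculation; whether reglhmm() computes it in exactly this way is not stated in this documentation.

```r
# Transition probability matrix (rows sum to 1); values from the Examples.
Tpm <- matrix(c(0.91, 0.09, 0.36, 0.64), byrow = TRUE, ncol = 2)
K   <- nrow(Tpm)
# The stationary row vector p satisfies p %*% Tpm = p and sum(p) = 1.
# Equivalently p %*% (I - Tpm + U) = (1,...,1), where U is a matrix of ones.
ispd <- solve(t(diag(K) - Tpm + 1), rep(1, K))
ispd              # 0.8 0.2
ispd %*% Tpm      # reproduces ispd
```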

ntop

Integer scalar, strictly greater than 1. The maximum possible value of the db distribution. See db(). Used only if distr is "Dbd".

zeta

Logical scalar. Should zero origin indexing be used? I.e. should the range of values of the db distribution be taken to be {0,1,2,...,ntop} rather than {1,2,...,ntop}? Used only if distr is "Dbd".

missFrac

A non-negative scalar, less than 1. Data will be randomly set equal to NA with probability missFrac. Note that for the "eglhmm" method, if missFrac is not supplied then it is extracted from x.
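The effect of missFrac can be pictured with the following base R sketch. It assumes that each observation is set to NA independently of the others, which is an assumption made here for illustration rather than something this documentation states.

```r
set.seed(1)
missFrac <- 0.25
y <- rpois(1000, lambda = 3)             # stand-in for simulated emissions
# Assumption: each entry is deleted independently with probability missFrac.
y[runif(length(y)) < missFrac] <- NA
frac_missing <- mean(is.na(y))           # should be close to missFrac
```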

fep

A list of length 1 or 2. The first entry of this list is a logical scalar. If this is TRUE, then the first entry of the simulated emissions (or at least one entry of the first pair of simulated emissions) is forced to be “present”, i.e. non-missing. The second entry of fep, if present, is a numeric scalar, between 0 and 1 (i.e. a probability). It is equal to the probability that both entries of the first pair of emissions are present. It is ignored if the emissions are univariate. If the emissions are bivariate but the second entry of fep is not provided, then this second entry defaults to the “overall” probability that both entries of a pair of emissions are present, given that at least one is present. This probability is calculated from nafrac.

contrast

A character string, one of "treatment", "helmert" or "sum", specifying what contrast (for unordered factors) to use in constructing the design matrix. (The contrast for ordered factors, which has no relevance in this context, is left at its default value of "contr.poly".) Note that the meaning of the coefficient vector phi depends on the contrast specified, so make sure that the contrast is the same as what you had in mind when you specified phi! Note that for the "eglhmm" method, contrast is extracted from x.
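To see why the meaning of phi depends on the contrast, it may help to compare the design matrices that base R builds for the same factor under the three choices. The factor locn below is a made-up three-level example, not data from the package.

```r
locn <- factor(c("A", "B", "C"))   # hypothetical three-level factor
X_trt <- model.matrix(~ locn, contrasts.arg = list(locn = "contr.treatment"))
X_sum <- model.matrix(~ locn, contrasts.arg = list(locn = "contr.sum"))
X_hel <- model.matrix(~ locn, contrasts.arg = list(locn = "contr.helmert"))
# Same number of columns, but the row for level "C" differs:
X_trt[3, ]   # 1  0  1
X_sum[3, ]   # 1 -1 -1
X_hel[3, ]   # 1  0  2
```

The same numeric vector phi therefore yields different linear predictors under different contrasts.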

...

Not used.

Remark

Although this documentation refers to “generalised linear models”, the only such models currently available are the Gaussian model with the identity link, the Poisson model with the log link, and the Binomial model with the logit link. The Multinomial model, which is also available, is not exactly a generalised linear model; it might be thought of as an “extended” generalised linear model. Other models may be added at a future date.
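As a reminder of what the three links mean, the following base R sketch maps a linear predictor to the mean (or success probability) of each distribution. The values of eta are hypothetical and chosen only for illustration.

```r
eta <- c(-1, 0, 1.5)      # hypothetical linear predictor values
mu_gauss <- eta           # identity link: mean equals eta
mu_pois  <- exp(eta)      # log link: mean equals exp(eta)
p_binom  <- plogis(eta)   # logit link: success probability 1/(1 + exp(-eta))
```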

Author

Rolf Turner rolfturner@posteo.net

References

T. Rolf Turner, Murray A. Cameron, and Peter J. Thomson (1998). Hidden Markov chains in generalized linear models. Canadian Journal of Statistics 26, pp. 107-125, DOI: https://doi.org/10.2307/3315677.

Rolf Turner (2008). Direct maximization of the likelihood of a hidden Markov model. Computational Statistics and Data Analysis 52, pp. 4147-4160, DOI: https://doi.org/10.1016/j.csda.2008.01.029.

Examples

    loc4 <- c("LngRf","BondiE","BondiOff","MlbrOff")
    SCC4 <- SydColCount[SydColCount$locn %in% loc4,] 
    SCC4$locn <- factor(SCC4$locn) # Get rid of unused levels.
    rownames(SCC4) <- 1:nrow(SCC4)
    Tpm   <- matrix(c(0.91,0.09,0.36,0.64),byrow=TRUE,ncol=2)
    Phi   <- c(0,log(5),-0.34,0.03,-0.32,0.14,-0.05,-0.14)
    # The "state effects" are 1 and 5.
    Dat   <- SCC4[,1:3]
    fmla  <- y~0+state+locn+depth
    cells <- c("locn","depth")
    # The default method.
    X     <- reglhmm(Tpm,formula=fmla,cells=cells,data=Dat,distr="P",phi=Phi,
                     missFrac=0.75,contrast="sum")
    # The "eglhmm" method.
    fit   <- eglhmm(y~locn+depth,data=SCC4,cells=cells,K=2,
                    verb=TRUE,distr="P")
    Y     <- reglhmm(fit)
    # Vacuous generalised linear model.
    Z     <- reglhmm(Tpm,formula=y~0+state,nobs=300,distr="P",phi=log(c(2,7)))
    # The "state effects" are 2 and 7.
