misstify: Insert missing values.

Description

Insert missing values into data simulated by rhmm.

Usage

misstify(y, nafrac, fep = NULL)

Value

An object with a structure similar to that of y, containing the same data as y but with some of these data having been replaced by missing values (NA). In particular, if y is of class "multipleHmmDataSets" then so is the returned value.

Note that rhmm() calls upon misstify() to effect the replacement of a certain fraction of the simulated observations by missing values. If rhmm() is applied to a fitted model, then by default, this “certain fraction” is determined, using nafracCalc(), from the data set to which the model was fitted.

Arguments

y

A data set (vector or matrix with one or two columns, whose entries constitute discrete data, or a list of such vectors or matrices) or a list of such data sets (an object of class "multipleHmmDataSets", such as might be generated by rhmm()).

nafrac

A numeric vector, some entries of which could be ignored. (See below.) Those which do not get ignored must be probabilities strictly less than 1. (Having everything missing makes no sense!)

The vector nafrac will be replicated to have an “appropriate” length. If y is of class "multipleHmmDataSets" then this length is length(y) if the data are univariate and is 2*length(y) if the data are bivariate. In the former case the entries of the replicated vector form the fractions of missing values in the corresponding data sets. In the latter case the odd numbered entries form the fractions of missing values for the first variable and the even numbered entries the fractions for the second variable. If y is not of class "multipleHmmDataSets" then this length is either 1 (univariate case) or 2 (bivariate case).

Note that replication discards entries that are not needed to make up the required length, and such entries are thereby ignored. E.g. rep(c(0.2,0.7,1.6),length=2) yields [1] 0.2 0.7, i.e. the entry 1.6 is ignored.

The fraction(s) of missing values in a given data set may be determined by nafracCalc().
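
The replication rule described above can be illustrated with base R alone. The following is a minimal sketch, not the package's own code: the helper expand_nafrac() and its arguments n_sets and bivariate are hypothetical names introduced here for illustration.

```r
# Hypothetical helper (not part of the package): expand nafrac to the
# length described above using base R's rep().
expand_nafrac <- function(nafrac, n_sets, bivariate) {
    needed <- if (bivariate) 2 * n_sets else n_sets
    rep(nafrac, length.out = needed)
}

# Three bivariate data sets: six entries are needed.
nf <- expand_nafrac(c(0.2, 0.7), n_sets = 3, bivariate = TRUE)
nf

# Odd numbered entries apply to the first variable, even numbered
# entries to the second.
first_var  <- nf[c(TRUE, FALSE)]
second_var <- nf[c(FALSE, TRUE)]

# A single univariate data set: only the first entry is used, so the
# out-of-range value 1.6 is discarded before it can cause trouble.
single <- expand_nafrac(c(0.2, 0.7, 1.6), n_sets = 1, bivariate = FALSE)
single
```

Here nf alternates between 0.2 and 0.7, so every first variable gets the fraction 0.2 and every second variable gets 0.7.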

fep

“First entry present”. A list with one or two entries, the first being a logical scalar (which might be named "present"). If there is a second entry it should be a scalar probability (which might be named "p2"). In an application of interest, observation sequences always begin at an observed event, i.e. at a time point at which the “emission” has at least one non-missing value. If fep[[1]] is TRUE the NAs will be inserted in such a way that the resulting data have this characteristic. If fep is left NULL then its first (possibly only) entry is set to TRUE.
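
To make the “first entry present” idea concrete for univariate data, here is a minimal base R sketch. It is not the package's implementation: the helper insert_na_sketch() is hypothetical, and the choice to replace a fixed number of entries (rather than, say, drawing each entry independently) is an assumption made only for illustration.

```r
# Hypothetical helper, not the package's implementation: replace about
# a fraction nafrac of a univariate sequence by NA, optionally keeping
# the first entry observed.
insert_na_sketch <- function(y, nafrac, first_present = TRUE) {
    stopifnot(nafrac >= 0, nafrac < 1)
    n <- length(y)
    # Positions that are allowed to become missing.
    eligible <- if (first_present) seq_len(n)[-1] else seq_len(n)
    n_miss <- min(round(nafrac * n), length(eligible))
    # Sample indices into 'eligible' to avoid sample()'s special
    # handling of a single number.
    drop <- eligible[sample.int(length(eligible), n_miss)]
    y[drop] <- NA
    y
}

set.seed(1)
y  <- sample(1:5, 20, replace = TRUE)
ym <- insert_na_sketch(y, nafrac = 0.5)
is.na(ym[1])     # FALSE: the sequence starts at an observed value
mean(is.na(ym))  # 0.5
```

Excluding position 1 from the eligible set is what guarantees that the sequence begins at an observed value. The remaining positions absorb all of the missingness.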

For bivariate data, fep[[2]] specifies the probability that both values of the initial pair of observations are non-missing. In this case one of the entries of the initial pair is chosen to be “potentially” missing, with probabilities nafrac/sum(nafrac). This entry is left non-missing with probability fep[[2]]. (The other entry is always left non-missing.)

If the data are univariate or if fep[[1]] is FALSE, then fep[[2]] is ignored. If the data are bivariate and fep[[2]] is not specified, it defaults to the (estimated) conditional probability that both entries of the initial pair of observations are present given that at least one is present, under the assumption of independence of these events. I.e. it is set equal to prod(1-nafrac)/(1-prod(nafrac)), since the probability that at least one entry is present is 1-prod(nafrac).
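
The treatment of the initial pair in the bivariate case can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helper initial_pair_sketch() is hypothetical, the values of nafrac are made up, and the conditional probability is obtained by enumerating the four present/missing patterns directly from its definition.

```r
# Hypothetical sketch, not the package's code.
nafrac <- c(0.2, 0.5)  # assumed missing fractions for variables 1 and 2

# Enumerate the four present/missing patterns of a pair, assuming the
# two entries go missing independently of each other.
pat  <- expand.grid(present1 = c(TRUE, FALSE), present2 = c(TRUE, FALSE))
prob <- ifelse(pat$present1, 1 - nafrac[1], nafrac[1]) *
        ifelse(pat$present2, 1 - nafrac[2], nafrac[2])
p_both     <- sum(prob[pat$present1 & pat$present2])
p_at_least <- sum(prob[pat$present1 | pat$present2])
p2 <- p_both / p_at_least  # P(both present | at least one present)

# Treat the initial pair as described: pick the "potentially" missing
# entry with probabilities nafrac/sum(nafrac), then leave it non-missing
# with probability p2. The other entry is never touched.
# (Assumes at least one entry of nafrac is positive.)
initial_pair_sketch <- function(pair, nafrac, p2) {
    k <- sample(1:2, 1, prob = nafrac / sum(nafrac))
    if (runif(1) >= p2) pair[k] <- NA
    pair
}

set.seed(3)
sims <- replicate(2000, initial_pair_sketch(c(1, 2), nafrac, p2))
all(colSums(is.na(sims)) <= 1)  # TRUE: the pair is never wholly missing
```

Because at most one entry of the pair is ever set to NA, the initial observation always has at least one non-missing value, and the proportion of fully observed initial pairs is close to p2.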

Author

Rolf Turner r.turner@auckland.ac.nz

Examples

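# Transition probability matrix of a two-state hidden chain.
P <- matrix(c(0.7,0.3,0.1,0.9),2,2,byrow=TRUE)
# Emission probabilities: five possible values (rows), two states (columns).
R <- matrix(c(0.5,0,0.1,0.1,0.3,
              0.1,0.1,0,0.3,0.5),5,2)
set.seed(42)
# Lengths of the 20 observation sequences in each data set.
lll   <- sample(250:350,20,TRUE)
# A single simulated data set, then the same data with NAs inserted.
y1    <- rhmm(ylengths=lll,nsim=1,tpm=P,Rho=R)
y1m   <- misstify(y1,nafrac=0.5,fep=list(TRUE))
# Five simulated data sets, handled in one call to misstify().
y2    <- rhmm(ylengths=lll,nsim=5,tpm=P,Rho=R)
set.seed(127)
y2m   <- misstify(y2,nafrac=0.5,fep=list(TRUE))
nafracCalc(y2m) # A list all of whose entries are close to 0.5.
# The same operation applied to each data set separately via lapply().
set.seed(127)
y2ma  <- lapply(y2,misstify,nafrac=0.5,fep=list(TRUE))
if (FALSE) {
    nafracCalc(y2ma) # Throws an error.
}
sapply(y2ma,nafracCalc) # Effectively the same as nafracCalc(y2m).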