EM_multinomial: Helper function for npmsm()

Description

For a general Markov chain multi-state model with interval-censored transitions, calculate the NPMLE using an EM algorithm with the multinomial approach.

Usage

EM_multinomial(
  gd,
  tmat,
  tmat2,
  inits,
  beta_params,
  support_manual,
  exact,
  maxit,
  tol,
  conv_crit,
  manual,
  verbose,
  newmet,
  include_inf,
  checkMLE,
  checkMLE_tol,
  prob_tol,
  remove_bins,
  init_int = init_int,
  ...
)

Arguments

gd

A data.frame with the following named columns:

id:

Subject identifier;

state:

State in which the subject is observed at time;

time:

Time at which the subject is observed;

The true transition time between states is then interval censored between consecutive observation times.
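As an illustration of the expected layout, the following sketch builds a small data frame with made-up values (the subjects, states and times are hypothetical) and recovers the censoring interval for one subject.

```r
# Hypothetical panel data: three subjects, each observed at a few visit times.
gd <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 3),
  state = c(1, 1, 2, 1, 3, 1, 2, 3),
  time  = c(0, 2, 5, 0, 4, 0, 3, 6)
)

# Subject 1 is last seen in state 1 at time 2 and first seen in state 2 at
# time 5, so the 1 -> 2 transition time is only known to lie between 2 and 5.
s1 <- gd[gd$id == 1, ]
first_in_2 <- which(s1$state == 2)[1]
censoring_interval <- c(L = s1$time[first_in_2 - 1], R = s1$time[first_in_2])
print(censoring_interval)
```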

tmat

A transition matrix as created by transMat
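A minimal sketch of such a matrix for an illness-death model, assuming that transMat here refers to transMat() from the mstate package. The state labels are hypothetical. The matrix is first built by hand in base R so that its structure is visible; the mstate call only runs if that package is installed.

```r
# Illness-death model: 1 = Healthy, 2 = Ill, 3 = Dead (hypothetical labels).
state_names <- c("Healthy", "Ill", "Dead")

# Entry [from, to] holds the transition number, NA means "not possible".
tmat <- matrix(NA_integer_, nrow = 3, ncol = 3,
               dimnames = list(from = state_names, to = state_names))
tmat["Healthy", "Ill"]  <- 1L
tmat["Healthy", "Dead"] <- 2L
tmat["Ill", "Dead"]     <- 3L

# With mstate available, transMat() builds this kind of matrix from a list
# giving, for each state, the states that can be reached directly from it.
if (requireNamespace("mstate", quietly = TRUE)) {
  tmat_mstate <- mstate::transMat(
    x = list(c(2, 3), c(3), c()),
    names = state_names
  )
}

n_transitions <- sum(!is.na(tmat))
print(tmat)
```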

inits

Which distribution should be used to generate the initial estimates of the intensities in the EM algorithm. One of c("equalprob", "unif", "beta"), with "equalprob" assigning 1/K to each intensity, with K the number of distinct observation times (length(unique(gd[, "time"]))). For "unif", each intensity is sampled from the Unif[0,1] distribution and for "beta" each intensity is sampled from the Beta(a, b) distribution. If "beta" is chosen, the argument beta_params must be specified as a vector of length 2 containing the parameters of the beta distribution. Default = "equalprob".
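The three options can be sketched in base R as follows. The helper init_intensities() is hypothetical and is not part of the package; it assumes one initial value per distinct observation time for a single transition, which may not match the internal layout used by the function.

```r
set.seed(1)

times <- c(0, 2, 5, 0, 4, 0, 3, 6)   # stand-in for gd[, "time"]
K <- length(unique(times))           # number of distinct observation times

# Hypothetical helper mirroring the three documented choices for inits.
init_intensities <- function(inits = c("equalprob", "unif", "beta"), K,
                             beta_params = NULL) {
  inits <- match.arg(inits)
  switch(inits,
    equalprob = rep(1 / K, K),                 # 1/K for every intensity
    unif = runif(K, min = 0, max = 1),         # Unif[0, 1] draws
    beta = {
      stopifnot(length(beta_params) == 2)      # shape1 and shape2 required
      rbeta(K, shape1 = beta_params[1], shape2 = beta_params[2])
    }
  )
}

eq <- init_intensities("equalprob", K)
un <- init_intensities("unif", K)
be <- init_intensities("beta", K, beta_params = c(2, 5))
```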

beta_params

A vector of length 2 specifying the beta distribution parameters for initial distribution generation. First entry will be used as shape1 and second entry as shape2. See help(rbeta). Only used if inits = "beta".

support_manual

Used for specifying a manual support region for the transitions. A list whose length equals the number of transitions in tmat, with each list element containing a data frame with 2 named columns L and R indicating the left and right endpoints of the support intervals. When specified, all intensities outside of these intervals will be set to zero for the corresponding transitions. Intensities set to zero cannot be changed by the EM algorithm. Will use inits = "equalprob".
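A sketch of the expected structure for a model with three transitions, using made-up interval endpoints. The helper in_support() is hypothetical and only illustrates the idea; treating the intervals as closed at both ends is an assumption, since the text above does not say how endpoints are handled.

```r
# One data frame per transition, in the order of the transition numbers in tmat.
support_manual <- list(
  data.frame(L = c(0, 3), R = c(2, 5)),  # transition 1: two support intervals
  data.frame(L = 0, R = 6),              # transition 2: one interval
  data.frame(L = 4, R = 6)               # transition 3: one interval
)

# Hypothetical helper: does time t fall inside any support interval?
# Closed intervals [L, R] are assumed here.
in_support <- function(t, support) {
  any(t >= support$L & t <= support$R)
}

in_support(2.5, support_manual[[1]])  # between the two intervals
in_support(4, support_manual[[1]])    # inside the second interval
```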

exact

Numeric vector indicating the states into which transitions are observed at exact times. Entries must correspond to column numbers in tmat.

maxit

Maximum number of iterations.

tol

Tolerance of the procedure.

conv_crit

Convergence criterion. Stops procedure when the difference in the chosen quantity between two consecutive iterations is smaller than the tolerance level tol. One of the following:

"haz"

Stop when change in maximum estimated intensities (hazards) < tol.

"prob"

Stop when change in estimated probabilities < tol.

"lik"

Stop when change in observed-data likelihood < tol.

Default is "haz". The options "haz" and "lik" can be compared across different methods, but "prob" depends on the chosen method. The most conservative criterion (requiring the most iterations) is "prob", followed by "haz" and finally "lik".
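The stopping rule can be pictured with a generic check between two consecutive iterations. This is only a sketch: has_converged() is a hypothetical helper, and measuring the change as the largest absolute difference is an assumption, not necessarily how the package computes it for each criterion.

```r
# Hypothetical helper: compare a tracked quantity (intensities, probabilities
# or a likelihood value) between two consecutive iterations.
has_converged <- function(old, new, tol) {
  max(abs(new - old)) < tol
}

old_est   <- c(0.20, 0.10, 0.05)
small_chg <- c(0.2000004, 0.1000001, 0.05)
large_chg <- c(0.25, 0.10, 0.05)

stop_small <- has_converged(old_est, small_chg, tol = 1e-4)  # tiny change
stop_large <- has_converged(old_est, large_chg, tol = 1e-4)  # change of 0.05
```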

manual

Manually specify starting transition intensities?

verbose

Should iteration messages be printed? Default is FALSE.

newmet

Should contributions after last observation time also be used in the likelihood? Default is FALSE.

include_inf

Should an additional bin from the largest observed time to infinity be included in the algorithm? Default is FALSE.

checkMLE

Should a check be performed whether the estimate has converged towards a true Maximum Likelihood Estimate? Default is TRUE.

checkMLE_tol

Tolerance for checking whether the estimate has converged to MLE. Whenever an estimated transition intensity is smaller than the tolerance, it is assumed to be zero.

prob_tol

If an estimated probability is smaller than prob_tol, it will be set to zero during estimation. Default value is tol/10.

remove_bins

Should a bin be removed during the algorithm if all estimated intensities are zero for a single bin? Can improve computation speed for large data sets. Note that zero means the estimated intensities are smaller than prob_tol. Default is FALSE.
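The interplay of prob_tol and remove_bins can be sketched as below. The matrix of estimates is made up, and the layout (rows as bins, columns as transitions) is an assumption chosen for illustration only.

```r
prob_tol <- 1e-5

# Hypothetical estimates: rows are bins, columns are transitions.
# matrix() fills column by column, so row 2 is (2e-7, 4e-9).
A <- matrix(c(0.30, 2e-7, 0.10,
              0.05, 4e-9, 0.20),
            nrow = 3, ncol = 2)

# Values below prob_tol are treated as zero.
A[A < prob_tol] <- 0

# A bin is dropped when all of its estimates are zero.
keep <- rowSums(A > 0) > 0
A_reduced <- A[keep, , drop = FALSE]
```

Dropping such bins shrinks the arrays that later iterations have to update, which is where the speed gain for large data sets would come from.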

init_int

A vector of length 2, with the first entry indicating what percentage of the mass should be distributed over the first bins, and the second entry indicating what percentage of all bins those first bins make up. Default is c(0, 0), in which case the argument is ignored. This argument has no practical use and exists only for demonstration purposes in the related article.

...

Not used yet.

References

Michael G. Hudgens, On Nonparametric Maximum Likelihood Estimation with Interval Censoring and Left Truncation, Journal of the Royal Statistical Society Series B: Statistical Methodology, Volume 67, Issue 4, September 2005, Pages 573-587, doi:10.1111/j.1467-9868.2005.00516.x