rmult.bcl: Simulating Correlated Nominal Responses Conditional on a Marginal Baseline-Category Logit Model Specification

Description

Simulates correlated nominal responses assuming a baseline-category logit model for the marginal probabilities.

Usage

rmult.bcl(clsize = clsize, ncategories = ncategories, betas = betas,
  xformula = formula(xdata), xdata = parent.frame(),
  cor.matrix = cor.matrix, rlatent = NULL)

Value

Returns a list that has components:

Ysim: the simulated nominal responses. Element $(i, t)$ represents the realization of $Y_{it}$.
simdata: a data frame that includes the simulated response variables (y), the covariates specified by xformula, subjects' identities (id) and the corresponding measurement occasions (time).
rlatent: the latent random variables denoted by $e_{it}^{NO}$ in Touloumis (2016).

Arguments

clsize: integer indicating the common cluster size.
ncategories: integer indicating the number of nominal response categories.
betas: numerical vector or matrix containing the value of the marginal regression parameter vector.
xformula: formula expression as in other marginal regression models but without including a response variable.
xdata: optional data frame containing the variables provided in xformula.
cor.matrix: matrix indicating the correlation matrix of the multivariate normal distribution when the NORTA method is employed (rlatent = NULL).
rlatent: matrix with (clsize * ncategories) columns containing realizations of the latent random vectors when the NORTA method is not used. See Details for more information.

Author

Anestis Touloumis

Details

The formulae are easier to read in the package Vignette or the Reference Manual.

The assumed marginal baseline-category logit model is $\log \frac{\Pr(Y_{it} = j \mid x_{it})}{\Pr(Y_{it} = J \mid x_{it})} = (\beta_{tj0} - \beta_{tJ0}) + (\beta_{tj}^{\prime} - \beta_{tJ}^{\prime}) x_{it} = \beta_{tj0}^{*} + \beta_{tj}^{*\prime} x_{it}$. For subject $i$, $Y_{it}$ is the $t$-th nominal response and $x_{it}$ is the associated covariates vector. Also, $\beta_{tj0}$ is the $j$-th category-specific intercept at the $t$-th measurement occasion and $\beta_{tj}$ is the $j$-th category-specific regression parameter vector at the $t$-th measurement occasion.
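The model above can be inverted to obtain the marginal category probabilities at a given covariate value. The following is a minimal base R sketch for a single measurement occasion; `bcl_probabilities` is a hypothetical helper written for illustration and is not part of SimCorMultRes. The parameter values are those used in the Examples section.

```r
# Hypothetical helper (not part of SimCorMultRes): marginal category
# probabilities under a baseline-category logit model at one occasion.
bcl_probabilities <- function(betas_star, x) {
  # betas_star: (J - 1) x (p + 1) matrix; row j holds (beta*_{j0}, beta*_{j}).
  # x: covariate vector of length p.
  # The baseline category J has linear predictor 0.
  linear_predictors <- c(betas_star %*% c(1, x), 0)
  exp(linear_predictors) / sum(exp(linear_predictors))
}

# Parameter values from the Examples section (J = 4, covariates x1 and x2).
betas <- c(1, 3, 2, 1.25, 3.25, 1.75, 0.75, 2.75, 2.25, 0, 0, 0)
betas_matrix <- matrix(betas, nrow = 4, byrow = TRUE)
# Identifiable parameters: subtract the baseline (last) row from the others.
betas_star <- sweep(betas_matrix[1:3, , drop = FALSE], 2, betas_matrix[4, ])
probabilities <- bcl_probabilities(betas_star, x = c(0.5, -1))
probabilities
sum(probabilities)
```

Because only the differences from the baseline row enter the probabilities, adding the same constant to every category's parameters leaves the model unchanged.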

The nominal response $Y_{i t}$ is obtained by extending the principle of maximum random utility (McFadden, 1974) as suggested in Touloumis (2016).
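For a single occasion, the principle of maximum random utility can be illustrated in base R. The sketch below assumes independent standard Gumbel errors, the classical condition under which choosing the category with the largest utility yields baseline-category logit probabilities. All parameter values are hypothetical, and the code does not reproduce the package's internal algorithm for correlated responses.

```r
set.seed(123)
# Hypothetical parameter values for one occasion, J = 3, single covariate.
intercepts <- c(0.5, -0.25, 0)  # beta_{j0}; the last category is the baseline
slopes <- c(1, 2, 0)            # beta_{j}
x <- 0.3
n <- 100000
utilities <- intercepts + slopes * x

# Standard Gumbel draws via the inverse CDF: one column per category.
gumbel_errors <- matrix(-log(-log(runif(n * 3))), nrow = n, ncol = 3)
# Add the deterministic utilities column-wise, then pick the category
# with the largest random utility for each simulated subject.
random_utilities <- sweep(gumbel_errors, 2, utilities, "+")
y <- max.col(random_utilities, ties.method = "first")

empirical <- as.numeric(table(factor(y, levels = 1:3))) / n
theoretical <- exp(utilities) / sum(exp(utilities))
round(rbind(empirical, theoretical), 3)
```

The empirical frequencies should be close to the theoretical probabilities, up to Monte Carlo error.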

betas should be provided as a numeric vector only when $\beta_{tj0} = \beta_{j0}$ and $\beta_{tj} = \beta_{j}$ for all $t$. Otherwise, betas must be provided as a numeric matrix with clsize rows such that the $t$-th row contains the value of $(\beta_{t10}, \beta_{t1}, \beta_{t20}, \beta_{t2}, \ldots, \beta_{tJ0}, \beta_{tJ})$. In either case, betas should reflect the order of the terms implied by xformula.
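The two layouts for betas can be sketched as follows, reusing the parameter values from the Examples section. The time-varying variant is hypothetical and only illustrates how a row-specific value would be placed.

```r
cluster_size <- 3
categories_no <- 4
# Per-category blocks (intercept, x1, x2); the last block is the baseline.
betas_vector <- c(1, 3, 2, 1.25, 3.25, 1.75, 0.75, 2.75, 2.25, 0, 0, 0)

# Time-stationary parameters written as a matrix: each of the clsize rows
# repeats the same vector.
betas_matrix <- matrix(betas_vector, nrow = cluster_size,
  ncol = length(betas_vector), byrow = TRUE)

# Hypothetical time-varying variant: a different category-1 intercept
# at the second measurement occasion.
betas_matrix_varying <- betas_matrix
betas_matrix_varying[2, 1] <- 1.5
dim(betas_matrix_varying)
```

With covariates x1 and x2, each row has ncategories * 3 entries: an intercept and two slopes per category.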

The appropriate use of xformula is xformula = ~ covariates, where covariates specifies the linear predictor as in other marginal regression models.

The optional argument xdata should be provided in "long" format.
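A long-format data frame has one row per subject and measurement occasion. The sketch below follows the row ordering used in the Examples section, where rows are grouped by subject. The id and time columns are hypothetical additions for readability and are not stated to be required in xdata.

```r
sample_size <- 3
cluster_size <- 2
set.seed(7)
subject_covariate <- rnorm(sample_size)

# One row per (subject, occasion) pair, rows grouped by subject.
xdata_long <- data.frame(
  id = rep(seq_len(sample_size), each = cluster_size),
  time = rep(seq_len(cluster_size), times = sample_size),
  x1 = rep(subject_covariate, each = cluster_size),  # constant within subject
  x2 = rnorm(sample_size * cluster_size)             # varies across occasions
)
xdata_long
```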

The NORTA method is the default option for simulating the latent random vectors denoted by $e_{itj}^{NO}$ in Touloumis (2016). In this case, the algorithm forces cor.matrix to respect the assumption of choice independence. To import simulated values for the latent random vectors without utilizing the NORTA method, the user can employ the rlatent argument. In this case, row $i$ corresponds to subject $i$ and columns $(t - 1) \times \texttt{ncategories} + 1, \ldots, t \times \texttt{ncategories}$ should contain the realizations of $e_{it1}^{NO}, \ldots, e_{itJ}^{NO}$, respectively, for $t = 1, \ldots, \texttt{clsize}$.
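The column layout expected for rlatent can be sketched in base R. The example assumes independent standard Gumbel draws, which would give uncorrelated responses and serves only to show the indexing. `occasion_columns` is a hypothetical helper, not a SimCorMultRes function.

```r
set.seed(42)
sample_size <- 5
cluster_size <- 3
categories_no <- 4

# Assumption: independent standard Gumbel draws, one row per subject and
# clsize * ncategories columns.
rlatent <- matrix(
  -log(-log(runif(sample_size * cluster_size * categories_no))),
  nrow = sample_size, ncol = cluster_size * categories_no)

# Hypothetical helper: columns holding e_{it1}, ..., e_{itJ} for occasion t.
occasion_columns <- function(t, ncategories) {
  ((t - 1) * ncategories + 1):(t * ncategories)
}

occasion_columns(2, categories_no)  # 5 6 7 8
rlatent[1, occasion_columns(2, categories_no)]  # subject 1, occasion 2
```

A matrix built this way could then be supplied through the rlatent argument, together with an xdata that has sample_size * cluster_size rows.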

References

Cario, M. C. and Nelson, B. L. (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.

Li, S. T. and Hammond, J. L. (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man and Cybernetics 5, 557--561.

McFadden, D. (1974) Conditional logit analysis of qualitative choice behavior. In Zarembka, P. (ed.), Frontiers in Econometrics. New York: Academic Press, 105--142.

Touloumis, A. (2016) Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal 8, 79--91.

Touloumis, A., Agresti, A. and Kateri, M. (2013) GEE for multinomial responses using a local odds ratios parameterization. Biometrics 69, 633--640.

Examples

## See Example 3.1 in the Vignette.
library("SimCorMultRes")
betas <- c(1, 3, 2, 1.25, 3.25, 1.75, 0.75, 2.75, 2.25, 0, 0, 0)
sample_size <- 500
categories_no <- 4
cluster_size <- 3
set.seed(1)
x1 <- rep(rnorm(sample_size), each = cluster_size)
x2 <- rnorm(sample_size * cluster_size)
xdata <- data.frame(x1, x2)
equicorrelation_matrix <- toeplitz(c(1, rep(0.95, cluster_size - 1)))
identity_matrix <- diag(categories_no)
latent_correlation_matrix <- kronecker(equicorrelation_matrix,
  identity_matrix)
simulated_nominal_dataset <- rmult.bcl(clsize = cluster_size,
  ncategories = categories_no, betas = betas, xformula = ~ x1 + x2,
  xdata = xdata, cor.matrix = latent_correlation_matrix)
suppressPackageStartupMessages(library("multgee"))
nominal_gee_model <- nomLORgee(y ~ x1 + x2,
  data = simulated_nominal_dataset$simdata, id = id, repeated = time,
  LORstr = "time.exch")
round(coef(nominal_gee_model), 2)
