MixtureModel: Correlated predictors generator

Description

This function creates a generative model for correlated, mixed-scale predictors.

Usage

MixtureModel(
  method = "estimation",
  data = NULL,
  G = NULL,
  m = 100,
  nudge = 1e-09,
  sbg_args = list(nsamp = 1000),
  cvine_marginals = list(),
  cvine_dtypes = list(),
  resamp_prob = NULL
)

Value

A MixtureModel object.

Arguments

method: A string, one of the three options "resampling", "estimation", or "cvine". Default is "estimation". See Details.
data: A data frame or matrix, required for the "resampling" and "estimation" methods.
G: A guesstimate pairwise correlation matrix for the "cvine" method. See Details.
m: A positive number indicating uncertainty in the guesstimate G; larger values mean more uncertainty. Default is 100.
nudge: A number added to the diagonal of the covariance matrix for numerical stability. Default is 1e-09.
sbg_args: A list of named arguments, except Y, for function `sbgcop.mcmc()`.
cvine_marginals: A named list describing the univariate distribution of each predictor. See Details.
cvine_dtypes: A named list describing the data type of each variable.
resamp_prob: A vector of sampling probabilities, one for each observation in data. Must sum to 1.
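
To illustrate the role of resamp_prob, here is a minimal base R sketch of weighted row resampling. It is not the package's implementation; the data frame `toy` and the weights `w` are hypothetical and exist only for this illustration.

```r
set.seed(1)

# Hypothetical toy data standing in for `data`
toy <- data.frame(
  age = c(25, 40, 58, 63),
  smoker = factor(c("no", "yes", "no", "no"))
)

# Hypothetical raw weights, normalized so the probabilities sum to 1
w <- c(1, 2, 3, 4)
resamp_prob <- w / sum(w)

# Draw row indices with replacement according to the probabilities
idx <- sample(seq_len(nrow(toy)), size = 1000, replace = TRUE, prob = resamp_prob)

# Whole rows are drawn together, so the dependence among columns is kept
resampled <- toy[idx, ]
```

A vector built this way has one entry per row of the data and sums to 1, which is what the argument description asks for.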

Details

There are three methods to generate data:

1. Resampling: if we have enough data of the predictors, we can resample to get realistic joint distributions and dependence among them.

2. Estimation: if we have a small sample from, for example, a pilot study, we can sample from a semi-parametric copula model (Hoff 2007) after learning the dependence and univariate marginals of the predictors.

3. C-vine: if no pilot data exists, we can still set a rough guesstimate of the dependence and univariate marginals. The C-vine algorithm (Joe 2006) generates a positive semi-definite correlation matrix given the guesstimate G. The guesstimate G is a symmetric p x p matrix whose (i, j)-th entry lies between -1 and 1 and is the guesstimated correlation between the i-th and j-th predictors. G doesn't need to be a valid correlation matrix. The method works well when the values in G are not extreme (e.g., 0.999 or -0.999). Built-in functions for univariate marginals include: `qbeta`, `qbinom`, `qcauchy`, `qchisq`, `qexp`, `qf`, `qgamma`, `qgeom`, `qhyper`, `qlogis`, `qlnorm`, `qmultinom`, `qnbinom`, `qnorm`, `qpois`, `qt`, `qunif`, `qweibull`.
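
As an illustration of what a guesstimate G may look like, the base R sketch below builds a symmetric 3 x 3 matrix with entries in [-1, 1] and checks whether it is positive semi-definite. The correlation values are hypothetical. The sketch only constructs and inspects G; it does not call the package.

```r
p <- 3

# Start from the identity and fill in hypothetical pairwise guesses
G <- diag(1, p)
G[1, 2] <- G[2, 1] <- 0.6
G[1, 3] <- G[3, 1] <- -0.6
G[2, 3] <- G[3, 2] <- 0.6

# Basic requirements stated above: symmetric, entries within [-1, 1]
stopifnot(isSymmetric(G), all(abs(G) <= 1))

# A valid correlation matrix has no negative eigenvalues
min_eig <- min(eigen(G, symmetric = TRUE, only.values = TRUE)$values)
is_psd <- min_eig >= 0
```

Here `is_psd` is FALSE: these three guesses are mutually inconsistent, so G is not a valid correlation matrix. Per the description above, such a matrix is still acceptable as the G argument.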

References

Hoff P (2007). "Extending the rank likelihood for semiparametric copula estimation." Ann. Appl. Stat., 1(1), 265-283.

Joe H (2006). "Generating random correlation matrices based on partial correlations." Journal of Multivariate Analysis, 97, 2177-2189.

Examples

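# Load the example data shipped with the package
data("nhanes1518", package = "mpower")
# Build a generative model by resampling rows of the data
xmod <- mpower::MixtureModel(method = "resampling", data = nhanes1518)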