This function creates a generative model for the correlated, mixed-scale predictors.
MixtureModel(
method = "estimation",
data = NULL,
G = NULL,
m = 100,
nudge = 1e-09,
sbg_args = list(nsamp = 1000),
cvine_marginals = list(),
cvine_dtypes = list(),
resamp_prob = NULL
)
A MixtureModel object.
method: A string, one of "resampling", "estimation", or "cvine". Default is "estimation". See Details.
data: A data frame or matrix, required for the "resampling" and "estimation" methods.
G: A guesstimate pairwise correlation matrix for the "cvine" method. See Details.
m: A positive number indicating uncertainty in the guesstimate G; larger values mean more uncertainty. Default is 100.
nudge: A number added to the diagonal of the covariance matrix for numerical stability. Default is 1e-09.
sbg_args: A list of named arguments, except Y, passed to `sbgcop.mcmc()`.
cvine_marginals: A named list describing the univariate distribution of each predictor. See Details.
cvine_dtypes: A named list describing the data type of each variable.
resamp_prob: A vector of sampling probabilities, one for each observation in data. Must sum to 1.
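The requirement that the sampling probabilities sum to 1 can be illustrated with base R alone. The sketch below is not the package's internal code: the data frame `dat` and the weights `w` are hypothetical, and `sample()` stands in for whatever resampling the package performs.

```r
set.seed(1)

# Hypothetical data: 5 observations of 2 predictors (illustrative values only).
dat <- data.frame(x1 = c(0.2, 1.5, 3.1, 0.7, 2.2),
                  x2 = c(0, 1, 1, 0, 1))

# Hypothetical unnormalized weights, rescaled so that they sum to 1.
w <- c(1, 1, 2, 2, 4)
resamp_prob <- w / sum(w)

# One probability per observation, and the probabilities sum to 1.
stopifnot(length(resamp_prob) == nrow(dat),
          isTRUE(all.equal(sum(resamp_prob), 1)))

# Draw rows with replacement according to these probabilities.
idx <- sample(nrow(dat), size = 1000, replace = TRUE, prob = resamp_prob)
resampled <- dat[idx, ]
```

Because whole rows are drawn together, each resampled row keeps the observed combination of predictor values, which is what preserves the dependence among predictors.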
There are three methods to generate data:
1. Resampling: if we have enough data of the predictors, we can resample to get realistic joint distributions and dependence among them.
2. Estimation: if we have a small sample from, for example, a pilot study, we can sample from a semi-parametric copula model (Hoff 2007) after learning the dependence and univariate marginals of the predictors.
3. C-vine: if no pilot data exist, we can still set a rough guesstimate of the dependence and univariate marginals. The C-vine algorithm (Joe 2006) generates a positive semi-definite correlation matrix given the guesstimate G. The guesstimate G is a symmetric p x p matrix whose (i, j)-th entry is between -1 and 1 and is the guesstimate correlation between the i-th and j-th predictors. G doesn't need to be a valid correlation matrix. The method works well when the values in G are not extreme (e.g., 0.999, -0.999). Built-in functions for univariate marginals include: `qbeta`, `qbinom`, `qcauchy`, `qchisq`, `qexp`, `qf`, `qgamma`, `qgeom`, `qhyper`, `qlogis`, `qlnorm`, `qmultinom`, `qnbinom`, `qnorm`, `qpois`, `qt`, `qunif`, `qweibull`.
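Two ideas from the C-vine description can be sketched in base R: a guesstimate G that is symmetric with entries in [-1, 1] yet is not a valid correlation matrix, and the use of quantile functions to impose univariate marginals. This is only an illustration under stated assumptions. The matrices `G` and `R` and all distribution parameters are hypothetical, and the second part uses a plain Gaussian copula rather than the package's C-vine algorithm.

```r
set.seed(42)

# Hypothetical guesstimate for p = 3 predictors (illustrative values only).
G <- matrix(c( 1.0,  0.9, -0.9,
               0.9,  1.0,  0.9,
              -0.9,  0.9,  1.0), nrow = 3, byrow = TRUE)

# G meets the stated requirements: symmetric, entries between -1 and 1.
stopifnot(isSymmetric(G), all(abs(G) <= 1))

# G is nevertheless not a valid correlation matrix: its smallest
# eigenvalue is negative, so it is not positive semi-definite.
min_eig <- min(eigen(G, symmetric = TRUE, only.values = TRUE)$values)

# Separate illustration: marginals through quantile functions, using a
# Gaussian copula with a valid 2 x 2 correlation matrix R (an assumption).
R <- matrix(c(1.0, 0.5,
              0.5, 1.0), nrow = 2, byrow = TRUE)
n <- 500
z <- matrix(rnorm(n * 2), ncol = 2) %*% chol(R)  # rows have correlation R
u <- pnorm(z)                                    # uniform marginals

# Apply quantile functions to obtain mixed-scale predictors.
x_count <- qpois(u[, 1], lambda = 2)             # count predictor
x_cont  <- qbeta(u[, 2], shape1 = 2, shape2 = 5) # continuous predictor in (0, 1)
```

The quantile step is why the marginals are specified through `q*` functions: any distribution with a quantile function can be attached to a dependence structure defined on the uniform scale.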
Hoff P (2007). "Extending the rank likelihood for semiparametric copula estimation." Ann. Appl. Stat., 1(1), 265-283.
Joe H (2006). "Generating random correlation matrices based on partial correlations." Journal of Multivariate Analysis, 97, 2177-2189.
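# Load the example data set from the package without attaching it
data("nhanes1518", package = "mpower")
# Build a generative model by resampling rows of the observed data
xmod <- mpower::MixtureModel(method = "resampling", data = nhanes1518)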