error_loop: Error Loop to Correct Final Correlation of Simulated Variables

Description

This function corrects the final correlation of simulated variables to be within a precision value (epsilon) of the target correlation. It updates the pairwise intermediate MVN correlation iteratively in a loop until either the maximum error is less than epsilon or the number of iterations exceeds the maximum number set by the user (maxit). It uses error_vars to simulate all variables and calculate the correlation of all variables in each iteration. This function would not ordinarily be called directly by the user. The function is a modification of Barbiero & Ferrari's ordcont function in GenOrd-package. The ordcont has been modified in the following ways:

1) It works for continuous, ordinal (r >= 2 categories), and count variables.

2) The initial correlation check has been removed because this intermediate correlation Sigma from rcorrvar or rcorrvar2 has already been checked for positive-definiteness and used to generate variables.

3) Eigenvalue decomposition is done on Sigma to impose the correct interemdiate correlations on the normal variables. If Sigma is not positive-definite, the negative eigen values are replaced with 0.

3) The final positive-definite check has been removed.

4) The intermediate correlation update function was changed to accomodate more situations.

5) A final "fail-safe" check was added at the end of the iteration loop where if the absolute error between the final and target pairwise correlation is still > 0.1, the intermediate correlation is set equal to the target correlation (if extra_correct = "TRUE").

6) Allowing specifications for the sample size and the seed for reproducibility.

Usage

error_loop(k_cat, k_cont, k_pois, k_nb, Y_cat, Y, Yb, Y_pois, Y_nb, marginal,
  support, method, means, vars, constants, lam, size, prob, mu, n, seed,
  epsilon, maxit, rho0, Sigma, rho_calc, extra_correct)

Arguments

k_cat

the number of ordinal (r >= 2 categories) variables

k_cont

the number of continuous variables

k_pois

the number of Poisson variables

k_nb

the number of Negative Binomial variables

Y_cat

the ordinal variables generated from rcorrvar or rcorrvar2

the continuous (mean 0, variance 1) variables

the continuous variables with desired mean and variance

Y_pois

the Poisson variables

Y_nb

the Negative Binomial variables

marginal

a list of length equal k_cat; the i-th element is a vector of the cumulative probabilities defining the marginal distribution of the i-th variable; if the variable can take r values, the vector will contain r - 1 probabilities (the r-th is assumed to be 1)

support

a list of length equal k_cat; the i-th element is a vector of containing the r ordered support values; if not provided, the default is for the i-th element to be the vector 1, ..., r

method

the method used to generate the continuous variables. "Fleishman" uses a third-order polynomial transformation and "Polynomial" uses Headrick's fifth-order transformation.

means

a vector of means for the continuous variables

vars

a vector of variances

constants

a matrix with k_cont rows, each a vector of constants c0, c1, c2, c3 (if method = "Fleishman") or c0, c1, c2, c3, c4, c5 (if method = "Polynomial"), like that returned by find_constants

lam

a vector of lambda (> 0) constants for the Poisson variables (see dpois)

size

a vector of size parameters for the Negative Binomial variables (see dnbinom)

prob

a vector of success probability parameters

a vector of mean parameters (*Note: either prob or mu should be supplied for all Negative Binomial variables, not a mixture)

the sample size

seed

the seed value for random number generation

epsilon

the maximum acceptable error between the final and target correlation matrices; smaller epsilons take more time

maxit

the maximum number of iterations to use to find the intermediate correlation; the correction loop stops when either the iteration number passes maxit or epsilon is reached

rho0

the target correlation matrix

Sigma

the intermediate correlation matrix previously used in rcorrvar or rcorrvar2

rho_calc

the final correlation matrix calculated in rcorrvar or rcorrvar2

extra_correct

if "TRUE", a final "fail-safe" check is used at the end of the iteration loop where if the absolute error between the final and target pairwise correlation is still > 0.1, the intermediate correlation is set equal to the target correlation

Value

A list with the following components:

Sigma the intermediate MVN correlation matrix resulting from the error loop

rho_calc the calculated final correlation matrix generated from Sigma

Y_cat the ordinal variables

Y the continuous (mean 0, variance 1) variables

Yb the continuous variables with desired mean and variance

Y_pois the Poisson variables

Y_nb the Negative Binomial variables

niter a matrix containing the number of iterations required for each variable pair

References

Barbiero A, Ferrari PA (2015). GenOrd: Simulation of Discrete Random Variables with Given Correlation Matrix and Marginal Distributions. R package version 1.4.0. https://CRAN.R-project.org/package=GenOrd

Ferrari PA, Barbiero A (2012). Simulating ordinal data. Multivariate Behavioral Research, 47(4): 566-589. 10.1080/00273171.2012.692630.

Fleishman AI (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43, 521-532. 10.1007/BF02293811.

Headrick TC (2002). Fast Fifth-order Polynomial Transforms for Generating Univariate and Multivariate Non-normal Distributions. Computational Statistics & Data Analysis, 40(4):685-711. 10.1016/S0167-9473(02)00072-5. (ScienceDirect)

Headrick TC (2004). On Polynomial Transformations for Simulating Multivariate Nonnormal Distributions. Journal of Modern Applied Statistical Methods, 3(1), 65-71. 10.22237/jmasm/1083370080.

Headrick TC, Kowalchuk RK (2007). The Power Method Transformation: Its Probability Density Function, Distribution Function, and Its Further Use for Fitting Data. Journal of Statistical Computation and Simulation, 77, 229-249. 10.1080/10629360600605065.

Headrick TC, Sawilowsky SS (1999). Simulating Correlated Non-normal Distributions: Extending the Fleishman Power Method. Psychometrika, 64, 25-35. 10.1007/BF02294317.

Headrick TC, Sheng Y, & Hodis FA (2007). Numerical Computing and Graphics for the Power Method Transformation Using Mathematica. Journal of Statistical Software, 19(3), 1 - 17. 10.18637/jss.v019.i03.

Higham N (2002). Computing the nearest correlation matrix - a problem from finance; IMA Journal of Numerical Analysis 22: 329-343.

Description

Usage

Arguments

Value

References

See Also