summary_var: Summary of Simulated Variables

Description

This function summarizes the results of contmixvar1, corrvar, or corrvar2. The inputs are either the simulated variables or inputs for those functions. See their documentation for more information. If summarizing result from contmixvar1, mixture parameters may be entered as vectors instead of lists.

Usage

summary_var(Y_cat = NULL, Y_cont = NULL, Y_comp = NULL, Y_mix = NULL,
  Y_pois = NULL, Y_nb = NULL, means = NULL, vars = NULL, skews = NULL,
  skurts = NULL, fifths = NULL, sixths = NULL, mix_pis = list(),
  mix_mus = list(), mix_sigmas = list(), mix_skews = list(),
  mix_skurts = list(), mix_fifths = list(), mix_sixths = list(),
  marginal = list(), lam = NULL, p_zip = 0, size = NULL, prob = NULL,
  mu = NULL, p_zinb = 0, rho = NULL)

Arguments

Y_cat

a matrix of ordinal variables

Y_cont

a matrix of continuous non-mixture variables

Y_comp

a matrix of components of continuous mixture variables

Y_mix

a matrix of continuous mixture variables

Y_pois

a matrix of Poisson variables

Y_nb

a matrix of Negative Binomial variables

means

a vector of means for the k_cont non-mixture and k_mix mixture continuous variables (i.e. rep(0, (k_cont + k_mix)))

vars

a vector of variances for the k_cont non-mixture and k_mix mixture continuous variables (i.e. rep(1, (k_cont + k_mix)))

skews

a vector of skewness values for the k_cont non-mixture continuous variables

skurts

a vector of standardized kurtoses (kurtosis - 3, so that normal variables have a value of 0) for the k_cont non-mixture continuous variables

fifths

a vector of standardized fifth cumulants for the k_cont non-mixture continuous variables (not necessary for method = "Fleishman")

sixths

a vector of standardized sixth cumulants for the k_cont non-mixture continuous variables (not necessary for method = "Fleishman")

mix_pis

a list of length k_mix with i-th component a vector of mixing probabilities that sum to 1 for component distributions of \(Y_{mix_i}\)

mix_mus

a list of length k_mix with i-th component a vector of means for component distributions of \(Y_{mix_i}\)

mix_sigmas

a list of length k_mix with i-th component a vector of standard deviations for component distributions of \(Y_{mix_i}\)

mix_skews

a list of length k_mix with i-th component a vector of skew values for component distributions of \(Y_{mix_i}\)

mix_skurts

a list of length k_mix with i-th component a vector of standardized kurtoses for component distributions of \(Y_{mix_i}\)

mix_fifths

a list of length k_mix with i-th component a vector of standardized fifth cumulants for component distributions of \(Y_{mix_i}\) (not necessary for method = "Fleishman")

mix_sixths

a list of length k_mix with i-th component a vector of standardized sixth cumulants for component distributions of \(Y_{mix_i}\) (not necessary for method = "Fleishman")

marginal

a list of length equal to k_cat; the i-th element is a vector of the cumulative probabilities defining the marginal distribution of the i-th variable; if the variable can take r values, the vector will contain r - 1 probabilities (the r-th is assumed to be 1); for binary variables, these should be input the same as for ordinal variables with more than 2 categories (i.e. the user-specified probability is the probability of the 1st category, which has the smaller support value)

lam

a vector of lambda (mean > 0) constants for the Poisson variables (see stats::dpois); the order should be 1st regular Poisson variables, 2nd zero-inflated Poisson variables

p_zip

a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the zero-inflated Poisson variables (see VGAM::dzipois); if p_zip = 0, \(Y_{pois}\) has a regular Poisson distribution; if p_zip is in (0, 1), \(Y_{pois}\) has a zero-inflated Poisson distribution; if p_zip is in (-(exp(lam) - 1)^(-1), 0), \(Y_{pois}\) has a zero-deflated Poisson distribution and p_zip is not a probability; if p_zip = -(exp(lam) - 1)^(-1), \(Y_{pois}\) has a positive-Poisson distribution (see VGAM::dpospois); if length(p_zip) < length(lam), the missing values are set to 0 (and ordered 1st)

size

a vector of size parameters for the Negative Binomial variables (see stats::dnbinom); the order should be 1st regular NB variables, 2nd zero-inflated NB variables

prob

a vector of success probability parameters for the NB variables; order the same as in size

a vector of mean parameters for the NB variables (*Note: either prob or mu should be supplied for all Negative Binomial variables, not a mixture; default = NULL); order the same as in size; for zero-inflated NB this refers to the mean of the NB distribution (see VGAM::dzinegbin)

p_zinb

a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables (see VGAM::dzinegbin); if p_zinb = 0, \(Y_{nb}\) has a regular NB distribution; if p_zinb is in (-prob^size/(1 - prob^size), 0), \(Y_{nb}\) has a zero-deflated NB distribution and p_zinb is not a probability; if p_zinb = -prob^size/(1 - prob^size), \(Y_{nb}\) has a positive-NB distribution (see VGAM::dposnegbin); if length(p_zinb) < length(size), the missing values are set to 0 (and ordered 1st)

rho

the target correlation matrix which must be ordered 1st ordinal, 2nd continuous non-mixture, 3rd components of continuous mixtures, 4th regular Poisson, 5th zero-inflated Poisson, 6th regular NB, 7th zero-inflated NB; note that rho is specified in terms of the components of \(Y_{mix}\)

Value

A list whose components vary based on the type of simulated variables.

If ordinal variables are produced:

ord_sum a list, where the i-th element contains a data.frame with target and simulated cumulative probabilities for ordinal variable Y_i

If continuous variables are produced:

cont_sum a data.frame summarizing Y_cont and Y_comp,

target_sum a data.frame with the target distributions for Y_cont and Y_comp,

mix_sum a data.frame summarizing Y_mix,

target_mix a data.frame with the target distributions for Y_mix,

If Poisson variables are produced:

pois_sum a data.frame summarizing Y_pois

If Negative Binomial variables are produced:

nb_sum a data.frame summarizing Y_nb

Additionally, the following elements:

rho_calc the final correlation matrix for Y_cat, Y_cont, Y_comp, Y_pois, and Y_nb

rho_mix the final correlation matrix for Y_cat, Y_cont, Y_mix, Y_pois, and Y_nb

maxerr the maximum final correlation error of rho_calc from the target rho.

References

See references for SimCorrMix.

Examples

Run this code

# NOT RUN {
# Using normal mixture variable from contmixvar1 example
Nmix <- contmixvar1(n = 1000, "Polynomial", means = 0, vars = 1,
  mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1),
  mix_skews = c(0, 0), mix_skurts = c(0, 0), mix_fifths = c(0, 0),
  mix_sixths = c(0, 0))
Nsum <- summary_var(Y_comp = Nmix$Y_comp, Y_mix = Nmix$Y_mix,
  means = 0, vars = 1, mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2),
  mix_sigmas = c(1, 1), mix_skews = c(0, 0), mix_skurts = c(0, 0),
  mix_fifths = c(0, 0), mix_sixths = c(0, 0))

# }
# NOT RUN {
# 2 continuous mixture, 1 binary, 1 zero-inflated Poisson, and
# 1 zero-inflated NB variable
n <- 10000
seed <- 1234

# Mixture variables: Normal mixture with 2 components;
# mixture of Logistic(0, 1), Chisq(4), Beta(4, 1.5)
# Find cumulants of components of 2nd mixture variable
L <- calc_theory("Logistic", c(0, 1))
C <- calc_theory("Chisq", 4)
B <- calc_theory("Beta", c(4, 1.5))

skews <- skurts <- fifths <- sixths <- NULL
Six <- list()
mix_pis <- list(c(0.4, 0.6), c(0.3, 0.2, 0.5))
mix_mus <- list(c(-2, 2), c(L[1], C[1], B[1]))
mix_sigmas <- list(c(1, 1), c(L[2], C[2], B[2]))
mix_skews <- list(rep(0, 2), c(L[3], C[3], B[3]))
mix_skurts <- list(rep(0, 2), c(L[4], C[4], B[4]))
mix_fifths <- list(rep(0, 2), c(L[5], C[5], B[5]))
mix_sixths <- list(rep(0, 2), c(L[6], C[6], B[6]))
mix_Six <- list(list(NULL, NULL), list(1.75, NULL, 0.03))
Nstcum <- calc_mixmoments(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]],
  mix_skews[[1]], mix_skurts[[1]], mix_fifths[[1]], mix_sixths[[1]])
Mstcum <- calc_mixmoments(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]],
  mix_skews[[2]], mix_skurts[[2]], mix_fifths[[2]], mix_sixths[[2]])
means <- c(Nstcum[1], Mstcum[1])
vars <- c(Nstcum[2]^2, Mstcum[2]^2)

marginal <- list(0.3)
support <- list(c(0, 1))
lam <- 0.5
p_zip <- 0.1
size <- 2
prob <- 0.75
p_zinb <- 0.2

k_cat <- k_pois <- k_nb <- 1
k_cont <- 0
k_mix <- 2
Rey <- matrix(0.39, 8, 8)
diag(Rey) <- 1
rownames(Rey) <- colnames(Rey) <- c("O1", "M1_1", "M1_2", "M2_1", "M2_2",
  "M2_3", "P1", "NB1")

# set correlation between components of the same mixture variable to 0
Rey["M1_1", "M1_2"] <- Rey["M1_2", "M1_1"] <- 0
Rey["M2_1", "M2_2"] <- Rey["M2_2", "M2_1"] <- Rey["M2_1", "M2_3"] <- 0
Rey["M2_3", "M2_1"] <- Rey["M2_2", "M2_3"] <- Rey["M2_3", "M2_2"] <- 0

# check parameter inputs
validpar(k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means,
  vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas,
  mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support,
  lam, p_zip, size, prob, mu = NULL, p_zinb, rho = Rey)

# check to make sure Rey is within the feasible correlation boundaries
validcorr(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means,
  vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas,
  mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal,
  lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed)

# simulate without the error loop
Sim1 <- corrvar(n, k_cat, k_cont, k_mix, k_pois, k_nb, "Polynomial", means,
  vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas,
  mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support,
  lam, p_zip, size, prob, mu = NULL, p_zinb, Rey, seed, epsilon = 0.01)

Summ1 <- summary_var(Sim1$Y_cat, Y_cont = NULL, Sim1$Y_comp, Sim1$Y_mix,
  Sim1$Y_pois, Sim1$Y_nb, means, vars, skews, skurts, fifths, sixths,
  mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths,
  mix_sixths, marginal, lam, p_zip, size, prob, mu = NULL, p_zinb, Rey)

Sim1_error <- abs(Rey - Summ1$rho_calc)
summary(as.numeric(Sim1_error))

# }
# NOT RUN {

# }

Run the code above in your browser using DataLab