Sphdist: Spherical Empirical Distribution

Description

This function simulates the empirical distribution of the pivotal random variable used to test sphericity of the population covariance matrix $\boldsymbol{\Sigma}$, that is, $\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_p$. The test is based on a released single synthetic dataset generated under Plug-in Sampling, assuming that the original data are normally distributed.

Usage

Sphdist(nsample, pvariates, iterations)

Value

A numeric vector of length iterations containing the simulated values of the empirical distribution.

Arguments

nsample: Sample size.
pvariates: Number of variables.
iterations: Number of iterations for simulating values from the distribution and finding the quantiles. Default is 10000.

Details

We define $$T_2^\star = \frac{|\boldsymbol{S}^{\star}|^{\frac{1}{p}}}{\mathrm{tr}(\boldsymbol{S}^{\star})/p}$$ where $\boldsymbol{S}^\star = \sum_{i=1}^n (v_i - \bar{v})(v_i - \bar{v})^{\top}$, $v_i$ is the $i$th observation of the synthetic dataset and $\bar{v}$ is the synthetic sample mean.

Under $\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_p$, $T_2^\star$ is stochastically equivalent to $$\frac{|\boldsymbol{\Omega}_{1}\boldsymbol{\Omega}_{2}|^{\frac{1}{p}}}{\mathrm{tr}(\boldsymbol{\Omega}_{1}\boldsymbol{\Omega}_{2})/p}$$ where $\boldsymbol{\Omega}_1 \sim \mathcal{W}_p(n-1, \frac{\mathbf{I}_p}{n-1})$ and $\boldsymbol{\Omega}_2 \sim \mathcal{W}_p(n-1, \mathbf{I}_p)$ are independent Wishart random matrices.

To test $\mathcal{H}_0: \boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_p$, compute the observed value of $T_{2}^\star$, denoted $\widetilde{T_{2}^\star}$, from the synthetic data. By the inequality between the arithmetic and geometric means of the eigenvalues of $\boldsymbol{S}^\star$, $T_2^\star \le 1$, with equality only when all eigenvalues are equal, so small values are evidence against sphericity. Reject the null hypothesis at significance level $\alpha$ if $\widetilde{T_{2}^\star} < t^\star_{2,\alpha}$, where $t^\star_{2,\gamma}$ is the $\gamma$th quantile of $T_2^\star$.
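The stochastic representation above suggests a direct Monte Carlo scheme. The following is a minimal sketch of that idea using `rWishart()` from base R's stats package. The helper name `sph_sim` is hypothetical and this is not claimed to be how `Sphdist` is implemented internally. It assumes `nsample - 1 >= pvariates`, so that the Wishart draws are positive definite.

```r
# Hypothetical helper (illustration only, not the package's code):
# simulate the ratio |O1 O2|^(1/p) / (tr(O1 O2) / p) for independent
# O1 ~ W_p(n - 1, I_p / (n - 1)) and O2 ~ W_p(n - 1, I_p).
sph_sim <- function(nsample, pvariates, iterations = 10000) {
  n <- nsample
  p <- pvariates
  # rWishart returns a p x p x iterations array of Wishart draws
  W1 <- rWishart(iterations, df = n - 1, Sigma = diag(p) / (n - 1))
  W2 <- rWishart(iterations, df = n - 1, Sigma = diag(p))
  vapply(seq_len(iterations), function(k) {
    M <- W1[, , k] %*% W2[, , k]
    det(M)^(1 / p) / (sum(diag(M)) / p)
  }, numeric(1))
}

set.seed(123)
sim <- sph_sim(nsample = 100, pvariates = 4, iterations = 1000)
summary(sim)
```

Every simulated value lies in the interval (0, 1], because the eigenvalues of the product of two positive definite matrices are positive and the geometric mean of positive numbers never exceeds their arithmetic mean.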

References

Klein, M., Moura, R. and Sinha, B. (2021). Multivariate Normal Inference based on Singly Imputed Synthetic Data under Plug-in Sampling. Sankhya B 83, 273–287.

Examples


# Original data created
library(MASS)         # provides mvrnorm()
library(PSinference)  # provides simSynthData() and Sphdist()
mu <- c(1,2,3,4)
Sigma <- matrix(c(1, 0, 0, 0,
                  0, 1, 0, 0,
                  0, 0, 1, 0,
                  0, 0, 0, 1), nrow = 4, ncol = 4, byrow = TRUE)
seed = 1
set.seed(seed)  # make the simulated data reproducible
n_sample = 100
# Create original simulated dataset
df = mvrnorm(n_sample, mu = mu, Sigma = Sigma)

# Synthetic data created

df_s = simSynthData(df)


# Gather the 0.05 quantile: T is at most one, and small values of T
# are evidence against sphericity, so the lower tail is the relevant one

p = dim(df_s)[2]

T_sph <- Sphdist(nsample = n_sample, pvariates = p, iterations = 10000)
q05 <- quantile(T_sph, 0.05)

# Compute the observed value of T from the synthetic dataset
# S_star is the sum-of-squares matrix, i.e. (n - 1) times the sample covariance
S_star = cov(df_s)*(n_sample-1)

T_obs = (det(S_star)^(1/p))/(sum(diag(S_star))/p)

print(q05)
print(T_obs)

# If the observed value is larger than the 5% quantile,
# there is no statistical evidence to reject the sphericity property.
#
# Values of T_obs close to one are consistent with sphericity.
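
The comparison between the observed statistic and the simulated distribution can be wrapped in a small function. The sketch below is an illustration under stated assumptions: `sph_decision` is a hypothetical name, not a function of the package, and it assumes a lower-tail test, since the statistic is bounded above by one and departures from sphericity push it towards zero.

```r
# Hypothetical helper (illustration only): lower-tail decision rule for the
# sphericity statistic, given simulated draws from its null distribution.
sph_decision <- function(T_obs, T_sim, alpha = 0.05) {
  # Critical value: the alpha-quantile of the simulated null distribution
  crit <- unname(quantile(T_sim, probs = alpha))
  # Empirical lower-tail p-value: share of simulated values <= observed
  p_value <- mean(T_sim <= T_obs)
  list(critical_value = crit, p_value = p_value, reject = T_obs < crit)
}

# Toy input, chosen only to show the mechanics
toy_sim <- c(0.2, 0.4, 0.6, 0.8, 0.9)
sph_decision(T_obs = 0.5, T_sim = toy_sim)
```

With the objects from the example above, the call would be `sph_decision(T_obs, T_sph)`. Reporting the empirical p-value alongside the critical value avoids fixing a single significance level in advance.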
