PSinference (version 0.2.2)

Sphdist: Spherical Empirical Distribution

Description

This function simulates the empirical distribution of the pivotal random variable used to test the sphericity of the population covariance matrix \(\boldsymbol{\Sigma}\), that is, \(\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_p\). The test is based on released single synthetic data generated under Plug-in Sampling and assumes that the original dataset is normally distributed.

Usage

Sphdist(nsample, pvariates, iterations)

Value

A vector of length iterations containing the simulated values of the empirical distribution.

Arguments

nsample

Sample size.

pvariates

Number of variables.

iterations

Number of iterations for simulating values from the distribution and finding the quantiles. Default is 10000.

Details

We define $$T_2^\star = \frac{|\boldsymbol{S}^{\star}|^{\frac{1}{p}}}{\mathrm{tr}(\boldsymbol{S}^{\star})/p}$$ where \(\boldsymbol{S}^\star = \sum_{i=1}^n (v_i - \bar{v})(v_i - \bar{v})^{\top}\), \(v_i\) is the \(i\)th observation of the synthetic dataset and \(\bar{v}\) is the mean of the synthetic observations. For \(\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_p\), its distribution is stochastically equivalent to $$\frac{|\boldsymbol{\Omega}_{1}\boldsymbol{\Omega}_{2}|^{\frac{1}{p}}}{\mathrm{tr}(\boldsymbol{\Omega}_{1}\boldsymbol{\Omega}_{2})/p}$$ where \(\boldsymbol{\Omega}_1\) and \(\boldsymbol{\Omega}_2\) are independent Wishart random matrices, with \(\boldsymbol{\Omega}_1 \sim \mathcal{W}_p(n-1, \frac{\mathbf{I}_p}{n-1})\) and \(\boldsymbol{\Omega}_2 \sim \mathcal{W}_p(n-1, \mathbf{I}_p)\). By the inequality between the geometric and arithmetic means of the eigenvalues, \(T_2^\star \le 1\), with equality only when all eigenvalues of \(\boldsymbol{S}^\star\) are equal, so small values of the statistic indicate a departure from sphericity. To test \(\mathcal{H}_0: \boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_p\), compute the observed value of \(T_{2}^\star\), \(\widetilde{T_{2}^\star}\), from the synthetic data and reject the null hypothesis at significance level \(\alpha\) if \(\widetilde{T_{2}^\star}<t^\star_{2,\alpha}\), where \(t^\star_{2,\gamma}\) is the \(\gamma\)th percentile of \(T_2^\star\).
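
The stochastic representation above can be simulated directly with base R. The following is a minimal sketch, not the package's implementation: `sim_sphericity_pivot` is a hypothetical helper name, and it assumes that the scale-matrix parameterization of `stats::rWishart(n, df, Sigma)` matches the \(\mathcal{W}_p(n-1, \cdot)\) notation used here.

```r
# Hypothetical helper (not part of PSinference): draws values of
# |Omega1 Omega2|^(1/p) / (tr(Omega1 Omega2) / p) under the null hypothesis.
sim_sphericity_pivot <- function(nsample, pvariates, iterations = 10000) {
  df <- nsample - 1
  I_p <- diag(pvariates)
  # Omega1 ~ W_p(n - 1, I_p / (n - 1)) and Omega2 ~ W_p(n - 1, I_p), independent
  omega1 <- stats::rWishart(iterations, df, I_p / df)
  omega2 <- stats::rWishart(iterations, df, I_p)
  vapply(seq_len(iterations), function(k) {
    prod_k <- omega1[, , k] %*% omega2[, , k]
    det(prod_k)^(1 / pvariates) / (sum(diag(prod_k)) / pvariates)
  }, numeric(1))
}

set.seed(1)
t_sim <- sim_sphericity_pivot(nsample = 100, pvariates = 4, iterations = 2000)

# Estimated 5th percentile of the simulated pivotal distribution
crit_05 <- quantile(t_sim, 0.05)
```

Every simulated value lies in the interval (0, 1], so the estimated percentiles do as well. A larger `iterations` value gives more stable percentile estimates at the cost of run time.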

References

Klein, M., Moura, R. and Sinha, B. (2021). Multivariate Normal Inference based on Singly Imputed Synthetic Data under Plug-in Sampling. Sankhya B 83, 273–287.

Examples

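# Load MASS for mvrnorm() and PSinference for simSynthData() and Sphdist()
library(MASS)
library(PSinference)

# Original data created
mu <- c(1,2,3,4)
Sigma <- matrix(c(1, 0, 0, 0,
                  0, 1, 0, 0,
                  0, 0, 1, 0,
                  0, 0, 0, 1), nrow = 4, ncol = 4, byrow = TRUE)
# Fix the random seed so the example is reproducible
set.seed(1)
n_sample <- 100
# Create original simulated dataset
df <- mvrnorm(n_sample, mu = mu, Sigma = Sigma)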

# Synthetic data created

df_s = simSynthData(df)


# Gather the 0.05 quantile (critical value of a left-tailed test at the 5% level)

p <- dim(df_s)[2]

T_sph <- Sphdist(nsample = n_sample, pvariates = p, iterations = 10000)
q05 <- quantile(T_sph, 0.05)

# Compute the observed value of T from the synthetic dataset
# S_star is the sum-of-squares matrix, i.e. (n - 1) times the sample covariance
S_star <- cov(df_s) * (n_sample - 1)

T_obs <- (det(S_star)^(1/p)) / (sum(diag(S_star)) / p)

print(q05)
print(T_obs)

# T_obs cannot exceed one, and small values indicate a departure from sphericity.
# If the observed value is bigger than the 5% quantile,
# we don't have statistical evidence to reject the Sphericity property.
#
# Values close to one are consistent with sphericity.
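
Because the statistic is at most one and shrinks as the covariance matrix moves away from sphericity, the comparison with a quantile can also be summarised as a left-tail Monte Carlo p-value. The sketch below is illustrative only: `empirical_p_left` is a hypothetical helper, not a PSinference function, and the toy vector stands in for the output of `Sphdist()` so that the snippet runs without the package.

```r
# Hypothetical helper (not part of PSinference): proportion of simulated
# values that are at most as large as the observed statistic.
empirical_p_left <- function(t_obs, t_sim) {
  mean(t_sim <= t_obs)
}

# Toy stand-in values; in practice t_sim would be the vector returned by Sphdist()
t_sim_demo <- c(0.80, 0.85, 0.88, 0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96)

# 4 of the 10 toy values are <= 0.90
p_value_demo <- empirical_p_left(0.90, t_sim_demo)
```

With the objects from the example, the call would be `empirical_p_left(T_obs, T_sph)`, and a value below the chosen significance level would lead to rejecting sphericity.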
