TestMCARNormality
Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random
The main purpose of this package is to test whether the missing data mechanism of an incompletely observed data set is missing completely at random (MCAR). As a by-product, the package can also impute incomplete data, test whether data have a multivariate normal distribution, test equality of covariances across groups, and obtain normal-theory maximum likelihood estimates of the mean and covariance when data are incomplete. The test of MCAR follows the methodology proposed by Jamshidian and Jalal (2010) and is based on testing equality of covariances between groups that have identical missing data patterns. The data are imputed under one of two options, normal or distribution free, and the test of equality of covariances between groups with identical missing data patterns is likewise performed either under an assumption of normality (Hawkins test) or nonparametrically. Users can optionally supply data imputed by their own method. Multiple imputation is an additional feature of the program; when MCAR is rejected, it can be used as a diagnostic tool to help identify the cases or variables that contribute to the rejection (see Jamshidian and Jalal, 2010, for details). As explained in Jamshidian, Jalal, and Jansen (2014), this package can also be used for imputing missing data, and for testing multivariate normality and equality of covariances between several groups when data are completely observed.
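The MCAR test compares groups of cases that share the same missing data pattern. As a rough illustration of that grouping step (a base R sketch on made-up data, not the package's internal code), each row can be keyed by which variables are observed, and patterns with too few cases can then be dropped in the spirit of the del.lesscases argument. The names pattern_key, groups, del_less, and y_kept are hypothetical.

```r
# Toy incomplete data: 10 cases, 4 variables (illustration only)
set.seed(1)
y <- matrix(rnorm(40), nrow = 10)
y[cbind(c(1, 2, 5, 7), c(1, 1, 3, 1))] <- NA

# Key each row by its pattern: 1 = observed, 0 = missing
pattern_key <- apply(!is.na(y), 1, function(r) paste(as.integer(r), collapse = ""))

# Row indices belonging to each missing data pattern, and group sizes
groups <- split(seq_len(nrow(y)), pattern_key)
pattern_counts <- lengths(groups)
print(pattern_counts)  # 0111 = 3, 1101 = 1, 1111 = 6

# Mimic the del.lesscases rule: drop patterns with del_less or fewer cases
del_less <- 2
keep_rows <- unlist(groups[pattern_counts > del_less], use.names = FALSE)
y_kept <- y[sort(keep_rows), , drop = FALSE]
```

Only patterns that survive this filter contribute a group to the covariance comparison, which is why very small patterns are removed before testing.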
Usage
TestMCARNormality(data, del.lesscases = 6, imputation.number = 1, method = "Auto", imputation.method = "Dist.Free", nrep = 10000, n.min = 30, seed = 110, alpha = 0.05, imputed.data = NA)
Arguments
 data
 A matrix or data frame consisting of at least two columns. Values must be numerical with missing data indicated by NA.
 del.lesscases
 Missing data patterns with del.lesscases or fewer cases are removed from the data set.
 imputation.number
 Number of imputations to be used, if data are to be multiply imputed.
 method
 Selects the method used for the test: "Hawkins" or "Nonparametric". If the user is certain that the data have a multivariate normal distribution, method = "Hawkins" should be selected; if the data are not normally distributed, method = "Nonparametric" should be used. If the user is unsure, the default method = "Auto" runs both the Hawkins and the nonparametric tests, and the default output follows the recommendation of Jamshidian and Jalal (2010), outlined in the flowchart in Figure 7 of their paper.
 imputation.method
 Method used to impute the missing data; one of:
"Dist.Free": Missing data are imputed nonparametrically using the method of Srivastava and Dolatabadi (2009); also see Jamshidian and Jalal (2010).
"normal": Missing data are imputed assuming that the data come from a multivariate normal distribution. The maximum likelihood estimate of the mean and covariance obtained from Mls is used for generating imputed values. The imputed values are based on the conditional distribution of the missing variables given the observed variables; see Jamshidian and Jalal (2010) for more details.
 nrep
 Number of replications used to simulate the Neyman distribution, which determines the cutoff value for the Neyman test in the program SimNey. Larger values increase the accuracy of the Neyman test.
 n.min
 The minimum number of cases in a group that triggers the use of the asymptotic chi-square distribution in place of the empirical distribution in the Neyman test of uniformity.
 seed
 An initial random number generator seed. The default is 110, which can be reset to a user-selected number. If the value is set to NA, a system-selected seed is used.
 alpha
 The significance level at which tests are performed.
 imputed.data
 The user can optionally provide an imputed data set. In this case the program will not impute the data and will use the imputed data set for the tests performed. Note that the order of cases in the imputed data set should be the same as that of the incomplete data set.
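The "normal" option of imputation.method draws each case's missing values from their conditional normal distribution given the observed values. A minimal base R sketch of that single step follows, assuming the mean vector and covariance matrix are already known (in the package they come from the ML estimates computed by Mls). impute_row_normal is a hypothetical helper, not a MissMech function.

```r
# Impute the NA entries of one case by a draw from N(mu, sigma) conditioned
# on its observed entries (sketch; mu and sigma assumed known)
impute_row_normal <- function(row, mu, sigma) {
  m <- which(is.na(row))            # positions of missing entries
  if (length(m) == 0) return(row)   # nothing to impute
  o <- which(!is.na(row))           # positions of observed entries
  if (length(o) == 0) {
    # No observed values: the conditional distribution is the marginal one
    cond_mean <- mu[m]
    cond_cov <- sigma[m, m, drop = FALSE]
  } else {
    s_oo_inv <- solve(sigma[o, o, drop = FALSE])
    s_mo <- sigma[m, o, drop = FALSE]
    # E[y_m | y_o] = mu_m + S_mo S_oo^{-1} (y_o - mu_o)
    cond_mean <- mu[m] + s_mo %*% s_oo_inv %*% (row[o] - mu[o])
    # Cov[y_m | y_o] = S_mm - S_mo S_oo^{-1} S_om
    cond_cov <- sigma[m, m, drop = FALSE] - s_mo %*% s_oo_inv %*% t(s_mo)
  }
  cond_cov <- (cond_cov + t(cond_cov)) / 2   # guard against rounding asymmetry
  # Draw via the Cholesky factor: t(chol(S)) %*% z has covariance S
  z <- rnorm(length(m))
  row[m] <- as.vector(cond_mean) + as.vector(t(chol(cond_cov)) %*% z)
  row
}

set.seed(2)
sigma <- matrix(c(1, 0.5, 0.5, 1), 2)
impute_row_normal(c(1.2, NA), mu = c(0, 0), sigma = sigma)
```

Drawing from the conditional distribution, rather than plugging in the conditional mean, keeps the imputed values as variable as real observations, which matters when covariances are compared across groups afterwards.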
Details
Theoretical, technical, and practical details about this program and its uses can be found in Jamshidian and Jalal (2010) and Jamshidian, Jalal, and Jansen (2014).
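One building block referred to by the nrep and n.min arguments is the Neyman smooth test of uniformity. The sketch below is a generic version of that test, assuming four orthonormal Legendre components; it is not the package's SimNey function, and the number of components the package uses is an assumption here. The names legendre01, neyman_stat, and neyman_test are hypothetical. Groups with at least n_min cases use the asymptotic chi-square reference; smaller groups use a Monte Carlo null distribution built from nrep uniform samples.

```r
# Orthonormal shifted Legendre polynomials of degree 1 to 4 on [0, 1]
legendre01 <- function(u) {
  x <- 2 * u - 1
  cbind(sqrt(3) * x,
        sqrt(5) * (3 * x^2 - 1) / 2,
        sqrt(7) * (5 * x^3 - 3 * x) / 2,
        3 * (35 * x^4 - 30 * x^2 + 3) / 8)
}

# Neyman smooth statistic: sum over components of (n^{-1/2} sum_i h_j(u_i))^2
neyman_stat <- function(u) {
  h <- legendre01(u)
  sum((colSums(h) / sqrt(length(u)))^2)
}

neyman_test <- function(u, nrep = 10000, n_min = 30) {
  n <- length(u)
  stat <- neyman_stat(u)
  if (n >= n_min) {
    # Large group: asymptotic chi-square with 4 degrees of freedom
    p_value <- pchisq(stat, df = 4, lower.tail = FALSE)
  } else {
    # Small group: simulate the null distribution from nrep uniform samples
    null_stats <- replicate(nrep, neyman_stat(runif(n)))
    p_value <- mean(null_stats >= stat)
  }
  list(statistic = stat, p.value = p_value)
}

set.seed(3)
neyman_test(runif(50))
```

Larger nrep makes the simulated p-value more precise at the cost of run time, which matches the role the nrep argument plays in the package.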
Value
 analyzed.data
 The data used in the analysis. If del.lesscases = 0, this is the same as the original input data. If del.lesscases > 0, this is the data with the cases in patterns of del.lesscases or fewer cases removed.
 imputed.data
 The analyzed.data after imputation. If imputation.number > 1, the first imputed data set is returned.
 ordered.data
 The analyzed.data ordered according to missing data pattern, using the function OrderMissing.
 caseorder
 A mapping of case number indices from ordered.data to the original data. More specifically, the jth row of the ordered.data is the caseorder[j]th (the jth element of caseorder) row of the original data.
 pnormality
 p-value for the nonparametric test. When imputation.number > 1, this is a vector with one element for each imputed data set.
 adistar
 A matrix consisting of the Anderson-Darling test statistic for each group (columns) and each imputation (rows).
 adstar
 Sum of adistar over groups. When imputation.number > 1, this is a vector with one element for each imputed data set.
 pvalcomb
 p-value for the Hawkins test. When imputation.number > 1, this is a vector with one element for each imputed data set.
 pvalsn
 A matrix consisting of Hawkins test statistics for each group (columns) and each imputation (rows).
 g
 Number of patterns used in the analysis.
 combp
 Hawkins test statistic. When imputation.number > 1, this is a vector with one element for each imputed data set.
 alpha
 The significance level at which the hypothesis tests are performed.
 patcnt
 A vector consisting of the number of cases in each pattern in patused.
 patused
 A matrix indicating the missing data patterns in the data set, using 1 and NA's.
 imputation.number
 A value greater than or equal to 1. If a value larger than 1 is used, data will be imputed imputation.number times.
 mu
 The normal-theory maximum likelihood estimate of the variables' means.
 sigma
 The normal-theory maximum likelihood estimate of the variables' covariance matrix.
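The caseorder component links ordered.data back to the original row order: row j of ordered.data is row caseorder[j] of the original data. The toy sketch below imitates that relationship with base R's order() on a pattern key; it is an illustration under assumptions, not how OrderMissing is implemented, and all object names are hypothetical.

```r
# Toy incomplete data standing in for the analyzed data (hypothetical)
y <- matrix(c(1, NA, 3, 4,
              NA, 6, 7, 8), nrow = 4)

# Order rows by missing data pattern (illustration only, not OrderMissing())
pattern_key <- apply(is.na(y), 1, function(r) paste(as.integer(r), collapse = ""))
caseorder <- order(pattern_key)   # caseorder is c(3, 4, 1, 2)
ordered_data <- y[caseorder, , drop = FALSE]

# Since ordered_data = y[caseorder, ], order(caseorder) inverts the mapping
restored <- ordered_data[order(caseorder), , drop = FALSE]
stopifnot(identical(restored, y))
```

The same inversion with order(caseorder) can be applied to any per-row result computed on the ordered data to line it up with the original cases.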
Note
Note 1: In the above descriptions, "original data" refers to the input data after deletion of rows consisting entirely of NA's (if any).
Note 2: The normal theory maximum likelihood estimate of mean and covariance is obtained using the EM algorithm, as described in Jamshidian and Bentler (1999). The standard errors for these estimates, based on the observed information matrix, can be obtained via the function Ddf, included in this package.
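To show the mechanics behind Note 2, here is a textbook EM sketch for normal-theory ML estimation of the mean and covariance with missing values. It is a generic implementation, not the complete-data-routine approach of Jamshidian and Bentler (1999) or the package's Mls code; em_mvn is a hypothetical name, and the sketch assumes every row has at least one observed value (consistent with Note 1) and no column is entirely missing.

```r
# EM for the mean and covariance of a multivariate normal with NA entries
em_mvn <- function(y, tol = 1e-8, max_iter = 500) {
  n <- nrow(y); p <- ncol(y)
  mu <- colMeans(y, na.rm = TRUE)                   # starting mean
  sigma <- diag(apply(y, 2, var, na.rm = TRUE), p)  # starting covariance
  for (iter in seq_len(max_iter)) {
    sum_x <- numeric(p)
    sum_xx <- matrix(0, p, p)
    for (i in seq_len(n)) {
      x <- y[i, ]
      m <- which(is.na(x)); o <- which(!is.na(x))
      c_i <- matrix(0, p, p)  # conditional covariance of the missing part
      if (length(m) > 0) {
        # E-step: regression of missing on observed variables
        b <- sigma[m, o, drop = FALSE] %*% solve(sigma[o, o, drop = FALSE])
        x[m] <- mu[m] + b %*% (x[o] - mu[o])
        c_i[m, m] <- sigma[m, m, drop = FALSE] - b %*% sigma[o, m, drop = FALSE]
      }
      sum_x <- sum_x + x
      sum_xx <- sum_xx + tcrossprod(x) + c_i
    }
    # M-step: update mean and (divisor-n) covariance from expected sums
    mu_new <- sum_x / n
    sigma_new <- sum_xx / n - tcrossprod(mu_new)
    converged <- max(abs(mu_new - mu), abs(sigma_new - sigma)) < tol
    mu <- mu_new; sigma <- sigma_new
    if (converged) break
  }
  list(mu = mu, sigma = sigma, iterations = iter)
}
```

With complete data the loop reduces to the sample mean and the divisor-n sample covariance, which makes a quick sanity check for the sketch.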
References
Jamshidian, M. and Bentler, P. M. (1999). "ML estimation of mean and covariance structures with missing data using complete data routines." Journal of Educational and Behavioral Statistics, 24, 21-41.
Jamshidian, M. and Jalal, S. (2010). "Tests of homoscedasticity, normality, and missing completely at random for incomplete multivariate data." Psychometrika, 75, 649-674.
Jamshidian, M., Jalal, S., and Jansen, C. (2014). "MissMech: An R Package for Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random (MCAR)." Journal of Statistical Software, 56(6), 1-31.
Examples
# Example 1: Data are MCAR and normally distributed
n <- 300
p <- 5
pctmiss <- 0.2
set.seed(1010)
y <- matrix(rnorm(n * p), nrow = n)
missing <- matrix(runif(n * p), nrow = n) < pctmiss
y[missing] <- NA
out <- TestMCARNormality(data = y)
print(out)
# Prints the p-values for both the Hawkins and the nonparametric tests
summary(out)
# Uses more cases
#out1 <- TestMCARNormality(data = y, del.lesscases = 1)
#print(out1)
# Performs multiple imputation
Out <- TestMCARNormality(data = y, imputation.number = 10)
summary(Out)
boxplot(Out)
# Example 2: Data are MCAR and non-normally distributed (t distributed with d.f. = 5)
n <- 300
p <- 5
pctmiss <- 0.2
set.seed(1010)
y <- matrix(rt(n * p, 5), nrow = n)
missing <- matrix(runif(n * p), nrow = n) < pctmiss
y[missing] <- NA
out <- TestMCARNormality(data = y)
print(out)
# Perform multiple imputation
#Out_m <- TestMCARNormality(data = y, imputation.number = 20)
#boxplot(Out_m)
# One may impute the data using a method other than those available in the package
# MissMech. If the object "yimputed" holds data imputed by another method, e.g. k-nearest
# neighbor imputation, it can be used in MissMech as follows:
#out_k <- TestMCARNormality(data = y, imputed.data = yimputed)
#print(out_k)
# Example 3: Data are MAR (not MCAR), but are normally distributed
n <- 300
p <- 5
r <- 0.3
mu <- rep(0, p)
sigma <- r * (matrix(1, p, p) - diag(1, p)) + diag(1, p)
set.seed(110)
eig <- eigen(sigma)
sig.sqrt <- eig$vectors %*% diag(sqrt(eig$values)) %*% solve(eig$vectors)
# Symmetrize the square root to remove floating-point asymmetry
sig.sqrt <- (sig.sqrt + t(sig.sqrt)) / 2
y <- matrix(rnorm(n * p), nrow = n) %*% sig.sqrt
tmp <- y
# Variable j is missing whenever variable j - 1 exceeds 0.8 (MAR mechanism)
for (j in 2:p) {
  y[tmp[, j - 1] > 0.8, j] <- NA
}
out <- TestMCARNormality(data = y, alpha = 0.1)
print(out)
# Example 4: Multiple imputation; data are MAR (not MCAR), but are normally distributed
#n <- 300
#p <- 5
#pctmiss <- 0.2
#set.seed(1010)
#y <- matrix(rnorm(n * p), nrow = n)
#missing <- matrix(runif(n * p), nrow = n) < pctmiss
#y[missing] <- NA
#Out <- OrderMissing(y)
#y <- Out$data
#spatcnt <- Out$spatcnt
#g2 <- seq(spatcnt[1] + 1, spatcnt[2])
#g4 <- seq(spatcnt[3] + 1, spatcnt[4])
#y[c(g2, g4), ] <- 2 * y[c(g2, g4), ]
#out <- TestMCARNormality(data = y, imputation.number = 20)
#print(out)
#boxplot(out)
# Removing Groups 2 and 4
#y1 <- y[-c(g2, g4), ]
#out <- TestMCARNormality(data = y1, imputation.number = 20)
#print(out)
#boxplot(out)
# Example 5: Test of homoscedasticity for complete data
#n <- 50
#p <- 5
#r <- 0.4
#sigma <- r * (matrix(1, p, p) - diag(1, p)) + diag(1, p)
#set.seed(1010)
#eig <- eigen(sigma)
#sig.sqrt <- eig$vectors %*% diag(sqrt(eig$values)) %*% solve(eig$vectors)
#sig.sqrt <- (sig.sqrt + t(sig.sqrt)) / 2
#y1 <- matrix(rnorm(n * p), nrow = n) %*% sig.sqrt
#n <- 75
#p <- 5
#y2 <- matrix(rnorm(n * p), nrow = n)
#n <- 25
#p <- 5
#r <- 0
#sigma <- r * (matrix(1, p, p) - diag(1, p)) + diag(2, p)
#y3 <- matrix(rnorm(n * p), nrow = n) %*% sqrt(sigma)
#ycomplete <- rbind(y1, y2, y3)
#y1[, 1] <- NA
#y2[, c(1, 3)] <- NA
#y3[, 2] <- NA
#ygroup <- rbind(y1, y2, y3)
#out <- TestMCARNormality(data = ygroup, method = "Hawkins", imputed.data = ycomplete)
#print(out)
# Example 6: Real data
#data(agingdata)
#TestMCARNormality(agingdata, del.lesscases = 1)