Iterative Sequential Regression
isr performs imputation of missing values based on an optionally
specified model. Missingness is assumed to be missing at random (MAR).
isr(X, M, Xinit, mi = 1, burnIn = 100, thinning = 20, intercept = T)
- X: A matrix of points to be imputed or used as covariates by isr.
NA values are considered missing. If column names are used, duplicate column names are not allowed.
- M: An optional logical (boolean-valued) matrix specifying the factorized pdf of the joint multivariate normal distribution of the variables requiring imputation.
A description of the factorized pdf is provided in the details.
The column names of M must match the column names of X, and the row names of M must be a subset of the column names of X, in the same order as in X.
Each variable requiring imputation is associated with a row of M; its conditional relationship to the variables in X is indicated by the logical elements of that row vector. A value of TRUE indicates conditional dependence; a value of FALSE indicates conditional independence.
Because this is a factorized pdf, a row of M cannot specify a conditional dependence on a variable in a later row of M; in particular, the variable in the first row of M cannot depend on any other variable being imputed.
If M is missing, dependence is assumed between all variables being imputed. No missing values are allowed in M.
- Xinit: An optional matrix with the same dimensions as X and no missing values. All values of Xinit should match those of X, except where X is missing. Values of Xinit at the positions of missing values in X are treated as initial imputations. If Xinit is not specified, variable means are used as initial imputations.
- mi: A scalar indicating the number of imputations to return.
- burnIn: A scalar indicating the number of iterations to burn in before returning imputations. Note that burnIn is the total number of burn-in iterations; no thinning is performed until generation of the multiple imputations starts.
- thinning: A scalar giving the amount of thinning for the MCMC routine. A value of one implies no thinning.
- intercept: A logical value indicating whether the imputation model should include an intercept.
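The rules for M can be checked mechanically before calling isr. The base-R sketch below is illustrative only: full_dependence_M and check_M are hypothetical helpers, not part of isr. full_dependence_M encodes one reading (an assumption) of "dependence is assumed between all variables being imputed": each imputed variable depends on every covariate and on every earlier imputed variable. check_M tests the documented constraints on names, order and row dependence, and additionally assumes a variable may not depend on itself.

```r
# Hypothetical helper (not part of isr): the "full dependence" M, assuming each
# imputed variable depends on all covariates and all earlier imputed variables.
full_dependence_M <- function(var_names, impute_names) {
  M <- matrix(FALSE, nrow = length(impute_names), ncol = length(var_names),
              dimnames = list(impute_names, var_names))
  covariates <- setdiff(var_names, impute_names)
  for (i in seq_along(impute_names)) {
    earlier <- impute_names[seq_len(i - 1)]
    M[i, c(covariates, earlier)] <- TRUE
  }
  M
}

# Hypothetical checker for the documented constraints on M.
check_M <- function(M, X) {
  stopifnot(is.logical(M), !anyNA(M))              # logical, no missing values
  stopifnot(identical(colnames(M), colnames(X)))   # same columns as X
  rows <- rownames(M)
  pos <- match(rows, colnames(X))
  # row names: a subset of colnames(X), in the same order as in X
  stopifnot(!anyNA(pos), !is.unsorted(pos, strictly = TRUE))
  # factorized pdf: a row may not depend on itself (assumption) or a later row
  for (i in seq_along(rows)) {
    if (any(M[i, rows[i:length(rows)]])) {
      stop("row ", rows[i], " depends on itself or on a later row")
    }
  }
  invisible(TRUE)
}

vars <- c("Covar2", "CoVar1", "Var1", "Var2", "Var3")
M <- full_dependence_M(vars, c("Var1", "Var2", "Var3"))
X <- matrix(0, nrow = 2, ncol = 5, dimnames = list(NULL, vars))
check_M(M, X)
```

The fitMatrix in the example at the end of this page passes check_M as well; it is sparser than the full-dependence matrix because Var2 does not condition on Var1.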
The ISR algorithm performs Bayesian multivariate normal imputation. Each iteration consists of two steps: an imputation step and a prediction step.
In the imputation step, the missing values are imputed from a Normal-Inverse-Wishart model with non-informative priors.
In the prediction step, the parameters are estimated using both the observed and the currently imputed values.
Parameter imputation is carried out through the conditional factoring of the joint pdf.
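To make the two steps concrete, here is a simplified data-augmentation sampler built on the conditional factoring. It is a sketch under stated assumptions, not the isr implementation. It assumes that each factor is a normal linear regression of one variable on the columns flagged TRUE in its row of M, with the non-informative prior p(beta, sigma^2) ∝ 1/sigma^2. It also assumes that covariates (columns of X that are not rows of M) are fully observed and that X has column names. isr_sketch is a hypothetical name, it returns only the imputed array, and its handling of burnIn and thinning is our reading of the argument descriptions.

```r
# Simplified ISR-style sampler (hypothetical; NOT the isr package's code).
# Assumes normal-regression factors with prior p(beta, sigma^2) ∝ 1/sigma^2,
# fully observed covariates, and column names on X.
isr_sketch <- function(X, M, mi = 1, burnIn = 100, thinning = 20,
                       intercept = TRUE) {
  n <- nrow(X)
  vars <- rownames(M)
  miss <- is.na(X)
  # Initial imputations: column means (the documented default without Xinit).
  for (v in vars) X[miss[, v], v] <- mean(X[, v], na.rm = TRUE)

  # Design matrix for the factor of v: columns flagged TRUE in row v of M.
  design <- function(X, v) {
    Z <- X[, M[v, ], drop = FALSE]
    if (intercept) Z <- cbind("(Intercept)" = 1, Z)
    Z
  }

  # Parameter draw for every factor, using observed and imputed values.
  p_step <- function(X) {
    lapply(setNames(vars, vars), function(v) {
      Z <- design(X, v)
      q <- ncol(Z)
      V <- chol2inv(chol(crossprod(Z)))             # (Z'Z)^{-1}
      b_hat <- drop(V %*% crossprod(Z, X[, v]))     # least-squares estimate
      rss <- sum((X[, v] - Z %*% b_hat)^2)
      s2 <- rss / rchisq(1, df = n - q)             # sigma^2 | data
      b <- b_hat + drop(t(chol(V)) %*% rnorm(q, sd = sqrt(s2)))
      list(beta = setNames(b, colnames(Z)), sigma2 = s2)
    })
  }

  # Imputation: redraw each missing entry from its full conditional, which
  # combines its own factor with every later factor that conditions on it.
  i_step <- function(X, par) {
    for (v in vars) {
      m <- miss[, v]
      if (!any(m)) next
      prec <- 1 / par[[v]]$sigma2
      num <- drop(design(X, v)[m, , drop = FALSE] %*% par[[v]]$beta) * prec
      for (k in vars[M[vars, v]]) {
        b <- par[[k]]$beta
        bv <- b[[v]]
        s2 <- par[[k]]$sigma2
        # linear predictor of X_k with the contribution of X_v removed
        rest <- drop(design(X, k)[m, , drop = FALSE] %*% b) - bv * X[m, v]
        prec <- prec + bv^2 / s2
        num <- num + bv * (X[m, k] - rest) / s2
      }
      X[m, v] <- num / prec + rnorm(sum(m), sd = sqrt(1 / prec))
    }
    X
  }

  # burnIn plain iterations, then `thinning` iterations per returned imputation.
  for (it in seq_len(burnIn)) X <- i_step(X, p_step(X))
  imputed <- array(NA_real_, dim = c(dim(X), mi),
                   dimnames = c(dimnames(X), list(NULL)))
  for (r in seq_len(mi)) {
    for (it in seq_len(thinning)) X <- i_step(X, p_step(X))
    imputed[, , r] <- X
  }
  list(imputed = imputed)
}
```

Drawing each missing value from its full conditional, rather than from its own factor alone, lets observed values of later variables inform the imputation. Within each cycle the parameter draw comes first here; that ordering is a design choice of the sketch.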
A conditional factoring is an expansion of the joint pdf of all the dependent variables in X. For example,
Pr(X|Z) = Pr(X1,X2,X3|Z) = Pr(X1|Z) Pr(X2|X1,Z) Pr(X3|X1,X2,Z),
where the right-hand side is the fully conditional specification for the dependent variables X1 to X3 given the independent variable Z.
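In general, the rows of M select which terms appear in each factor. The first formula restates the factoring for k imputed variables. Here pa(X_j) is our shorthand for the columns flagged TRUE in row j of M. The second line applies it to the fitMatrix from the example at the end of this page. The last line gives the per-factor posterior draw used in the R sketch above. That draw is an assumption (a normal regression with prior p(beta, sigma^2) ∝ 1/sigma^2), not a statement of the isr internals.

```latex
% Factorization encoded by M (rows X_1, ..., X_k in order; covariates Z):
\Pr(X_1,\dots,X_k \mid Z) = \prod_{j=1}^{k} \Pr\bigl(X_j \mid \mathrm{pa}(X_j)\bigr),
\qquad
\mathrm{pa}(X_j) = \{\, c : M_{jc} = \mathrm{TRUE} \,\} \subseteq \{Z, X_1, \dots, X_{j-1}\}

% fitMatrix from the example (Covar2 and CoVar1 are covariates):
\Pr(\mathrm{Var1},\mathrm{Var2},\mathrm{Var3} \mid \mathrm{Covar2},\mathrm{CoVar1})
  = \Pr(\mathrm{Var1} \mid \mathrm{Covar2},\mathrm{CoVar1})\,
    \Pr(\mathrm{Var2} \mid \mathrm{Covar2},\mathrm{CoVar1})\,
    \Pr(\mathrm{Var3} \mid \mathrm{Covar2},\mathrm{CoVar1},\mathrm{Var1},\mathrm{Var2})

% Assumed draw for factor j, with Z_j the n x q_j design of pa(X_j)
% (plus an intercept column when intercept = TRUE):
\hat\beta_j = (Z_j^\top Z_j)^{-1} Z_j^\top X_j, \qquad
\sigma_j^2 \mid X \sim \frac{\lVert X_j - Z_j \hat\beta_j \rVert^2}{\chi^2_{n-q_j}}, \qquad
\beta_j \mid \sigma_j^2, X \sim N\bigl(\hat\beta_j,\ \sigma_j^2 (Z_j^\top Z_j)^{-1}\bigr)
```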
This function returns a list with two elements:
- param: a three-dimensional array of parameter estimates of the factored pdf. The last dimension indexes the multiple imputations.
- imputed: a three-dimensional array of X with imputed values. The last dimension indexes the multiple imputations.
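Because the last dimension of the imputed array indexes the imputations, an analysis is usually run once per imputation and the results are pooled. The isr documentation does not prescribe a pooling method. The sketch below assumes Rubin's rules, a common choice, and pools the sample mean of a single column. pool_column_mean is a hypothetical helper. Its within-imputation variance (sample variance divided by n) ignores any survey design and is only an illustration.

```r
# Hypothetical helper (not part of isr): pool the mean of one column across
# imputations with Rubin's rules. `imputed` is n x p x mi, as returned by isr.
pool_column_mean <- function(imputed, column) {
  n <- dim(imputed)[1]
  m <- dim(imputed)[3]
  stopifnot(n >= 2, m >= 2)                  # need at least two imputations
  vals <- imputed[, column, ]                # n x m: one column per imputation
  est <- colMeans(vals)                      # per-imputation estimates
  within <- apply(vals, 2, var) / n          # per-imputation variance of mean
  W <- mean(within)                          # within-imputation variance
  B <- var(est)                              # between-imputation variance
  c(estimate = mean(est), variance = W + (1 + 1 / m) * B)
}

arr <- array(0, dim = c(4, 2, 2), dimnames = list(NULL, c("a", "b"), NULL))
arr[, "a", 1] <- c(1, 2, 3, 4)
arr[, "a", 2] <- c(2, 3, 4, 5)
pool_column_mean(arr, "a")
```

With isr output, the first argument would be XImputed$imputed from a call with mi of at least 2.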
Robbins, M. W., & White, T. K. (2011). Farm commodity payments and imputation in the Agricultural Resource Management Survey. American Journal of Agricultural Economics. DOI: 10.1093/ajae/aaq166.
# simulation parameters
set.seed(100)
n <- 30        # number of observations
p <- 5         # number of variables (2 covariates + 3 to impute)
missing <- 10  # number of entries set to NA

# generate a random covariance matrix
covarMatrix <- rWishart(1, p + 1, diag(p))[, , 1]

# simulate multivariate normal data with that covariance
U <- chol(covarMatrix)
X <- matrix(rnorm(n * p), nrow = n) %*% U

# make some data missing
X[sample(1:(n * p), size = missing)] <- NA

# specify relationships: one row per variable to impute
fitMatrix <- matrix(c(
  # Covar2 CoVar1 Var1  Var2   Var3
  TRUE,    TRUE,  FALSE, FALSE, FALSE,  # 1. Var1
  TRUE,    TRUE,  FALSE, FALSE, FALSE,  # 2. Var2
  TRUE,    TRUE,  TRUE,  TRUE,  FALSE   # 3. Var3
), nrow = 3, byrow = TRUE)

covarList <- c('Covar2', 'CoVar1', 'Var1', 'Var2', 'Var3')

# set up names
colnames(fitMatrix) <- covarList
rownames(fitMatrix) <- covarList[-1:-2]  # Var1, Var2, Var3
colnames(X) <- covarList

# impute using the dependence structure in fitMatrix (mi = 1 by default)
XImputed <- isr(X, fitMatrix)