miceMNAR (version 1.0.2)

MNARargument: Function providing modified arguments for imputation of Missing Not At Random (MNAR) outcomes using mice() function of the 'mice' package

Description

The imputation models for Missing Not At Random (MNAR) binary or continuous outcomes developed in this package rely on sample selection models. The imputation model must specify both a selection equation (i.e. the missing data mechanism) and an outcome equation. The latter can be the model of interest (i.e. the post-imputation analysis model).
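
For orientation, a Heckman-type sample selection model for a continuous outcome is usually written as below. This is the textbook formulation, given as a sketch only; the symbols (Y_i, R_i, X_i^o, X_i^s, beta^o, beta^s, sigma, rho) are notation introduced here for illustration and are not taken from the package.

```latex
\begin{aligned}
Y_i &= X_i^{o}\beta^{o} + \varepsilon_i^{o}
  && \text{(outcome equation)}\\
R_i^{*} &= X_i^{s}\beta^{s} + \varepsilon_i^{s},
  \qquad R_i = \mathbb{1}\left(R_i^{*} > 0\right)
  && \text{(selection equation)}\\
\begin{pmatrix}\varepsilon_i^{o}\\ \varepsilon_i^{s}\end{pmatrix}
  &\sim \mathcal{N}\!\left(
    \begin{pmatrix}0\\ 0\end{pmatrix},
    \begin{pmatrix}\sigma^{2} & \rho\,\sigma\\ \rho\,\sigma & 1\end{pmatrix}
  \right)
\end{aligned}
```

Here Y_i is observed only when R_i = 1. Under this formulation, a non-zero correlation rho between the two error terms is what makes the missingness depend on the unobserved outcome itself (MNAR); with rho = 0 the mechanism reduces to MAR given the covariates.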

MNARargument() adapts the mice() arguments:

  1. data: an indicator of missingness for the MNAR outcome is added

  2. method: the MNAR imputation model is specified for the MNAR outcome (varMNAR)

  3. predictorMatrix: the MNAR indicator of missingness is added as a predictor in the imputation models of the other variables

Finally, two new arguments are provided: JointModelEq, which defines the selection and outcome equations of the sample selection model, and control, which is for internal use only.

The procedure is the following:

  1. Use generate_JointModelEq() to construct an empty matrix of variable names in which the selection and outcome equations can be specified

  2. Fill in this matrix according to the selection and outcome equations of the sample selection model

  3. Generate an object using the MNARargument() function

  4. Pass the five elements of the object returned by MNARargument() as arguments to the mice() function

Usage

MNARargument(data, method = NULL, predictorMatrix = NULL, varMNAR, JointModelEq = NULL)

Arguments

data

The dataset that would be passed to mice(), together with any additional variables needed by the MNAR imputation models.

method

The mice() method argument.

predictorMatrix

The mice() predictorMatrix argument.

varMNAR

The name of the MNAR outcome to be imputed.

JointModelEq

Matrix indicating which variables are included in the selection and outcome equations of the MNAR outcome imputation models.

Value

data_mod

Modified dataset including an indicator of missingness for each MNAR outcome. Each indicator is named by prefixing the name of the MNAR outcome with "ind_".

method

Modified mice() method argument, using mice.impute.hecknorm() and mice.impute.heckprob() as imputation methods for continuous and binary outcomes, respectively.

predictorMatrix

Modified mice() predictorMatrix argument, including the indicators of missingness of the MNAR outcomes as predictors for MAR covariates.

JointModelEq

For internal use: modified JointModelEq argument.

control

For internal use: MNAR outcomes.
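
As an illustration of the "ind_" naming rule described for data_mod, the sketch below builds such an indicator by hand in base R. It does not call the package: the toy data are made up, and the coding direction (1 = observed, 0 = missing) is an assumption, since this page does not state how the indicator is coded.

```r
# Toy data: "hiv" plays the role of the MNAR outcome (values are made up)
toy <- data.frame(hiv = c(1, NA, 0, NA, 1),
                  age = c(23, 35, 41, 29, 50))

var_mnar <- "hiv"
ind_name <- paste0("ind_", var_mnar)  # "ind_" followed by the outcome name

# ASSUMPTION: 1 = outcome observed, 0 = outcome missing
toy[[ind_name]] <- as.integer(!is.na(toy[[var_mnar]]))

print(toy)
```

The object actually returned by MNARargument() should be inspected (for example with head(arg$data_mod)) to confirm the coding used by the package.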

Warning

This package has only been validated for the imputation of a single MNAR outcome. It is nevertheless implemented so that several MNAR variables can be imputed in the same process; such use must be undertaken with care.

Details

Take care not to define identical selection and outcome equations for MNAR imputation models. A constraint of the sample selection model is that the selection equation and the outcome equation should include different sets of covariates, which may or may not be nested, to avoid collinearity issues. It is recommended to include at least one additional variable in the selection equation; this variable should be known to have no direct link to the outcome.
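
The recommendation above can be checked before imputation. The following base-R sketch is not part of the package: the matrix eq is a hand-built, hypothetical stand-in for a filled-in JointModelEq (one row per variable, one column per equation), and selection_only() is a hypothetical helper introduced for illustration.

```r
# Hypothetical stand-in for a filled-in JointModelEq matrix:
# one row per variable, one column per equation of the MNAR outcome "hiv"
vars <- c("hiv", "age", "marital", "condom", "smoke")
eq <- matrix(0, nrow = length(vars), ncol = 2,
             dimnames = list(vars, c("hiv_var_sel", "hiv_var_out")))
eq[, "hiv_var_sel"] <- c(0, 1, 1, 1, 1)  # selection equation
eq[, "hiv_var_out"] <- c(0, 1, 1, 1, 0)  # outcome equation

# Hypothetical helper: variables that appear in the selection equation
# but not in the outcome equation
selection_only <- function(eq, var_mnar) {
  sel <- eq[, paste0(var_mnar, "_var_sel")] == 1
  out <- eq[, paste0(var_mnar, "_var_out")] == 1
  rownames(eq)[sel & !out]
}

excl <- selection_only(eq, "hiv")
if (length(excl) == 0) {
  warning("No variable is specific to the selection equation.")
}
print(excl)
```

A non-empty result only shows that the two equations differ; whether the extra variable is truly unrelated to the outcome remains a subject-matter judgement.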

References

Galimard, J.-E., Chevret, S., Curis, E., and Resche-Rigon, M. (2018). Heckman imputation models for binary or continuous MNAR missing outcomes and MAR missing predictors. BMC Medical Research Methodology (In press).

Galimard, J.-E., Chevret, S., Protopopescu, C., and Resche-Rigon, M. (2016). A multiple imputation approach for MNAR mechanisms compatible with Heckman's model. Statistics in Medicine, 35: 2907-2920. doi:10.1002/sim.6902.

See Also

mice, copulaSampleSel, SemiParBIV, hiv, selection, generate_JointModelEq

Examples

# NOT RUN {
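# mice() and pool() come from 'mice'; MNARargument() and
# generate_JointModelEq() come from 'miceMNAR'
require(mice)
require(miceMNAR)
require(GJRM)
require(mvtnorm)
require(pbivnorm)
require(sampleSelection)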

# Import dataset with a suspected MNAR mechanism
data("hiv") 

# We select only one region (Lusaka, coded as region 5) and 5 variables
lusuka <- hiv[hiv$region == 5, c("hiv", "age", "marital", "condom", "smoke")]

# Categorical variables have to be recoded as factors
lusuka$hiv <- as.factor(lusuka$hiv)

#############################################
#### Missing data only on a binary outcome ##
#############################################

# Specify a selection equation (missing data mechanism) and an outcome equation (analysis model)

# Generate an empty matrix

JointModelEq <- generate_JointModelEq(data=lusuka,varMNAR = "hiv")

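# Fill in with 1 for each variable included in an equation
# (rows are assumed to follow the column order of lusuka:
#  hiv, age, marital, condom, smoke)
JointModelEq[,"hiv_var_sel"] <- c(0,1,1,1,1)
JointModelEq[,"hiv_var_out"] <- c(0,1,1,1,0)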

# Generate the arguments for the MNAR imputation model used by mice()
arg <- MNARargument(data=lusuka,varMNAR="hiv",JointModelEq=JointModelEq)

# Imputation using the mice() function
# The values returned by MNARargument() must be passed to mice() as arguments:

imputation <- mice(data = arg$data_mod,
                 method = arg$method,
                 predictorMatrix = arg$predictorMatrix,
                 JointModelEq=arg$JointModelEq,
                 control=arg$control,
                 maxit=1,m=5)

# Because only one variable has missing data, maxit=1 is sufficient

# Estimation on each imputed dataset and pooling               
analysis <- with(imputation,glm(hiv~age+condom+marital,family=binomial(link="probit")))
result <- pool(analysis)
summary(result)

##########################################################
#### Missing data on a binary outcome and one covariate ##
##########################################################

# Generate missing values on the variable "condom" 
# According to a MAR mechanism using a probit model
prob <- pnorm((35.5-lusuka$age)/10.74) # Depending on "age"
lusuka$condom[rbinom(nrow(lusuka),size=1, prob=prob)==0] <- NA

JointModelEq <- generate_JointModelEq(data=lusuka,varMNAR = c("hiv"))
JointModelEq[,"hiv_var_sel"] <- c(0,1,1,1,1)
JointModelEq[,"hiv_var_out"] <- c(0,1,1,1,0)

arg <- MNARargument(data=lusuka,varMNAR=c("hiv"),JointModelEq=JointModelEq)

# }
# NOT RUN {
# Imputation using mice function
imputation <- mice(data = arg$data_mod,
                 method = arg$method,
                 predictorMatrix = arg$predictorMatrix,
                 JointModelEq=arg$JointModelEq,
                 control=arg$control, 
                 maxit=5,m=5)

# As usual, estimation on each imputed dataset and pooling
analysis <- with(imputation,glm(hiv~age+condom+marital,family=binomial(link="probit")))
result <- pool(analysis)
summary(result)
# }
# NOT RUN {
#################################################
#### Missing data only on a continuous outcome ##
#################################################

# Generation of a simulated dataset with MNAR mechanism on a continuous outcome

X1 <- rnorm(500,0,1)
X2 <- rbinom(500,1,0.5)
X3 <- rnorm(500,1,0.5)
  
errors <- rmvnorm(500,mean=c(0,0),sigma=matrix(c(1,0.3,0.3,1),nrow=2,byrow=TRUE))

Y <- X1+X2+errors[,1]
Ry <- ifelse(0.66+1*X1-0.5*X2+X3+errors[,2]>0,1,0)

Y[Ry==0] <- NA
  
simul_data <- data.frame(Y,X1,X2,X3)

JointModelEq <- generate_JointModelEq(data=simul_data,varMNAR = "Y")

JointModelEq[,"Y_var_sel"] <- c(0,1,1,1)
JointModelEq[,"Y_var_out"] <- c(0,1,1,0)

arg <- MNARargument(data=simul_data,varMNAR="Y",JointModelEq=JointModelEq)

imputation2 <- mice(data = arg$data_mod,
                 method = arg$method,
                 predictorMatrix = arg$predictorMatrix,
                 JointModelEq=arg$JointModelEq,
                 control=arg$control,
                 maxit=1,m=5)

# Use imputation2 (the continuous-outcome imputation), not the earlier hiv imputation
analysis2 <- with(imputation2,lm(Y~X1+X2+X3))
result2 <- pool(analysis2)
summary(result2)

#############################
## Using 2-step estimation ##
#############################

arg <- MNARargument(data=simul_data,varMNAR="Y",JointModelEq=JointModelEq)
arg$method["Y"] <- "hecknorm2step"

# }
# NOT RUN {
imputation3 <- mice(data = arg$data_mod,
                 method = arg$method,
                 predictorMatrix = arg$predictorMatrix,
                 JointModelEq=arg$JointModelEq,
                 control=arg$control,
                 maxit=1,m=5)

analysis3 <- with(imputation3,lm(Y~X1+X2+X3))
result3 <- pool(analysis3)
summary(result3)
# }
