MonteCarloSEM (version 2.0.0)

MAR.data: Introduces Missing at Random (MAR) Values into Data Sets.

Description

This function introduces missing values under the Missing at Random (MAR) mechanism into previously generated data sets (e.g., those produced by sim.skewed() or sim.normal()). Under MAR, the probability of missingness is associated with other variables in the data set, but not with the variable itself. If the baseV argument is not provided, two variables (excluding the target variable itself) are selected at random, and their mean is used to determine missingness in the target variable.

For example, assume a data set with 8 items where missing values are to be introduced for item 2. Two items are randomly selected from items 1, 3, 4, 5, 6, 7, and 8 (e.g., items 5 and 7). Their mean is calculated, sorted, and used as the basis for assigning missingness to item 2. Following the MAR rule, 90 percent of the missing values are drawn from the highest scores, and the remaining 10 percent are drawn randomly from the rest. For instance, with a sample size of 300 and 20 percent missingness (60 cases), the mean of the selected auxiliary variables is sorted in decreasing order. Missing values are then introduced in 54 cases (90 percent of 60) from the top portion, while 6 cases (10 percent of 60) are drawn randomly from the lower 240 observations.

Missing values are represented by NA in the output files. New data sets containing missing values are saved as separate files, preserving the originals. Additionally, a file named "MAR_List.dat" is created, which contains the names of all data sets with MAR missingness.
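The 90/10 rule described above can be sketched in a few lines of base R. This is an illustration only, not the package's internal code: mar_sketch() and its arguments are hypothetical names, and the sketch assumes that the "top portion" means the highest-ranked cases taken in order and that rounding is used for the case counts.

```r
# Illustrative sketch of the MAR rule; NOT the MonteCarloSEM implementation.
# mar_sketch(), target, aux, and top_share are hypothetical names.
mar_sketch <- function(dat, target, aux, perct = 20, top_share = 0.9) {
  n      <- nrow(dat)
  n_miss <- round(n * perct / 100)     # total cases to set missing (60 of 300)
  n_top  <- round(n_miss * top_share)  # cases taken from the highest scores (54)
  n_rest <- n_miss - n_top             # cases drawn at random from the rest (6)

  # Row-wise mean of the auxiliary variables, ranked from highest to lowest
  aux_mean <- rowMeans(dat[, aux, drop = FALSE])
  ord      <- order(aux_mean, decreasing = TRUE)

  top_rows  <- ord[seq_len(n_top)]                  # assumed: highest-ranked cases
  pool      <- utils::tail(ord, n - n_miss)         # the lower 240 observations
  rest_rows <- pool[sample.int(length(pool), n_rest)]

  dat[c(top_rows, rest_rows), target] <- NA
  dat
}

set.seed(1)
toy <- as.data.frame(matrix(rnorm(300 * 8), nrow = 300))  # 300 cases, 8 items
out <- mar_sketch(toy, target = 2, aux = c(5, 7), perct = 20)
sum(is.na(out[[2]]))  # 60
```

Because missingness depends only on items 5 and 7 here, and never on item 2 itself, the resulting pattern is consistent with the MAR definition given above.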

Usage

MAR.data(
  misg = NULL,
  baseV = NULL,
  perct = 10,
  dataList = "Data_List.dat",
  f.loc
)

Arguments

misg

A numeric vector of 0s and 1s specifying which items will contain missing values. A value of 0 indicates the item will not include missingness, while 1 indicates missing values will be introduced. If omitted, all items are treated as eligible for missingness.

baseV

A list specifying the auxiliary variables on which MAR missingness will be based. Its structure must match that of misg. If not provided, two random variables (excluding the variable itself) are chosen automatically.
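As a rough guide to what "matching the structure of misg" might look like, the sketch below checks a baseV list against misg before calling MAR.data(). The layout is inferred from the example on this page (one list element per item, holding a 0/1 indicator vector over all items when that item receives missingness); check_baseV() is a hypothetical helper, not part of the package, and the rules it enforces are assumptions.

```r
# Hypothetical helper; the checks reflect the layout used in this page's
# example and are assumptions, not documented package requirements.
check_baseV <- function(misg, baseV) {
  stopifnot(is.list(baseV), length(baseV) == length(misg))
  for (i in which(misg == 1)) {
    b <- baseV[[i]]
    stopifnot(
      length(b) == length(misg),  # one 0/1 flag per item
      all(b %in% c(0, 1)),
      sum(b) >= 1,                # at least one auxiliary variable
      b[i] == 0                   # an item is not its own auxiliary variable
    )
  }
  invisible(TRUE)
}

mis.items <- c(1, 0, 1, 1, 0, 0, 0, 0)
bV <- list(c(0,0,0,0,0,0,1,1), NA, c(0,0,0,0,0,1,1,0), c(0,0,0,0,0,1,1,1),
           NA, NA, NA, NA)
check_baseV(mis.items, bV)

which(bV[[1]] == 1)  # auxiliary items for item 1: 7 8
```

Items with a 0 in misg are skipped by the check; the example on this page uses NA as their placeholder.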

perct

The percentage of missingness to be applied (default = 10 percent).

dataList

The name of the file containing the list of previously generated data sets (e.g., "Data_List.dat"), created either by this package or by external software.

f.loc

The directory path where both the original data sets and the "dataList" file are located.

Author

Fatih Orcan

Examples

# Step 1: Generate data sets

fc<-fcors.value(nf=3, cors=c(1,.5,.6,.5,1,.4,.6,.4,1))
fl<-loading.value(nf=3, fl.loads=c(.5,.5,.5,0,0,0,0,0,0,0,0,.6,.6,.6,0,0,0,0,0,0,0,0,.4,.4))
floc<-tempdir()
sim.normal(nd=10, ss=100, fcors=fc, loading=fl, f.loc=floc)

# Step 2: Introduce MAR missing values

mis.items<-c(1,0,1,1,0,0,0,0)
bV<-list(c(0,0,0,0,0,0,1,1),NA,c(0,0,0,0,0,1,1,0),c(0,0,0,0,0,1,1,1), NA,NA,NA,NA)
dl<-"Data_List.dat"  # must be located in the f.loc directory (here, floc)
MAR.data(misg = mis.items, baseV=bV, perct = 20, dataList = dl, f.loc=floc )