opImputation (version 0.6)

create_diagnostic_missings: Create Diagnostic Missing Values in Data

Description

Introduces additional missing values into a dataset (which may already contain missings) using various missingness mechanisms: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR). Missing values are only inserted at positions that currently contain actual values (non-NA).

Usage

create_diagnostic_missings(
  x,
  Prob = 0.1,
  mnarity = 0,
  mnarshape = 1,
  lowOnly = FALSE,
  seed,
  maxAttempts = 1000
)

Value

A list with two elements:

toDelete

List of row indices where values were set to missing, one vector per column

missData

Data frame with introduced missing values
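The two elements are typically used together: toDelete locates the masked cells, so the original values at those positions can be held out and compared with whatever later fills the gaps in missData. The sketch below uses a hand-built stand-in with the documented shape rather than a real call to the package. The row indices are made up for illustration, and treating toDelete as an unnamed list in column order is an assumption.

```r
# Hand-built stand-in for the documented return value (NOT produced by the package).
x <- iris[, 1:4]

# Hypothetical row indices, one vector per column (assumed to follow column order)
toDelete <- list(c(1, 5), c(2, 7), c(3), c(4, 9))

# Mask the listed rows column by column
missData <- x
for (j in seq_along(toDelete)) {
  missData[toDelete[[j]], j] <- NA
}
result <- list(toDelete = toDelete, missData = missData)

# Number of values masked in each column
masked_per_column <- lengths(result$toDelete)

# Original values at the masked positions, one vector per column
held_out <- Map(
  function(rows, j) x[rows, j],
  result$toDelete,
  seq_along(result$toDelete)
)
```

Here held_out[[j]] contains the true values that an imputation method would need to recover in column j.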

Arguments

x

Data frame or matrix with numeric data. May contain existing missing values

Prob

Numeric between 0 and 1. Proportion of non-missing values to set as missing (default: 0.1)

mnarity

Numeric between 0 and 1. Proportion of MNAR (vs MCAR/MAR) missingness (default: 0)

mnarshape

Numeric >= 1. Shape parameter for MNAR probability distribution (default: 1)

lowOnly

Logical. If TRUE, only creates missings for low values in MNAR case (default: FALSE)

seed

Integer. Random seed for reproducibility (default: 42)

maxAttempts

Integer. Maximum number of attempts to generate a valid missing pattern (default: 1000, as given in the usage signature)

Details

The function creates missing values using a combination of mechanisms:

  • MCAR: Random missingness independent of data values (its share of the introduced missings is 1 - mnarity)

  • MAR/MNAR: Value-dependent missingness (its share of the introduced missings is mnarity)

The shape of the MNAR probability distribution is controlled by mnarshape. When lowOnly = TRUE, the MNAR mechanism targets only low values; otherwise it targets extreme values (both high and low).
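One way to picture how these parameters could interact is as a mixture of sampling weights: a uniform part weighted by 1 - mnarity and a value-dependent part weighted by mnarity. The sketch below is a guess at the general idea, not the package's implementation. The helper sketch_weights and its rank-based formula are hypothetical.

```r
# Hypothetical helper: NOT the package's implementation.
# Returns one sampling weight per element of v; NA positions get weight 0.
sketch_weights <- function(v, mnarity = 0, mnarshape = 1, lowOnly = FALSE) {
  obs <- !is.na(v)
  n_obs <- sum(obs)
  w <- numeric(length(v))

  # Rescaled ranks in [0, 1]: 0 = smallest observed value, 1 = largest
  r <- (rank(v[obs]) - 1) / max(n_obs - 1, 1)

  # Value-dependent part: favour low values, or both extremes
  value_part <- if (lowOnly) (1 - r)^mnarshape else abs(2 * r - 1)^mnarshape

  uniform_part <- rep(1 / n_obs, n_obs)
  if (sum(value_part) > 0) {
    value_part <- value_part / sum(value_part)
  } else {
    value_part <- uniform_part  # fall back when all values tie
  }

  # Mixture: MCAR share is 1 - mnarity, value-dependent share is mnarity
  w[obs] <- (1 - mnarity) * uniform_part + mnarity * value_part
  w
}

v <- c(1, 2, 3, 4, NA, 100)
w_mcar <- sketch_weights(v, mnarity = 0)                 # uniform over observed values
w_low  <- sketch_weights(v, mnarity = 1, lowOnly = TRUE) # weight concentrated on low values

set.seed(42)
picked <- sample(seq_along(v), size = 2, prob = w_low)   # positions to set to NA
```

In this sketch a larger mnarshape concentrates the value-dependent weight more strongly on the targeted values, which matches the documented role of the parameter only in spirit.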

The function ensures that no row ends up with all values missing: any position whose removal would leave its row completely missing is excluded from the sampling pool.
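The row-protection rule can be illustrated with a small base R sketch that removes one cell at a time and never samples the last observed value in a row. This is an assumption about how such a rule might be enforced, not the package's code (which also has a maxAttempts argument that this sketch does not model). The helper sketch_row_safe_mask is hypothetical and uses MCAR sampling only.

```r
# Hypothetical helper: NOT the package's implementation.
# Masks roughly prob * (number of observed cells), one cell at a time,
# while keeping at least one observed value in every row.
sketch_row_safe_mask <- function(x, prob = 0.1, seed = 42) {
  set.seed(seed)
  m <- as.matrix(x)
  n_target <- floor(prob * sum(!is.na(m)))

  for (k in seq_len(n_target)) {
    observed_per_row <- rowSums(!is.na(m))
    # Eligible: cell is observed and its row would keep at least one other value
    eligible <- which(!is.na(m) & (observed_per_row[row(m)] > 1))
    if (length(eligible) == 0) break  # nothing left that is safe to remove
    pick <- eligible[sample.int(length(eligible), 1)]
    m[pick] <- NA
  }
  m
}

# Deliberately extreme request: 90% of a 150 x 4 table
masked <- sketch_row_safe_mask(iris[, 1:4], prob = 0.9)
```

Even with this extreme request the loop stops once every row is down to a single observed value, so no row becomes completely missing.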

Examples

library(opImputation)

# Create 10% MCAR missings
# (seed is passed explicitly because the usage signature shows no default)
result <- create_diagnostic_missings(
  x = iris[, 1:4],
  Prob = 0.1,
  mnarity = 0,
  seed = 42
)

# Create 20% missings with 50% MNAR targeting low values
result <- create_diagnostic_missings(
  x = iris[, 1:4],
  Prob = 0.2,
  mnarity = 0.5,
  lowOnly = TRUE,
  seed = 42
)
