hide_values: Missing Values Generation

Description

A convenience function that randomly introduces missing values into a data set with at least two variables. The user can specify either the proportion of observations that contain some missing values or the exact number of such observations. Note that the function does not guarantee that the underlying missing-data mechanism is missing at random (MAR).

Usage

hide_values(X, prop_cases = 0.1, n_cases = NULL)

Value

The original \(n\) by \(d\) matrix or data frame, with missing values introduced.

Arguments

X: An \(n\) by \(d\) matrix or data frame where \(n\) is the number of observations and \(d\) is the number of columns or variables. X must have at least 2 rows and 2 columns.
prop_cases: (optional) Proportion of observations that contain some missing values. prop_cases must be a number in \((0, 1)\). Defaults to 0.1; ignored if n_cases is specified.
n_cases: (optional) Number of observations that contain some missing values. n_cases must be an integer ranging from 1 to nrow(X) - 1.
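The argument rules above (n_cases takes precedence over prop_cases, with the stated valid ranges) can be summarised in a small base-R sketch. `resolve_n_cases()` is a hypothetical helper, not part of the package, and the rounding and clamping of prop_cases * nrow(X) to a whole row count in 1..nrow(X) - 1 is an assumption; the package may convert the proportion differently.

```r
# Hypothetical helper (not the package's code): turn the two arguments into a
# single number of rows to receive missing values.
resolve_n_cases <- function(X, prop_cases = 0.1, n_cases = NULL) {
  n <- nrow(X)
  if (!is.null(n_cases)) {
    # n_cases overrides prop_cases; it must be an integer in 1..(n - 1)
    stopifnot(n_cases == round(n_cases), n_cases >= 1, n_cases <= n - 1)
    return(as.integer(n_cases))
  }
  # otherwise prop_cases must lie strictly between 0 and 1
  stopifnot(prop_cases > 0, prop_cases < 1)
  # ASSUMED conversion: round, then clamp into the valid range 1..(n - 1)
  as.integer(max(1, min(n - 1, round(prop_cases * n))))
}
```

For example, `resolve_n_cases(iris[1:4], prop_cases = 0.5)` gives 75 rows under this assumed rule, while `n_cases = 80` is used as given.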

Details

If selected for missingness, an observation receives at least 1 and at most ncol(X) - 1 missing values. Depending on the data set, the number of rows with missing values in the result is not guaranteed to match the specified proportion exactly.
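The per-row rule described above can be illustrated with a minimal base-R sketch. `mask_rows_sketch()` is a hypothetical name, not part of the package, and drawing the rows, the number of missing entries, and the columns uniformly at random is an assumption; the package's own sampling may differ. In this simplified version the number of incomplete rows equals n_cases whenever X starts out complete, unlike the package function, whose count may not match exactly.

```r
# Hypothetical sketch (not the package's implementation) of the masking rule:
# pick n_cases distinct rows, then set between 1 and ncol(X) - 1 entries of
# each picked row to NA, so a picked row is never entirely blanked out.
mask_rows_sketch <- function(X, n_cases) {
  n <- nrow(X)
  d <- ncol(X)
  stopifnot(n >= 2, d >= 2, n_cases >= 1, n_cases <= n - 1)
  rows <- sample.int(n, n_cases)      # distinct rows chosen to receive NAs
  for (i in rows) {
    k <- sample.int(d - 1, 1)         # number of NAs in this row: 1..(d - 1)
    cols <- sample.int(d, k)          # which columns to blank out
    X[i, cols] <- NA
  }
  X
}
```

For example, `mask_rows_sketch(as.matrix(iris[1:4]), n_cases = 15)` returns the matrix with 15 incomplete rows, each keeping at least one observed value.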

Examples


set.seed(1234)

hide_values(iris[1:4])
hide_values(iris[1:4], prop_cases = 0.5)
hide_values(iris[1:4], n_cases = 80)
