A convenience function that randomly introduces missing values into a data set with
at least two variables. The user can specify either the proportion or the exact
number of observations that contain some missing values.
Note that the function does not guarantee that the underlying missing-data
mechanism is missing at random (MAR).
Usage
hide_values(X, prop_cases = 0.1, n_cases = NULL)
Value
The original \(n\) by \(d\) matrix or data frame with missing values introduced.
Arguments
X
An \(n\) by \(d\) matrix or data frame where \(n\) is the number of
observations and \(d\) is the number of columns or variables. X must
have at least 2 rows and 2 columns.
prop_cases
(optional) Proportion of observations that contain some missing values.
prop_cases must be a number in \((0, 1)\). prop_cases = 0.1
by default, but will be ignored if n_cases is specified.
n_cases
(optional) Number of observations that contain some missing values.
n_cases must be an integer ranging from 1 to nrow(X) - 1.
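The precedence between prop_cases and n_cases can be illustrated with a small base R sketch. Note that resolve_n_cases is a hypothetical helper written for this illustration, not a function from the package, and the rounding of prop_cases * nrow(X) to a row count is an assumption; the actual function may convert the proportion differently.

```r
# Hypothetical helper (not part of the package): turn prop_cases / n_cases
# into a row count, following the argument rules documented above.
resolve_n_cases <- function(X, prop_cases = 0.1, n_cases = NULL) {
  n <- nrow(X)
  stopifnot(n >= 2, ncol(X) >= 2)

  if (!is.null(n_cases)) {
    # n_cases takes precedence; prop_cases is ignored
    stopifnot(n_cases == round(n_cases), n_cases >= 1, n_cases <= n - 1)
    return(as.integer(n_cases))
  }

  stopifnot(prop_cases > 0, prop_cases < 1)
  # Assumption: round to the nearest row, then clamp to 1..(n - 1)
  as.integer(min(max(round(prop_cases * n), 1), n - 1))
}

X <- matrix(rnorm(200), nrow = 100, ncol = 2)
n_default  <- resolve_n_cases(X)                # uses prop_cases = 0.1
n_explicit <- resolve_n_cases(X, n_cases = 25)  # n_cases overrides prop_cases
```

Clamping to nrow(X) - 1 mirrors the documented upper bound on n_cases, which keeps at least one observation complete.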
Details
If subject to missingness, an observation has at least 1 and at
most ncol(X) - 1 missing values. Depending on the data
set, the number of rows with missing values in the resulting matrix is not
guaranteed to match the specified proportion.
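The per-row bounds described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: hide_values_sketch is a hypothetical name, rows and columns are drawn uniformly at random, and X is assumed to start without missing values.

```r
# Hypothetical re-implementation for illustration; not the package's code.
hide_values_sketch <- function(X, n_cases) {
  d <- ncol(X)
  rows <- sample(nrow(X), n_cases)   # rows that will receive NAs
  for (i in rows) {
    k <- sample.int(d - 1, 1)        # hide between 1 and d - 1 values
    cols <- sample.int(d, k)         # which columns to hide in row i
    X[i, cols] <- NA
  }
  X
}

set.seed(1)
X <- matrix(rnorm(60), nrow = 20, ncol = 3)
X_miss <- hide_values_sketch(X, n_cases = 5)

n_incomplete <- sum(!complete.cases(X_miss))  # rows with at least one NA
per_row <- rowSums(is.na(X_miss))             # missing values in each row
```

In this sketch the number of incomplete rows always equals n_cases because X starts complete. The real function makes no such guarantee, so checking the result with complete.cases() after calling it is a reasonable precaution.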