fill_NA_N: `fill_NA_N` function for multiple imputation

Description

Fills missing data using multiple imputations. Non-missing independent variables are used to approximate the missing observations of a dependent variable. The quantitative models are built with the Rcpp packages and the C++ library Armadillo.

Usage

fill_NA_N(
  x,
  model,
  posit_y,
  posit_x,
  w = NULL,
  logreg = FALSE,
  k = 10,
  ridge = 1e-06
)
# S3 method for data.frame
fill_NA_N(
  x,
  model,
  posit_y,
  posit_x,
  w = NULL,
  logreg = FALSE,
  k = 10,
  ridge = 1e-06
)
# S3 method for data.table
fill_NA_N(
  x,
  model,
  posit_y,
  posit_x,
  w = NULL,
  logreg = FALSE,
  k = 10,
  ridge = 1e-06
)
# S3 method for matrix
fill_NA_N(
  x,
  model,
  posit_y,
  posit_x,
  w = NULL,
  logreg = FALSE,
  k = 10,
  ridge = 1e-06
)

Value

Imputations returned as a numeric, character, or factor vector (similar to the input type).

Arguments

x: a numeric matrix or data.frame/data.table (factor/character/numeric/logical) - variables
model: a character - possible options ("lm_bayes","lm_noise","pmm")
posit_y: an integer/character - the position/name of the dependent variable
posit_x: an integer/character vector - positions/names of the independent variables
w: a numeric vector - a weighting variable - only positive values, Default: NULL
logreg: a boolean - set to TRUE if the dependent variable has a log-normal distribution (numeric). If TRUE, a log-regression is evaluated and the exponential of the results is returned, Default: FALSE
k: an integer - the number of multiple imputations or, for pmm, the number of closest points from which one random value is taken, Default: 10
ridge: a numeric - a value added to the diagonal elements of the x'x matrix, Default: 1e-6

Methods (by class)

fill_NA_N(data.frame): S3 method for data.frame
fill_NA_N(data.table): S3 method for data.table
fill_NA_N(matrix): S3 method for matrix

Examples

library(miceFast)
library(dplyr)
library(data.table)

data(air_miss)

# dplyr: PMM, one random value taken from the 20 closest points
air_miss %>%
  mutate(Ozone_pmm = fill_NA_N(
    x = ., model = "pmm",
    posit_y = "Ozone", posit_x = c("Solar.R", "Wind", "Temp"),
    k = 20
  ))

# dplyr: lm_noise with weights and log-regression (logreg = TRUE)
air_miss %>%
  mutate(Ozone_imp = fill_NA_N(
    x = ., model = "lm_noise",
    posit_y = "Ozone",
    posit_x = c("Solar.R", "Wind", "Temp"),
    w = .[["weights"]],
    logreg = TRUE, k = 30
  ))
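
# Sketch in base R, NOT the package internals: what logreg = TRUE amounts to
# according to the argument description (regress log(y) on x, then return
# the exponential of the predictions). No noise or weights are added here.
# `aq`, `fit`, `miss` and `pred` are illustrative names, and `airquality`
# is the base R dataset, used only so the sketch runs on its own.
aq <- datasets::airquality
fit <- lm(log(Ozone) ~ Solar.R + Wind + Temp, data = aq) # rows with NA are dropped
# rows where Ozone is missing but all predictors are observed
miss <- is.na(aq$Ozone) & complete.cases(aq[, c("Solar.R", "Wind", "Temp")])
pred <- exp(predict(fit, newdata = aq[miss, ])) # back on the original scale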

# data.table: PMM grouped
data(air_miss)
setDT(air_miss)
air_miss[, Ozone_pmm := fill_NA_N(
  x = .SD, model = "pmm",
  posit_y = "Ozone",
  posit_x = c("Wind", "Temp", "Intercept"),
  k = 20
), by = .(groups)]
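
# matrix: lm_bayes with integer positions
# A sketch, assuming the four selected air_miss columns are numeric and that
# posit_y / posit_x index the columns of the matrix passed as x.
# `air_mat` and `ozone_imp` are illustrative names.
data(air_miss)
air_mat <- as.matrix(
  as.data.frame(air_miss)[, c("Ozone", "Solar.R", "Wind", "Temp")]
)
ozone_imp <- fill_NA_N(
  x = air_mat, model = "lm_bayes",
  posit_y = 1, posit_x = 2:4,
  k = 10
)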

# See the vignette for full examples:
# vignette("miceFast-intro", package = "miceFast")