sparseR_prep: Preprocess & create a model matrix with interactions + polynomials

Description

Preprocess & create a model matrix with interactions + polynomials

Usage

sparseR_prep(
  formula,
  data,
  k = 1,
  poly = 1,
  pre_proc_opts = c("knnImpute", "scale", "center", "otherbin", "none"),
  ia_formula = NULL,
  filter = c("nzv", "zv"),
  extra_opts = list(),
  family = "gaussian"
)

Value

an object of class recipe; see recipes::recipe()
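
A minimal sketch of a call, assuming the sparseR package is installed and using the built-in iris data purely for illustration (the choice of outcome and the values of k and poly are arbitrary, not recommendations):

```r
library(sparseR)

# Illustrative call: main effects from all other columns of iris,
# pairwise interactions (k = 1) and polynomials up to order 2
rec <- sparseR_prep(Sepal.Width ~ ., data = iris, k = 1, poly = 2)

# The documented return value is an object of class "recipe"
inherits(rec, "recipe")
```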

Arguments

formula: A formula of the main effects + outcome of the model
data: A required data frame or tibble containing the variables in formula
k: Maximum order of interactions to numeric variables
poly: The maximum order of polynomials to consider
pre_proc_opts: A character vector specifying methods for preprocessing (see details)
ia_formula: Formula to be passed to step_interact (for interactions, see details)
filter: Which methods should be used to filter out variables with (near) zero variance? (see details)
extra_opts: Extra options to be used for preprocessing
family: Family passed from sparseR

Details

The pre_proc_opts argument acts as a wrapper for the corresponding procedures in the recipes package. The currently supported options that can be passed to pre_proc_opts are:

knnImpute: Should k-nearest-neighbors imputation be performed (if necessary)?
scale: Should variables be scaled prior to creating interactions? (Does not scale factor variables or dummy variables.)
center: Should variables be centered? (Does not center factor variables or dummy variables.)
otherbin:

ia_formula will by default interact all variables with each other up to order k. If specified, ia_formula will be passed as the terms argument to recipes::step_interact, so the help documentation for that function can be investigated for further assistance in specifying specific interactions.
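
As a hedged sketch of a custom ia_formula, the call below requests a single interaction rather than all pairwise ones. It assumes that the one-sided formula syntax accepted by the terms argument of recipes::step_interact applies here, and that the numeric column names of iris are unchanged by the earlier preprocessing steps; both are assumptions, not statements from this help page.

```r
library(sparseR)

# Assumption: ia_formula takes a one-sided formula, as the terms
# argument of recipes::step_interact does
rec_ia <- sparseR_prep(
  Sepal.Width ~ .,
  data = iris,
  k = 1,
  ia_formula = ~ Sepal.Length:Petal.Length  # only this interaction
)

inherits(rec_ia, "recipe")
```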

The methods specified in filter are important; filtering is necessary to cut down on extraneous polynomials and interactions (in cases where they really don't make sense). This is true, for instance, when using dummy variables in polynomials, or when using interactions of dummy variables that relate to the same categorical variable.
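
To illustrate the filter argument, the sketch below restricts filtering to zero-variance terms only. It assumes that a subset of the default c("nzv", "zv") is accepted; iris is again used only as a convenient example because its Species factor produces dummy variables.

```r
library(sparseR)

# Drop only exactly-zero-variance terms (e.g. products of two dummy
# variables from the same factor); near-zero-variance terms are kept
rec_zv <- sparseR_prep(
  Sepal.Width ~ .,
  data = iris,
  k = 1,
  poly = 2,
  filter = "zv"
)

inherits(rec_zv, "recipe")
```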
