
stremr (version 0.4)

RegressionClass: R6 class that defines regression models evaluating P(sA|sW), for summary measures (sW,sA)

Description

This R6 class defines fields and methods that control all the parameters for non-parametric modeling and estimation of the multivariate joint conditional probability model P(sA|sW) for summary measures (sA,sW). Note that sA can be multivariate and any component sA[j] can be binary, categorical or continuous. The joint probability P(sA|sW) = P(sA[1],...,sA[k]|sW) is first factorized as P(sA[1]|sW) * P(sA[2]|sW, sA[1]) * ... * P(sA[k]|sW, sA[1],...,sA[k-1]), where each of these conditional probability models is defined by a new instance of the GenericModel class (and a corresponding instance of the RegressionClass class). If sA[j] is binary, the conditional probability P(sA[j]|sW,sA[1],...,sA[j-1]) is evaluated via a logistic regression model. When sA[j] is continuous (or categorical), its estimation is controlled by a new instance of the ContinModel class (or the CategorModel class), together with an accompanying new instance of the RegressionClass class. The range of a continuous sA[j] is first partitioned into K bins with corresponding bin indicators (B_1,...,B_K), and K new instances of the GenericModel class are created, each defining a single logistic regression model for one binary bin indicator outcome B_i and predictors (sW, sA[1],...,sA[j-1]). Thus, the first instance of the RegressionClass and GenericModel classes automatically spawns recursive calls to new instances of these classes until the entire tree of binary logistic regressions that defines the joint probability P(sA|sW) is built.
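The sequential factorization described above can be sketched in base R for two binary components. This is only an illustration of the idea, not the stremr implementation: it does not call RegressionClass or GenericModel, and the simulated data and the names W, A1, A2 and lik_obs are hypothetical.

```r
# Illustrative sketch in base R only; it does not use stremr or RegressionClass.
set.seed(1)
n  <- 500
W  <- rnorm(n)                                          # hypothetical summary sW
A1 <- rbinom(n, 1, plogis(0.5 * W))                     # hypothetical binary sA[1]
A2 <- rbinom(n, 1, plogis(-0.3 + 0.8 * A1 + 0.2 * W))   # hypothetical binary sA[2]
dat <- data.frame(W, A1, A2)

# One logistic regression per factor of P(sA[1], sA[2] | sW)
fit1 <- glm(A1 ~ W,      family = binomial(), data = dat)  # P(sA[1] | sW)
fit2 <- glm(A2 ~ W + A1, family = binomial(), data = dat)  # P(sA[2] | sW, sA[1])

# Probability of the observed binary value, given the fitted P(outcome = 1)
lik_obs <- function(p1, a) ifelse(a == 1, p1, 1 - p1)

p1 <- predict(fit1, type = "response")
p2 <- predict(fit2, type = "response")

# Joint conditional probability of the observed (sA[1], sA[2]) for each row:
# the product of the two sequential conditional probabilities
joint <- lik_obs(p1, dat$A1) * lik_obs(p2, dat$A2)
```

Each additional binary component would add one more regression that conditions on sW and on all earlier components, and one more factor in the product.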

Usage

RegressionClass

Arguments

Format

An R6Class generator object

Methods

Active Bindings

Details

  • sep_predvars_sets - Logical indicating the type of regression to run. If FALSE (default), fit the joint P(outvar|predvars) using the same predictors in predvars (vector of names) for all nodes in outvar; if TRUE, use separate predictor sets in predvars (must be a named list of character vectors) for fitting each node in outvar.
  • outvar.class - Character vector indicating a class of each outcome var: bin / cont / cat.
  • outvar - Character vector of regression outcome variable names.
  • predvars - Either a pooled character vector of all predictors (sW) or a vector of regression-specific predictor names. When sep_predvars_sets=TRUE, this must be a named list of predictor names, the list names corresponding to each node name in outvar, and each list item being a vector specifying the regression predictors for a specific outcome in outvar.
  • reg_hazard - Logical, if TRUE, the joint probability model P(outvar | predvars) is factorized as \prod_j P(outvar[j] | predvars) over each outcome j in outvar (for fitting hazard).
  • subset_vars - Subset variables (later evaluated to logical vector based on non-missing (!is.na()) values of these variables).
  • subset_exprs - Subset expressions (later evaluated to logical vector in the envir of the data).
  • ReplMisVal0 - Logical, if TRUE all gvars$misval among predictors are replaced with gvars$misXreplace (0).
  • nbins - Integer number of bins used for a continuous outvar, the intervals are defined inside ContinModel$new() and then saved in this field.
  • bin_nms - Character vector of column names for bin indicators.
  • useglm - Logical, if TRUE then fit the logistic regression model using glm.fit, if FALSE use speedglm.wfit ... regressions (requires registering back-end cluster prior to calling the fit/predict functions).
  • bin_bymass - Logical, for continuous outvar, create bin cutoffs based on equal mass distribution.
  • bin_bydhist - Logical, if TRUE, use the dhist approach for bin definitions. See Denby and Mallows, "Variations on the Histogram" (2009), for more.
  • max_nperbin - Integer, maximum number of observations allowed per one bin.
  • pool_cont - Logical, if TRUE pool binned continuous outvar observations across bins and fit only one regression model across all bins (adding bin_ID as an extra covariate).
  • outvars_to_pool - Character vector of names of the binned continuous outvars, should match bin_nms.
  • intrvls.width - Named numeric vector of bin widths (bw_j : j=1,...,M), one for each bin in self$intrvls. When sA is not continuous, intrvls.width is set to 1. When sA is continuous and intrvls.width is not supplied, the intervals are determined inside ContinModel$new() and are assigned to this variable as a list, with names(intrvls.width) <- reg$bin_nms. Can be queried by BinaryOutcomeModel$predictAeqa() as: intrvls.width[outvar].
  • intrvls - Numeric vector of cutoffs defining the bins or a named list of numeric intervals for length(self$outvar) > 1.
  • cat.levels - Numeric vector of all unique values in categorical outcome variable. Set by CategorModel constructor.
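
The binning fields above (nbins, bin_bymass, intrvls, intrvls.width, bin_nms) can be illustrated with a short base R sketch. It only shows the general idea of equal-width versus equal-mass cutoffs and per-bin indicators; it does not call ContinModel or RegressionClass. All object names are hypothetical, and the hazard-style coding of the indicators (0 for bins before the one containing the observation, NA after it) is an assumption about one common way to set up such regressions, not a statement about the package internals.

```r
# Illustrative sketch in base R only; it does not use stremr or ContinModel.
set.seed(2)
x     <- rexp(200)   # hypothetical continuous outcome
nbins <- 5

# Equal-width cutoffs over the observed range
intrvls_equal_width <- seq(min(x), max(x), length.out = nbins + 1)

# Equal-mass cutoffs (similar in spirit to bin_bymass = TRUE): empirical quantiles
intrvls_equal_mass <- quantile(x, probs = seq(0, 1, length.out = nbins + 1),
                               names = FALSE)

intrvls <- intrvls_equal_mass

# Bin membership for each observation; all.inside = TRUE keeps indices in 1..nbins
bin_id <- findInterval(x, intrvls, all.inside = TRUE)

# Named vector of bin widths, one entry per bin indicator name
bin_nms       <- paste0("B_", seq_len(nbins))
intrvls_width <- setNames(diff(intrvls), bin_nms)

# Hazard-style bin indicators (assumed coding):
#   B_j = 1  if the observation falls in bin j
#   B_j = 0  if it falls in a later bin
#   B_j = NA if it already fell in an earlier bin
bin_mat <- sapply(seq_len(nbins), function(j) {
  ifelse(bin_id == j, 1L, ifelse(bin_id > j, 0L, NA_integer_))
})
colnames(bin_mat) <- bin_nms
```

Under this coding each column of bin_mat could serve as the outcome of one binary logistic regression fitted on the rows where the indicator is not NA.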