tmlenet (version 0.1.0)

RegressionClass: R6 class that defines regression models evaluating P(sA|sW), for summary measures (sW,sA)

Description

This R6 class defines the fields and methods that control all parameters for non-parametric modeling and estimation of the multivariate joint conditional probability model P(sA|sW) for summary measures (sA,sW). Note that sA can be multivariate, and each component sA[j] can be binary, categorical or continuous. The joint probability P(sA|sW) = P(sA[1],...,sA[k]|sW) is first factorized as P(sA[1]|sW) * P(sA[2]|sW, sA[1]) * ... * P(sA[k]|sW, sA[1],...,sA[k-1]), where each of these conditional probability models is defined by a new instance of the SummariesModel class (and a corresponding instance of the RegressionClass class).

If sA[j] is binary, the conditional probability P(sA[j]|sW,sA[1],...,sA[j-1]) is evaluated via a logistic regression model. When sA[j] is continuous (or categorical), its estimation is controlled by a new instance of the ContinSummaryModel class (or the CategorSummaryModel class), along with an accompanying new instance of the RegressionClass class. The range of a continuous sA[j] is first partitioned into K bins with corresponding bin indicators (B_1,...,B_K). This creates K new instances of the SummariesModel class, each defining a single logistic regression model for one binary bin indicator outcome B_i and predictors (sW, sA[1],...,sA[j-1]). Thus, the first instance of the RegressionClass and SummariesModel classes automatically spawns recursive calls to new instances of these classes until the entire tree of binary logistic regressions that defines the joint probability P(sA|sW) is built.
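As a rough illustration of the chain-rule factorization described above, the base-R sketch below fits one logistic regression per binary component of a two-component sA and multiplies the resulting conditional probabilities. It is not tmlenet's internal code: the simulated data, the variable names (sW, sA1, sA2), the helper lik_obs() and the direct use of glm() are all assumptions made for this example.

```r
# Hypothetical simulated data: one baseline summary sW and two binary summaries sA1, sA2.
set.seed(1)
n   <- 500
sW  <- rnorm(n)
sA1 <- rbinom(n, 1, plogis(0.5 * sW))
sA2 <- rbinom(n, 1, plogis(-0.3 * sW + 0.8 * sA1))
dat <- data.frame(sW, sA1, sA2)

# One logistic regression per factor: P(sA1 | sW) and P(sA2 | sW, sA1).
fit1 <- glm(sA1 ~ sW, family = binomial(), data = dat)
fit2 <- glm(sA2 ~ sW + sA1, family = binomial(), data = dat)

# Hypothetical helper: probability of the observed binary value y, given P(y = 1).
lik_obs <- function(p1, y) ifelse(y == 1, p1, 1 - p1)

p1 <- predict(fit1, type = "response")
p2 <- predict(fit2, type = "response")

# Joint likelihood P(sA1, sA2 | sW) = P(sA1 | sW) * P(sA2 | sW, sA1).
joint <- lik_obs(p1, dat$sA1) * lik_obs(p2, dat$sA2)
```

Each element of joint is the estimated probability of the observed pair (sA1, sA2) for one observation; a continuous component would first be replaced by its bin indicators before entering the same kind of product.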

Usage

RegressionClass

Arguments

Format

An R6Class generator object

Methods

new(outvar.class = gvars$sVartypes$bin, outvar, predvars, subset, intrvls, ReplMisVal0 = TRUE, useglm = getopt("useglm"), parfit = getopt("parfit"), nbins = getopt("nbins"), bin_bymass = getopt("bin.method"), bin_bydhist = getopt("bin.method"), max_nperbin = getopt("maxNperBin"), pool_cont = getopt("poolContinVar"))
Uses the arguments to instantiate an object of R6 class and define the future regression model.
ChangeManyToOneRegresssion(k_i, reg)
Take a clone of a parent RegressionClass (reg) for length(self$outvar) regressions and set self to a single univariate k_i regression for outcome self$outvar[[k_i]].
ChangeOneToManyRegresssions(regs_list)
Take the clone of a parent RegressionClass for univariate (continuous outvar) regression and set self to length(regs_list) bin indicator outcome regressions.
resetS3class()
...
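
The idea behind reducing a multivariate regression specification to a single univariate one can be sketched with plain R lists. This is only an analogy for ChangeManyToOneRegresssion, not the method itself: the function many_to_one(), the list layout, the class labels "bin" and "cont", and in particular the choice to append earlier outcomes to the predictors (following the factorization in the Description) are assumptions made for this example.

```r
# Hypothetical stand-in for a parent regression specification (a plain list, not an R6 object).
reg <- list(
  outvar.class = c("bin", "cont"),   # assumed class labels
  outvar       = c("sA1", "sA2"),
  predvars     = c("sW1", "sW2")
)

# Reduce a copy of the parent specification to the k_i-th univariate regression.
# Assumption: outcomes that precede k_i are added to the predictor set.
many_to_one <- function(reg, k_i) {
  one <- reg  # R copies lists on modification, so the parent is left unchanged
  one$outvar.class <- reg$outvar.class[[k_i]]
  one$outvar       <- reg$outvar[[k_i]]
  one$predvars     <- c(reg$predvars, reg$outvar[seq_len(k_i - 1)])
  one
}

reg1 <- many_to_one(reg, 1)
reg2 <- many_to_one(reg, 2)
```

Here reg2 describes a single regression of sA2 on sW1, sW2 and sA1, while reg itself still lists both outcomes.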

Active Bindings

S3class
...
get.reg
...

Details

  • outvar.class - Character vector indicating a class of each outcome var: bin / cont / cat.
  • outvar - Character vector of regression outcome variable names.
  • predvars - Either a pool of all character predictors (sW) or regression-specific predictor names.
  • reg_hazard - Logical, if TRUE, the joint probability model P(outvar | predvars) is factorized as \prod_j P(outvar[j] | predvars) over the outcomes outvar[j] (for fitting hazard).
  • subset - Subset expression (later evaluated to logical vector in the envir of the data).
  • ReplMisVal0 - Logical, if TRUE all gvars$misval among predictors are replaced with gvars$misXreplace (0).
  • nbins - Integer number of bins used for a continuous outvar, the intervals are defined inside ContinSummaryModel$new() and then saved in this field.
  • bin_nms - Character vector of column names for bin indicators.
  • useglm - Logical, if TRUE then fit the logistic regression model using glm.fit, if FALSE use speedglm.wfit.
  • parfit - Logical, if TRUE then use a parallel foreach::foreach loop to fit and predict binary logistic regressions (requires registering a back-end cluster prior to calling the fit/predict functions).
  • bin_bymass - Logical, for continuous outvar, create bin cutoffs based on equal mass distribution.
  • bin_bydhist - Logical, if TRUE, use the dhist approach for bin definitions. See Denby and Mallows, "Variations on the Histogram" (2009), for more.
  • max_nperbin - Integer, maximum number of observations allowed per bin.
  • pool_cont - Logical, if TRUE, pool binned continuous outvar observations across bins and fit only one regression model across all bins (adding bin_ID as an extra covariate).
  • outvars_to_pool - Character vector of names of the binned continuous outvars, should match bin_nms.
  • intrvls.width - Named numeric vector of bin-widths (bw_j : j=1,...,M) for each bin in self$intrvls. When sA is not continuous, intrvls.width is set to 1. When sA is continuous and intrvls.width is not supplied, the intervals are determined inside ContinSummaryModel$new() and are assigned to this variable as a list, with names(intrvls.width) <- reg$bin_nms. Can be queried by BinOutModel$predictAeqa() as: intrvls.width[outvar].
  • intrvls - Numeric vector of cutoffs defining the bins or a named list of numeric intervals for length(self$outvar) > 1.
  • cat.levels - Numeric vector of all unique values in categorical outcome variable. Set by CategorSummaryModel constructor.
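
To make the binning fields above more concrete, the following base-R sketch builds equal-width and equal-mass cutoffs for a continuous outcome, derives the 0/1 bin indicators, and computes per-bin widths in the spirit of intrvls and intrvls.width. It does not reproduce how ContinSummaryModel$new() defines its intervals: the simulated variable sA, the bin indicator names and the use of quantile() and findInterval() are assumptions made for this example.

```r
set.seed(2)
sA    <- rexp(200)   # hypothetical continuous summary measure
nbins <- 5

# Equal-width cutoffs over the observed range.
cut_width <- seq(min(sA), max(sA), length.out = nbins + 1)

# Equal-mass cutoffs: roughly the same number of observations in each bin.
cut_mass <- quantile(sA, probs = seq(0, 1, length.out = nbins + 1), names = FALSE)

# Bin membership (1..nbins); rightmost.closed keeps max(sA) in the last bin.
bin_id <- findInterval(sA, cut_mass, rightmost.closed = TRUE)

# Matrix of 0/1 bin indicators B_1, ..., B_K, one column per bin.
bin_ind <- outer(bin_id, seq_len(nbins), "==") * 1L
colnames(bin_ind) <- paste0("sA_B.", seq_len(nbins))  # hypothetical column names

# Named vector of bin widths, one entry per bin indicator.
bw <- diff(cut_mass)
names(bw) <- colnames(bin_ind)
```

With a skewed outcome such as this one, the equal-mass bins are narrow where observations are dense and wide in the tail, which is why a per-bin width has to be stored alongside the cutoffs.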