Trains a GAM using mgcv::gam and validates it.
Input will be used to create a formula of the form:
$$y = s(x_{1}, k) + s(x_{2}, k) + ... + s(x_{n}, k)$$
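
As a rough illustration of the formula above, the sketch below builds one smooth term per feature in base R and passes the result to mgcv::gam. The feature names, the simulated data, and the object names (features, smooth_terms, fml, dat, mod) are assumptions made for this example; it is not the internal code of s.GAM.

```r
# Hypothetical feature names and basis dimension, for illustration only
features <- c("x1", "x2", "x3")
k <- 6

# One smooth term per feature: s(x1, k = 6), s(x2, k = 6), ...
smooth_terms <- paste0("s(", features, ", k = ", k, ")")

# Join the terms with " + " and convert the string to a formula
fml <- as.formula(paste("y ~", paste(smooth_terms, collapse = " + ")))
print(fml)

# Fit only if mgcv is installed; the data are simulated (an assumption)
if (requireNamespace("mgcv", quietly = TRUE)) {
  set.seed(1)
  dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
  dat$y <- sin(dat$x1) + dat$x2^2 + rnorm(200, sd = 0.3)
  mod <- mgcv::gam(fml, data = dat, method = "REML", select = FALSE)
  print(summary(mod))
}
```

The method and select values mirror the defaults shown in the usage signature; any other mgcv::gam argument would be passed the same way.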
# S3 method for default
s.GAM(x, y = NULL, x.test = NULL, y.test = NULL,
x.name = NULL, y.name = NULL, data = NULL, data.test = NULL,
k = 6, family = NULL, weights = NULL, ipw = TRUE, ipw.type = 2,
upsample = FALSE, upsample.seed = NULL, method = "REML",
select = FALSE, removeMissingLevels = TRUE, spline.index = NULL,
verbose = TRUE, trace = 0, print.plot = TRUE, plot.fitted = NULL,
plot.predicted = NULL, plot.theme = getOption("rt.fit.theme",
"lightgrid"), na.action = na.exclude, question = NULL,
n.cores = rtCores, outdir = NULL,
save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ...)

x: Numeric vector or matrix / data frame of features, i.e. independent variables
y: Numeric vector of outcome, i.e. dependent variable
x.test: Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x
y.test: Numeric vector of testing set outcome
x.name: Character: Name for feature set
y.name: Character: Name for outcome
k: Integer: Number of bases for smoothing spline
family: Error distribution and link function. See stats::family
weights: Numeric vector: Weights for cases. For classification, weights take precedence over ipw: if weights are provided, ipw is not used, so leave weights = NULL when setting ipw = TRUE. Default = NULL
ipw: Logical: If TRUE, apply inverse probability weighting (for Classification only). Note: If weights are provided, ipw is not used. Default = TRUE
ipw.type: Integer 0, 1, 2
  1: class.weights as in 0, divided by max(class.weights)
  2: class.weights as in 0, divided by min(class.weights)
  Default = 2
upsample: Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Caution: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness. Default = FALSE
upsample.seed: Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)
removeMissingLevels: Logical: If TRUE, finds factors in x.test that contain levels not present in x and substitutes them with NA. Otherwise, such levels would cause an error and no predictions would be made, ending s.GAM prematurely. Default = TRUE
verbose: Logical: If TRUE, print summary to screen.
trace: Integer: If higher than 0, will print more information to the console. Default = 0
print.plot: Logical: If TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted
plot.fitted: Logical: If TRUE, plot True (y) vs Fitted
plot.predicted: Logical: If TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test
plot.theme: String: "zero", "dark", "box", "darkbox"
na.action: How to handle missing values. See ?na.fail
question: String: The question you are attempting to answer with this model, in plain language.
outdir: Path to output directory. If defined, will save Predicted vs. True plot, if available, as well as full model output, if save.mod is TRUE
save.mod: Logical: If TRUE, save all output as RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)
...: Additional arguments to be passed to mgcv::gam
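
The effect described for removeMissingLevels can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the rtemis implementation: the data frames x_train and x_test and the column site are hypothetical. Each factor in the testing set is re-encoded with the levels seen in training, so any unseen level becomes NA.

```r
# Hypothetical training and testing features, for illustration only
x_train <- data.frame(site = factor(c("A", "B", "A", "C")))
x_test  <- data.frame(site = factor(c("A", "D", "C")))  # "D" is unseen

for (nm in names(x_test)) {
  if (is.factor(x_test[[nm]])) {
    seen <- levels(x_train[[nm]])
    # Re-encode with the training levels; values outside them become NA
    x_test[[nm]] <- factor(as.character(x_test[[nm]]), levels = seen)
  }
}

print(x_test$site)
```

Rows that receive NA this way are then subject to whatever na.action is in effect at prediction time.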

See also: elevate for external cross-validation
Other Supervised Learning: s.ADABOOST, s.ADDTREE, s.BART, s.BAYESGLM, s.BRUTO,
s.C50, s.CART, s.CTREE, s.DA, s.ET, s.EVTREE, s.GAM.formula, s.GAMSEL, s.GAM,
s.GBM3, s.GBM, s.GLMNET, s.GLM, s.GLS, s.H2ODL, s.H2OGBM, s.H2ORF, s.IRF,
s.KNN, s.LDA, s.LM, s.MARS, s.MLRF, s.MXN, s.NBAYES, s.NLA, s.NLS, s.NW,
s.POLYMARS, s.PPR, s.PPTREE, s.QDA, s.QRNN, s.RANGER, s.RFSRC, s.RF, s.SGD,
s.SPLS, s.SVM, s.TFN, s.XGBLIN, s.XGB