fitTsfm.control: List of control parameters for `fitTsfm`

Description

Creates a list of control parameters for fitTsfm. Any control parameter that is not passed to this function is set to its default value. This function is intended for internal use only.

Usage

fitTsfm.control(
  decay = 0.95,
  weights,
  model = TRUE,
  x = FALSE,
  y = FALSE,
  qr = TRUE,
  nrep = NULL,
  bb = 0.5,
  efficiency = 0.95,
  family = "mopt",
  tuning.psi,
  tuning.chi,
  compute.rd = FALSE,
  corr.b = TRUE,
  split.type = "f",
  initial = "S",
  max.it = 100,
  refine.tol = 1e-07,
  rel.tol = 1e-07,
  refine.PY = 10,
  solve.tol = 1e-07,
  trace.lev = 0,
  psc_keep = 0.5,
  resid_keep_method = "threshold",
  resid_keep_thresh = 2,
  resid_keep_prop = 0.2,
  py_maxit = 20,
  py_eps = 1e-05,
  mscale_maxit = 50,
  mscale_tol = 1e-06,
  mscale_rho_fun = "bisquare",
  scope,
  scale,
  direction,
  steps = 1000,
  k = 2,
  nvmin = 1,
  nvmax = 8,
  force.in = NULL,
  force.out = NULL,
  method,
  really.big = FALSE,
  type,
  normalize = TRUE,
  eps = .Machine$double.eps,
  max.steps,
  plot.it = FALSE,
  lars.criterion = "Cp",
  K = 10
)

Value

A list of the above components. This is only meant to be used by fitTsfm.

Arguments

decay: a scalar in (0, 1] to specify the decay factor for "DLS". Default is 0.95.
weights: an optional vector of weights to be used in the fitting process for fit.method="LS","Robust", or variable.selection="subsets". Should be NULL or a numeric vector. The length of weights must be the same as the number of observations. The weights must be nonnegative and it is strongly recommended that they be strictly positive.
model, x, y, qr: logicals passed to lm for fit.method="LS". If TRUE the corresponding components of the fit (the model frame, the model matrix, the response, the QR decomposition) are returned.
nrep: the number of random subsamples to be drawn for fit.method="Robust". If the data set is small and "Exhaustive" resampling is being used, the value of nrep is ignored.
bb: tuning constant (between 0 and 1/2) for the M-scale used to compute the initial S-estimator. It determines the robustness (breakdown point) of the resulting MM-estimator, which is bb. Defaults to 0.5.
efficiency: desired asymptotic efficiency of the final regression M-estimator. Defaults to 0.95.
family: string specifying the name of the family of loss function to be used (current valid options are "bisquare", "opt" and "mopt" from the RobStatTM package). Incomplete entries will be matched to the current valid options. Default is "mopt".
tuning.psi: tuning parameters for the regression M-estimator computed with a rho function as specified with argument family. If missing, it is computed inside lmrobdet.control to match the value of efficiency according to the family of rho functions specified in family. Appropriate values for tuning.psi for a given desired efficiency for Gaussian errors can be constructed using the functions bisquare, mopt and opt.
tuning.chi: tuning constant for the function used to compute the M-scale used for the initial S-estimator. If missing, it is computed inside lmrobdet.control to match the value of bb according to the family of rho functions specified in family.
compute.rd: logical value indicating whether robust leverage distances need to be computed.
corr.b: logical value indicating whether a finite-sample correction should be applied to the M-scale parameter bb.
split.type: determines how categorical and continuous variables are split. See splitFrame.
initial: string specifying the initial value for the M-step of the MM-estimator. Valid options are 'S', for an S-estimator and 'MS' for an M-S estimator which is appropriate when there are categorical explanatory variables in the model.
max.it: maximum number of IRWLS iterations for the MM-estimator
refine.tol: relative convergence tolerance for the S-estimator
rel.tol: relative convergence tolerance for the IRWLS iterations for the MM-estimator
refine.PY: number of refinement steps for the Peña-Yohai candidates
solve.tol: relative tolerance for inversion
trace.lev: positive values (increasingly) provide details on the progress of the MM-algorithm
psc_keep: For pyinit, proportion of observations to remove based on PSCs. The effective proportion of removed observations is adjusted according to the sample size to be psc_keep*(1-p/n). See pyinit.
resid_keep_method: For pyinit, how to clean the data based on large residuals. If "threshold", all observations with scaled residuals larger than resid_keep_thresh will be removed; if "proportion", the proportion resid_keep_prop of observations with the largest residuals will be removed. See pyinit.
resid_keep_thresh: See parameter resid_keep_method above. See pyinit.
resid_keep_prop: See parameter resid_keep_method above. See pyinit.
py_maxit: Maximum number of iterations. See pyinit.
py_eps: Relative tolerance for convergence. See pyinit.
mscale_maxit: Maximum number of iterations for the M-scale algorithm. See pyinit.
mscale_tol: Convergence tolerance for the M-scale algorithm. See pyinit.
mscale_rho_fun: String indicating the loss function used for the M-scale. See pyinit.
scope: defines the range of models examined in the "stepwise" search. This should be either a single formula, or a list containing components upper and lower, both formulae. See step for how to specify the formulae and usage.
scale: optional parameter for variable.selection="stepwise". The argument is passed to step or step.lmrobdetMM as appropriate.
direction: the mode of "stepwise" search, can be one of "both", "backward", or "forward", with a default of "both". If the scope argument is missing the default for direction is "backward".
steps: the maximum number of steps to be considered for "stepwise". Default is 1000 (essentially as many as required). It is typically used to stop the process early.
k: the multiple of the number of degrees of freedom used for the penalty in "stepwise". Only k = 2 gives the genuine AIC. k = log(n) is sometimes referred to as BIC or SBC. Default is 2.
nvmin: minimum size of subsets to examine for "subsets". Default is 1.
nvmax: maximum size of subsets to examine for "subsets". Default is 8.
force.in: index to columns of design matrix that should be in all models for "subsets". Default is NULL.
force.out: index to columns of design matrix that should be in no models for "subsets". Default is NULL.
method: one of "exhaustive", "forward", "backward" or "seqrep" (sequential replacement) to specify the type of subset search/selection. Required if variable.selection="subsets" is chosen. Default is "exhaustive".
really.big: option for "subsets"; Must be TRUE to perform exhaustive search on more than 50 variables.
type: option for "lars". One of "lasso", "lar", "forward.stagewise" or "stepwise". The names can be abbreviated to any unique substring. Default is "lasso".
normalize: option for "lars". If TRUE, each variable is standardized to have unit L2 norm, otherwise they are left alone. Default is TRUE.
eps: option for "lars"; An effective zero.
max.steps: Limit the number of steps taken for "lars"; the default is 8 * min(m, n-intercept), with m the number of variables, and n the number of samples. For type="lar" or type="stepwise", the maximum number of steps is min(m, n-intercept). For type="lasso" and especially type="forward.stagewise", there can be many more terms, because although no more than min(m, n-intercept) variables can be active during any step, variables are frequently dropped and added as the algorithm proceeds. Although the default usually guarantees that the algorithm has proceeded to the saturated fit, users should check.
plot.it: option to plot the output for cv.lars. Default is FALSE.
lars.criterion: an option to assess model selection for the "lars" method; one of "Cp" or "cv". See details. Default is "Cp".
K: number of folds for computing the K-fold cross-validated mean squared prediction error for "lars". Default is 10.
trace: If positive (or, not FALSE), info is printed during the running of step, lars or cv.lars as relevant. Larger values may give more detailed information. Default is FALSE.
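
The stepwise arguments direction and k can be tried out directly with stats::step. The following is a minimal sketch on the built-in mtcars data, not a call through fitTsfm; the model formula and the object names full, sel.aic and sel.bic are chosen here purely for illustration.

```r
# Full model on a built-in data set (illustrative choice of regressors)
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
n <- nrow(mtcars)

# Backward search with the AIC penalty (k = 2)
sel.aic <- step(full, direction = "backward", k = 2, trace = 0)

# Backward search with the BIC/SBC-type penalty (k = log(n))
sel.bic <- step(full, direction = "backward", k = log(n), trace = 0)

# Compare the selected models
formula(sel.aic)
formula(sel.bic)
```

Because log(n) exceeds 2 once n is larger than about 7, the second search penalizes each extra coefficient more heavily and tends to keep fewer terms.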

Author

Sangeetha Srinivasan

Details

This control function is used to process optional arguments passed via ... to fitTsfm. These arguments are validated and defaults are set if necessary before being passed internally to one of the following functions: lm, lmrobdetMM, step, regsubsets, lars and cv.lars. See their respective help files for more details. The arguments to each of these functions are listed above in approximately the same order for user convenience.
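
The general pattern described above (a fitting function forwards its ... arguments to a control function, which fills in defaults and validates values) can be sketched as follows. The functions toy.control and toy.fit are hypothetical and only mimic the idea; they are not the FactorAnalytics implementation.

```r
# Hypothetical control function: fills in defaults and validates inputs
toy.control <- function(decay = 0.95, nvmin = 1, nvmax = 8) {
  if (!is.numeric(decay) || length(decay) != 1 || decay <= 0 || decay > 1) {
    stop("'decay' must be a scalar in (0, 1]")
  }
  if (nvmin > nvmax) {
    stop("'nvmin' must not exceed 'nvmax'")
  }
  # Return the validated settings as a named list
  list(decay = decay, nvmin = nvmin, nvmax = nvmax)
}

# Hypothetical fitting wrapper: optional arguments arrive via ...
toy.fit <- function(...) {
  ctrl <- toy.control(...)
  ctrl  # a real fitting function would pass these on to the estimator
}

ctrl <- toy.fit(nvmin = 2)
str(ctrl)
```

Arguments the caller omits (here decay and nvmax) keep their defaults, while an invalid value such as decay = 2 stops with an error before any fitting takes place.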

The scalar decay is used by fitTsfm to compute exponentially decaying weights for fit.method="DLS". Alternatively, one can directly specify weights, a weights vector, to be used with "LS" or "Robust". Especially when fitting multiple assets, care should be taken to ensure that the length of the weights vector matches the number of observations (excluding cases ignored due to NAs).
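
As an illustration of exponentially decaying weights, the sketch below assumes one common form in which the most recent observation has weight 1 and each step back in time multiplies the weight by decay. This is an assumption made for the example and may differ from the exact weights fitTsfm computes; the simulated data and the names n.obs, w, x and y are hypothetical.

```r
decay <- 0.95
n.obs <- 10                     # hypothetical number of complete observations

# Assumed form: oldest observation first, newest last (weight 1)
w <- decay^((n.obs - 1):0)

# Simulated data for a weighted least-squares fit
set.seed(1)
x <- rnorm(n.obs)
y <- 0.5 * x + rnorm(n.obs, sd = 0.1)

# The weights vector must have one entry per observation used in the fit
stopifnot(length(w) == length(y))

fit <- lm(y ~ x, weights = w)
coef(fit)
```

The length check mirrors the caution above: if rows are dropped because of NAs, the weights vector has to be shortened accordingly.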

lars.criterion selects the criterion (one of "Cp" or "cv") used to determine the best fitted model for variable.selection="lars". The "Cp" statistic (defined on page 17 of Efron et al. (2004)) is calculated using summary.lars, whereas "cv" computes the K-fold cross-validated mean squared prediction error using cv.lars.
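
To make the "cv" criterion concrete, the sketch below computes a K-fold cross-validated mean squared prediction error for ordinary lm fits in base R. It only illustrates the quantity; cv.lars evaluates it along the lars path instead. The helper cv.mspe and the mtcars formulas are hypothetical choices for this example.

```r
# Hypothetical helper: K-fold cross-validated mean squared prediction error
cv.mspe <- function(formula, data, K = 10) {
  n <- nrow(data)
  fold <- sample(rep(seq_len(K), length.out = n))  # random fold labels
  response <- all.vars(formula)[1]                 # name of the response
  sq.err <- numeric(n)
  for (k in seq_len(K)) {
    test <- fold == k
    # Fit on the other K - 1 folds, predict the held-out fold
    fit <- lm(formula, data = data[!test, , drop = FALSE])
    pred <- predict(fit, newdata = data[test, , drop = FALSE])
    sq.err[test] <- (data[[response]][test] - pred)^2
  }
  mean(sq.err)
}

set.seed(42)
small <- cv.mspe(mpg ~ wt, data = mtcars, K = 10)
large <- cv.mspe(mpg ~ wt + hp, data = mtcars, K = 10)
c(small = small, large = large)
```

The candidate with the smaller cross-validated error would be preferred under this criterion; the result depends on the random fold assignment, hence the call to set.seed.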

References

Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2), 407-499.

Examples

# \donttest{
# check argument list passed by fitTsfm.control
tsfm.ctrl <- fitTsfm.control(method="exhaustive", nvmin=2)
print(tsfm.ctrl)
# }

# used internally by fitTsfm in the example below
# load data
data(managers, package = "PerformanceAnalytics")
# make syntactically valid column names
colnames(managers)
colnames(managers) <- make.names(colnames(managers))
colnames(managers)

fit <- fitTsfm(asset.names = colnames(managers[, (1:6)]),
               factor.names = colnames(managers[, (7:9)]),
               data = managers, variable.selection = "subsets",
               method = "exhaustive", nvmin = 2)