Creates a list of control parameters for fitTsfm.
All control parameters that are not passed to this function are set to
default values. This function is meant for internal use only.
fitTsfm.control(
decay = 0.95,
weights,
model = TRUE,
x = FALSE,
y = FALSE,
qr = TRUE,
nrep = NULL,
bb = 0.5,
efficiency = 0.95,
family = "mopt",
tuning.psi,
tuning.chi,
compute.rd = FALSE,
corr.b = TRUE,
split.type = "f",
initial = "S",
max.it = 100,
refine.tol = 1e-07,
rel.tol = 1e-07,
refine.PY = 10,
solve.tol = 1e-07,
trace.lev = 0,
psc_keep = 0.5,
resid_keep_method = "threshold",
resid_keep_thresh = 2,
resid_keep_prop = 0.2,
py_maxit = 20,
py_eps = 1e-05,
mscale_maxit = 50,
mscale_tol = 1e-06,
mscale_rho_fun = "bisquare",
scope,
scale,
direction,
steps = 1000,
k = 2,
nvmin = 1,
nvmax = 8,
force.in = NULL,
force.out = NULL,
method,
really.big = FALSE,
type,
normalize = TRUE,
eps = .Machine$double.eps,
max.steps,
plot.it = FALSE,
lars.criterion = "Cp",
K = 10
)
A list of the above components. This is only meant to be used by
fitTsfm.
a scalar in (0, 1] to specify the decay factor for "DLS". Default is 0.95.
an optional vector of weights to be used in the fitting
process for fit.method="LS","Robust", or
variable.selection="subsets". Should be NULL or a numeric
vector. The length of weights must be the same as the number of
observations. The weights must be nonnegative and it is strongly
recommended that they be strictly positive.
logicals passed to lm for
fit.method="LS". If TRUE the corresponding components of the
fit (the model frame, the model matrix, the response, the QR decomposition)
are returned.
the number of random subsamples to be drawn for
fit.method="Robust". If the data set is small and "Exhaustive"
resampling is being used, the value of nrep is ignored.
tuning constant (between 0 and 1/2) for the M-scale used to compute the initial S-estimator. It
determines the robustness (breakdown point) of the resulting MM-estimator, which is
bb. Defaults to 0.5.
desired asymptotic efficiency of the final regression M-estimator. Defaults to 0.95.
string specifying the name of the family of loss function to be used (current valid options are "bisquare", "opt" and "mopt" from the RobStatTM package). Incomplete entries will be matched to the current valid options.
tuning parameters for the regression M-estimator computed with a rho function
as specified with argument family. If missing, it is computed inside lmrobdet.control to match
the value of efficiency according to the family of rho functions specified in family.
Appropriate values for tuning.psi for a given desired efficiency for Gaussian errors
can be constructed using the functions bisquare, mopt and opt.
tuning constant for the function used to compute the M-scale
used for the initial S-estimator. If missing, it is computed inside lmrobdet.control to match
the value of bb according to the family of rho functions specified in family.
logical value indicating whether robust leverage distances need to be computed.
logical value indicating whether a finite-sample correction should be applied
to the M-scale parameter bb.
determines how categorical and continuous variables are split. See
splitFrame.
string specifying the initial value for the M-step of the MM-estimator. Valid
options are 'S', for an S-estimator and 'MS' for an M-S estimator which is
appropriate when there are categorical explanatory variables in the model.
maximum number of IRWLS iterations for the MM-estimator
relative convergence tolerance for the S-estimator
relative convergence tolerance for the IRWLS iterations for the MM-estimator
number of refinement steps for the Peña-Yohai candidates
relative tolerance for inversion
positive values (increasingly) provide details on the progress of the MM-algorithm
For pyinit, proportion of observations to remove based on PSCs. The effective proportion of removed
observations is adjusted according to the sample size to be prosac*(1-p/n). See pyinit.
For pyinit, how to clean the data based on large residuals. If
"threshold", all observations with scaled residuals larger than
resid_keep_thresh will be removed; if "proportion", observations with the
largest resid_keep_prop residuals will be removed. See pyinit.
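Residual threshold used when resid_keep_method is "threshold"; see parameter resid_keep_method above. See pyinit.
Proportion used when resid_keep_method is "proportion"; see parameter resid_keep_method above. See pyinit.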
Maximum number of iterations. See pyinit.
Relative tolerance for convergence. See pyinit.
Maximum number of iterations for the M-scale algorithm. See pyinit.
Convergence tolerance for the M-scale algorithm. See pyinit.
String indicating the loss function used for the M-scale. See pyinit.
defines the range of models examined in the "stepwise"
search. This should be either a single formula, or a list containing
components upper and lower, both formulae. See
step for how to specify the formulae and usage.
optional parameter for variable.selection="stepwise".
The argument is passed to step or
step.lmrobdetMM as appropriate.
the mode of "stepwise" search, can be one of "both",
"backward", or "forward", with a default of "both". If the scope
argument is missing the default for direction is "backward".
the maximum number of steps to be considered for
"stepwise". Default is 1000 (essentially as many as required). It is
typically used to stop the process early.
the multiple of the number of degrees of freedom used for the
penalty in "stepwise". Only k = 2 gives the genuine AIC.
k = log(n) is sometimes referred to as BIC or SBC. Default is 2.
minimum size of subsets to examine for "subsets".
Default is 1.
maximum size of subsets to examine for "subsets".
Default is 8.
index to columns of design matrix that should be in all
models for "subsets". Default is NULL.
index to columns of design matrix that should be in no
models for "subsets". Default is NULL.
one of "exhaustive", "forward", "backward" or "seqrep"
(sequential replacement) to specify the type of subset search/selection.
Required if variable.selection="subsets" is chosen. Default is
"exhaustive".
option for "subsets"; must be TRUE to
perform exhaustive search on more than 50 variables.
option for "lars". One of "lasso", "lar",
"forward.stagewise" or "stepwise". The names can be abbreviated to any
unique substring. Default is "lasso".
option for "lars". If TRUE, each variable is
standardized to have unit L2 norm, otherwise they are left alone. Default
is TRUE.
option for "lars"; an effective zero.
Limit the number of steps taken for "lars"; the
default is 8 * min(m, n-intercept), with m the number of
variables, and n the number of samples. For type="lar" or
type="stepwise", the maximum number of steps is
min(m,n-intercept). For type="lasso" and especially
type="forward.stagewise", there can be many more terms, because
although no more than min(m,n-intercept) variables can be active
during any step, variables are frequently dropped and added as the
algorithm proceeds. Although the default usually guarantees that the
algorithm has proceeded to the saturated fit, users should check.
option to plot the output for cv.lars.
Default is FALSE.
an option to assess model selection for the
"lars" method; one of "Cp" or "cv". See details. Default is "Cp".
number of folds for computing the K-fold cross-validated mean
squared prediction error for "lars". Default is 10.
If positive (or, not FALSE), info is printed during the
running of step, lars or
cv.lars as relevant. Larger values may give more
detailed information. Default is FALSE.
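To make the stepwise penalty k described above concrete, here is a minimal sketch using stats::step directly. It does not call fitTsfm; the built-in mtcars data and the model formula are assumptions chosen only for illustration. With k = 2 the criterion is the AIC, and with k = log(n) it is the BIC-style penalty.

```r
# Illustrative only: mtcars and this formula are assumptions, not part of fitTsfm.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
n <- nrow(mtcars)

# Backward search under the AIC penalty (k = 2) and the BIC-style penalty (k = log(n)).
aic_fit <- step(full, direction = "backward", k = 2, trace = 0)
bic_fit <- step(full, direction = "backward", k = log(n), trace = 0)

# extractAIC() returns c(edf, criterion); for the same model the two
# criteria differ by (log(n) - 2) * edf.
aic_full <- extractAIC(full, k = 2)
bic_full <- extractAIC(full, k = log(n))

print(formula(aic_fit))
print(formula(bic_fit))
```

Because the log(n) penalty charges more per parameter once n exceeds about 7, it tends to favor smaller models than k = 2.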
Sangeetha Srinivasan
This control function is used to process optional arguments passed
via ... to fitTsfm. These arguments are validated and defaults
are set if necessary before being passed internally to one of the following
functions: lm, lmrobdetMM,
step, regsubsets,
lars and cv.lars. See their
respective help files for more details. The arguments to each of these
functions are listed above in approximately the same order for user
convenience.
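The validate-then-fill-defaults pattern described above can be sketched in a few lines of base R. This is not the FactorAnalytics implementation: sketch_control is a hypothetical helper, it covers only three of the arguments, and the specific checks are assumptions made for illustration.

```r
# Hypothetical helper illustrating the control-list pattern; not FactorAnalytics code.
sketch_control <- function(decay = 0.95, nvmin = 1, nvmax = 8, ...) {
  # Validate a few arguments before they are used downstream.
  if (!is.numeric(decay) || length(decay) != 1 || decay <= 0 || decay > 1) {
    stop("'decay' must be a scalar in (0, 1]")
  }
  if (nvmin > nvmax) {
    stop("'nvmin' must not exceed 'nvmax'")
  }
  # Return the supplied values, with defaults filled in for the rest.
  c(list(decay = decay, nvmin = nvmin, nvmax = nvmax), list(...))
}

# Only nvmin and method are supplied; decay and nvmax fall back to defaults.
ctrl <- sketch_control(nvmin = 2, method = "exhaustive")
str(ctrl)
```

Returning a single named list keeps the calling function simple: it can pick out the entries each downstream fitter needs without re-checking them.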
The scalar decay is used by fitTsfm to compute
exponentially decaying weights for fit.method="DLS". Alternatively, one
can directly specify weights, a weights vector, to be used with
"LS" or "Robust". Especially when fitting multiple assets, care should be
taken to ensure that the length of the weights vector matches the number of
observations (excluding cases ignored due to NAs).
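As a rough illustration of exponentially decaying weights, the sketch below builds one common form of such weights and passes them to lm. The exact formula used inside fitTsfm is not stated here, so the construction decay^j (with j the number of periods before the most recent observation), the normalization, and the simulated data are all assumptions.

```r
# Assumed construction: weight decay^j for the observation j periods before the latest one.
decay <- 0.95
n_obs <- 24
w <- decay^((n_obs - 1):0)   # oldest observation first, most recent gets weight 1
w <- w / sum(w)              # normalizing is optional for lm(); done here for readability

# Simulated data, for illustration only.
set.seed(42)
x <- rnorm(n_obs)
y <- 0.5 + 1.2 * x + rnorm(n_obs, sd = 0.3)

# The weights vector must have one entry per observation actually used in the fit.
stopifnot(length(w) == length(y))
wls_fit <- lm(y ~ x, weights = w)
coef(wls_fit)
```

If rows are dropped because of NAs, the weights vector has to be shortened in the same way, which is the length mismatch the paragraph above warns about.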
lars.criterion selects the criterion (one of "Cp" or "cv") to
determine the best fitted model for variable.selection="lars". The
"Cp" statistic (defined on page 17 of Efron et al. (2004)) is calculated
using summary.lars, whereas "cv" computes the K-fold
cross-validated mean squared prediction error using
cv.lars.
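The K-fold cross-validated mean squared prediction error can be sketched in base R as follows. This is not what cv.lars does internally: cv.lars evaluates the error along the lars path, whereas this sketch scores a single fixed lm model. The mtcars data, the formula, and the fold assignment are assumptions for illustration.

```r
# Illustrative K-fold CV for a plain lm() fit; mtcars and the formula are assumptions.
set.seed(123)
K <- 10
n <- nrow(mtcars)
fold_id <- sample(rep(seq_len(K), length.out = n))  # random fold assignment

fold_mse <- numeric(K)
for (k in seq_len(K)) {
  train <- mtcars[fold_id != k, ]
  test  <- mtcars[fold_id == k, ]
  fit   <- lm(mpg ~ wt + hp, data = train)          # fit on the other K - 1 folds
  pred  <- predict(fit, newdata = test)             # predict the held-out fold
  fold_mse[k] <- mean((test$mpg - pred)^2)
}

cv_mspe <- mean(fold_mse)  # average of the per-fold mean squared prediction errors
cv_mspe
```

Each observation is held out exactly once, so the averaged error estimates out-of-sample performance rather than in-sample fit.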
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2), 407-499.
# check the argument list returned by fitTsfm.control
tsfm.ctrl <- fitTsfm.control(method="exhaustive", nvmin=2)
print(tsfm.ctrl)
# used internally by fitTsfm in the example below
# load data
data(managers, package = 'PerformanceAnalytics')
# Make syntactically valid column names
colnames(managers)
colnames(managers) <- make.names( colnames(managers))
colnames(managers)
fit <- fitTsfm(asset.names=colnames(managers[,(1:6)]),
               factor.names=colnames(managers[,(7:9)]),
               data=managers, variable.selection="subsets",
               method="exhaustive", nvmin=2)