refine
is a generic function with methods for objects of the classes produced by MSL
. In the up-to-date workflow, it can automatically (1) define new parameter points, (2) add simulations to the reference table for these points, (3) optionally recompute projections, (4) update the inference of the likelihood surface, and (5) provide new point estimates, confidence intervals, and other results of an MSL
call. It can repeat these steps iteratively as controlled by its workflow_design
.
Although it has many control arguments, only a few of them are likely to be needed in any given application. In particular, it is designed to use reasonable default controls for the number of iterations, the number of points added in each iteration, and whether to update projections, when given only the current fit object as input.
reproject
and recluster
are wrappers for refine(..., ntot=0L)
, updating the object after either recomputing the projections or only re-performing the multivariate Gaussian mixture clustering.
# S3 method for SLik
refine(object, method=NULL, ...)
# S3 method for default
refine(
object,
## reference table simulations
Simulate = attr(surfaceData,"Simulate"),
control.Simulate = attr(surfaceData,"control.Simulate"),
newsimuls = NULL,
## CIs
CIs = workflow_design$reftable_sizes[useCI],
useCI = prod(dim(object$logLs))
refine
returns an updated SLik
or SLik_j
object, unless both newsimuls
and Simulate
arguments are NULL, in which case a data frame of parameter points is returned.
an SLik
or SLik_j
object
## reference table simulations
Character string: name of the function used to simulate samples. As it is typically stored in the object this argument does not need to be explicitly given; otherwise this should be the same function provided to add_reftable
, whose documentation details the design requirements. The only meaningful non-default value is NULL
, in which case refine
may return (if newsimuls
is also NULL
) a data frame of parameter points on which to run a simulation function.
A list of arguments of the Simulate
function (see add_simulation
). The default value should be used unless you understand enough of its structure to modify it wisely (e.g., it may contain the path of an executable on one machine and a different path may be specified to refine a fit on another machine).
For the SLik_j
method, a matrix or data frame, with the same parameters and summary statistics as the data
of the original infer_SLik_joint
call.
For other methods, a list
of simulated distributions of summary statistics, in the same format as for add_simulation
.
If no such list is provided (i.e., if newsimuls
remains NULL
), the function extracted by get_from(object,"Simulate")
is used (it is inherited from the Simulate
argument of add_simulation
through the initial sequence of calls of functions add_simulation
,
infer_logLs
or infer_tailp
, and infer_surface
). If no such function is available, then this function returns parameters for which new distributions should be provided by the user.
## CIs
Boolean, or boolean vector, or numeric (preferably integer) vector: controls whether to infer bounds of (one-dimensional, profile) confidence intervals. The numeric vector form allows specifying reference table size(s) for which CIs should be computed when these sizes are first reached. TRUE or FALSE will force or inhibit computation in all iterations. Finally (and probably less usefully), a boolean vector such as CIs=c(TRUE,FALSE,TRUE)
requests computation of CIs when the number of points cumulatively added reaches the target number of points for the first, third, and any subsequent iterations up to maxit
(this may differ in certain cases from the first, third, and so on, iterations: see Details).
The default for refine
is described in the Details. The default for reproject
is to update the CIs if there are computed ones within the input object
.
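The boolean-vector form of CIs described above can be pictured with a short base R sketch. This is only an illustration of the stated rule (the last element applies to any subsequent iteration); the helper name ci_requested is hypothetical and is not part of the package, and the sketch ignores the caveat that the triggering points are target reference table sizes rather than iteration counts.

```r
# Hypothetical helper, NOT a package function: is CI computation requested
# at the i-th target, given a boolean vector such as c(TRUE, FALSE, TRUE)?
# Assumption: the last element is reused for all later targets.
ci_requested <- function(CIs, i) {
  CIs[min(i, length(CIs))]
}

CIs <- c(TRUE, FALSE, TRUE)
# Requests for five successive targets: first, third, and any subsequent ones.
requests <- vapply(1:5, function(i) ci_requested(CIs, i), logical(1))
print(requests)
```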
whether to perform RMSE computations for inferred confidence interval points.
Intended coverage of confidence intervals
## workflow design
A list structured as the return value of get_workflow_design
. The default value makes reference to elements of the input object's colTypes
element.
Maximum number of iterative refinements (see also precision
argument).
NULL or numeric: control of the total number of simulated samples (one for each new parameter point) to be added to the reference table over the maxit
iterations. See Details for the rules used to determine the number of points added in each iteration. Reasonable default values are defined for ntot
and maxit
(see Details), so that beginners (and ideally, even more advanced users) do not have to find good values.
ntot=0L
may be used to re-generate the projectors or the clustering without augmenting the reference table.
NULL or numeric, for a number of parameter points (excluding replicates and confidence interval points in the primitive workflow), whose likelihood should be computed in each iteration
(see n
argument of sample_volume
). Slightly less intuitive alternative to ntot
specification, as there is at least one iteration where the actual number of added points is not the nominal n
(see Details).
n=0L
will have the same effect as ntot=0L
.
## termination conditions
Requested local precision of surface estimation, in terms of prediction standard errors (RMSEs) of both the maximum summary log-likelihood and the likelihood ratio at any CI bound available. Iterations will stop either when maxit
is reached, or when the RMSEs have been computed for the object (see eval_RMSEs
argument) and this precision is reached for the RMSEs.
A given precision on the CI bounds themselves might seem more interesting, but is not well specified by a single precision parameter if the parameters are on widely different scales.
Same usage as for CIs
; controls the eval_RMSEs
argument of MSL
in each iteration. See Details for the default.
The default for reproject
is to update the RMSEs if there are computed ones within the input object
.
## verbosity
A list as shown by the default, or simply a vector of booleans. verbose$most
controls whether to display information about progress and results, except plots; $final
controls whether to plot()
the final object
to show the final likelihood surface. Default is to plot it only in an interactive session and if fewer than three parameters are estimated; $movie
controls whether to plot()
the updated object
in each iteration; verbose$proj
controls the verbose
argument of project.character
; verbose$rparam
controls (cryptic) information about generation of new parameter points; verbose$progress_bars
controls display of some progress bars. If verbose
is an unnamed vector of booleans, its elements are interpreted as the first elements of the verbose
vector, in the order shown by the default.
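As a rough illustration of how an unnamed boolean vector maps onto the named elements, the following base R sketch assigns names positionally. The element order used here (most, final, movie, proj, rparam, progress_bars) is an assumption taken from the order in which the elements are described above; the authoritative order is the one shown by the default value of verbose. The helper name name_verbose is hypothetical and is not part of the package.

```r
# Assumed element order (see the caveat in the text above).
verbose_names <- c("most", "final", "movie", "proj", "rparam", "progress_bars")

# Hypothetical helper: give positional names to an unnamed boolean vector.
name_verbose <- function(v) {
  if (is.null(names(v))) names(v) <- verbose_names[seq_along(v)]
  v
}

# An unnamed c(TRUE, FALSE) is read as verbose$most and verbose$final.
named <- name_verbose(c(TRUE, FALSE))
print(named)
```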
## projection controls
Same usage as for CIs
; this controls in which iterations the projectors are updated. The default NULL
value is strongly recommended. See Details for further explanations.
A list of arguments for the projection method. By default the methodArgs
of the original project.character
calls are reused over iteration, but elements of the new methodArgs
list will be used to update the original methodArgs
. Note that the updated list becomes the new default for further iterations.
## Likelihood surface modeling
Passed to infer_SLik_joint
: a character string used to control the joint-density estimation method, as documented for that function (see method
instead for equivalent control in the primitive workflow). The default is to use the same method as in the first iteration, but this argument allows a change of method.
Passed to infer_SLik_joint
. The data
in the expression for the default value refers to the data
argument of the latter function.
## parallelisation
A list of arguments for makeCluster
, in addition to makeCluster
's spec
argument which is in most cases best specified by the nb_cores
argument. Cluster arguments allow independent control of parallel computations for the different steps of a refine
iteration (see Details; as a rough but effective summary, use only nb_cores
when the simulations support it, and see the methodArgs
argument if independent control of parallelisation of the projection procedure is needed).
Integer: shortcut for specifying cluster_args$spec
for sample simulation.
NULL or a list with possible elements add_simulation
and logL_method
(the latter for the primitive workflow). These elements should be formatted as the packages
arguments of add_simulation
and infer_logLs
, respectively, wherein they are the additional packages to be loaded on child processes. The effect of the default value of this argument is to pass over successive refine
calls the value stored in the input fit object (itself determined by the latest use of the packages
argument in, e.g., add_simulation
or in previous refine
s).
An environment, passed as the env
argument to add_simulation
. The default value keeps the pre-refine
value over iterations.
NULL or integer, passed to add_simulation
. The default code uses an internal function, .update_seed
, to update it from a previous iteration.
## others
Likelihood ratio threshold used to control the sampling of new points and the selection of points for projections. Do not change it unless you know what you are doing.
For the primitive workflow: (a vector of) suggested method(s) for estimation of smoothing parameters (see method
argument of infer_surface
). The ith element of the vector is
used in the ith iteration, if available; otherwise the last element is used. This argument is not always heeded, in that REML may be used if the suggested method is GCV but it appears to perform poorly. The defaults for SLik
and SLikp
objects are "REML"
and "PQL"
, respectively.
A data frame of parameters on which the simulation function get_from(object,"Simulate")
should be called to extend the reference table. Only for programming by expert users, because poorly chosen input trypoints
could severely affect the inferences.
for the primitive workflow only: cf this argument in rparam
.
for the primitive workflow only: a data.frame with attributes, usually taken from the object
and thus not specified by user, usable as input for infer_surface
.
Function used to sample new parameter values.
further arguments passed to or from other methods. refine
passes these arguments to the plot
method suitable for the object
.
*
Controls of exploration of parameter space: New parameter points are sampled so as to fill the space of parameters contained in the confidence regions defined by the level
argument, and to surround it by a region sampled proportionally to likelihood.
Each refine
call performs several iterations, these iterations stopping when ntot
points have been added to the simulation table. The target number of points potentially added in each iteration is controlled by the ntot
and maxit
arguments as described below, but fewer points may be actually added in each iteration, and more than maxit
iterations may be needed to add the ntot
points, if in a given iteration too few “good” candidate points are generated according to the internal rules for sampling the parameter region with high likelihood. In that case, the next iteration tries to make up for the missing points by adding more points than the target number, but if not enough points have been added after maxit
iterations, further iterations will be run.
CIs and RMSEs may be computed in any iteration but the default values of eval_RMSEs
and CIs
are chosen so as to avoid performing these computations too often, particularly when they are expected to be slow. The default implies that the RMSE for the maximum logL will be computed at the end of each block of iterations that defines a refine (itself defined so as to reach the reference table sizes specified by the workflow_design
and its default value). If the reference table is not too large (see default value of useCI
for the precise condition), RMSEs of the logL are also computed at the inferred bounds of profile-based confidence intervals for each parameter.
Although the update_projectors
argument allows similar control of the iterations where projections are updated, it is advised to keep it NULL (default value), so that whether projectors are updated in a given iteration is controlled by default internal rules. Setting it to TRUE
would induce updating whenever any of the target reference table sizes implied by the workflow_design$subblock_sizes
is reached. The default NULL
, has the same effect subject to additional conditions: updating may not be performed when the training set is considered too similar to the one used to compute pre-existing projections, or when the training set includes more samples than the limit defined by the global package option upd_proj_subrows_thr.
Default values of ntot
and maxit
are controlled by the value of the workflow_design
, which itself has the shown default value, and are distinct for the first vs. subsequent refine
s.
The target number of points in each iteration is also controlled differently for the first vs. subsequent refine
s. This design is motivated by the fact that the likelihood surface is typically poorly inferred in the first refine so that the parameter points sampled then tend to be less relevant than those that can be sampled in later iterations. In the first refine
call, the target number of points increases roughly as powers of two over iterations, to reach ntot
cumulatively after maxit
iterations. The default ntot
is twice the size of the initial reference table, and the default maxit
is 5. The example_reftable
Example illustrates this, where the initial reference table holds 200 simulations, and the default target numbers of points to be added in 5 iterations by the first refine
call are 25, 25, 50, 100 and 200. In later refine
calls, the target number is ntot/maxit
in each iteration.
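The schedule quoted above can be reproduced with a few lines of base R. This is only an illustrative sketch that matches the numbers given in this documentation, assuming weights 1, 1, 2, 4, ... rescaled to sum to ntot; it is not the package's internal rule, and the helper names first_refine_targets and later_refine_targets are hypothetical.

```r
# Hypothetical helper: per-iteration targets for the FIRST refine call,
# increasing roughly as powers of two and summing to ntot.
first_refine_targets <- function(ntot, maxit) {
  weights <- c(1, 2^(seq_len(maxit - 1L) - 1L))  # 1, 1, 2, 4, 8, ...
  ntot * weights / sum(weights)
}

# Hypothetical helper: LATER refine calls use ntot/maxit in each iteration.
later_refine_targets <- function(ntot, maxit) {
  rep(ntot / maxit, maxit)
}

initial_size <- 200        # size of the initial reference table (from the text)
ntot  <- 2 * initial_size  # default described in the text
maxit <- 5                 # default described in the text

first_targets <- first_refine_targets(ntot, maxit)
later_targets <- later_refine_targets(ntot, maxit)
print(first_targets)  # 25 25 50 100 200
print(later_targets)  # 80 80 80 80 80
```

Both schedules add ntot = 400 points in total; only the distribution over iterations differs. Actual numbers of added points may deviate from these targets, as explained above.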
*
Independent control of parallelisation may be needed in the different steps, e.g. if the simulations are not easily parallelised whereas the projection method natively handles parallelisation. In the up-to-date workflow with default ranger
projection method, distinct parallelisation controls may be passed to add_reftable
for sample simulations, to project
methods when projections are updated, and to MSL
for RMSE computations (alternatively for the primitive workflow, add_simulation
, infer_logLs
and MSL
are called).
The most explicit way of specifying distinct controls is by a list structured as
cluster_args=list(reftable=list(<makeCluster arguments>),
RMSEs=list(<makeCluster arguments>))
A project=list(num.threads=<.>)
element can be added to this list, providing control of the num.threads
argument of ranger functions. However, this is retained mainly for backward compatibility, as the methodArgs
argument can now be used to specify the num.threads
.
Simpler arguments may be used and will be interpreted as follows: nb_cores
, if given and not overridden by a spec
argument in cluster_args
(or in sublists of it), will control simulation and projection steps (but not RMSE computation): that is, nb_cores
then gives the number of parallel processes for sample simulation, with additional makeCluster
arguments taken from cluster_args
, but RMSE computations are performed serially. On the other hand, a spec
argument in
cluster_args=list(spec=<.>, <other makeCluster arguments>)
will instead apply the same arguments to both reference table and RMSE computation, overriding the default effect of nb_cores
in both of them.
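The alternative forms described above can be written out as plain R lists. This is a minimal sketch: the core counts are arbitrary, spec and type are standard arguments of parallel::makeCluster, and the commented refine() calls assume that an existing fit object named fit is available (a hypothetical name).

```r
# Explicit per-step controls, following the structure given above.
explicit_args <- list(
  reftable = list(spec = 4L, type = "PSOCK"),  # sample simulation
  RMSEs    = list(spec = 2L, type = "PSOCK")   # RMSE computations
)

# A single top-level 'spec': per the text, the same arguments are applied
# to both reference table simulation and RMSE computation.
shared_args <- list(spec = 2L, type = "PSOCK")

# Hypothetical calls, assuming 'fit' is an existing fit object:
# refine(fit, cluster_args = explicit_args)
# refine(fit, cluster_args = shared_args)
# refine(fit, nb_cores = 4L)  # RMSE computations remain serial, per the text

str(explicit_args)
```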
## see Note for links to examples.