- input
The input data frame/data table/tibble. This should contain one
or more OT tableaux consisting of mappings between underlying and surface
forms with observed frequency and violation profiles. Constraint violations
must be numeric.
For an example of the data frame format, see inst/extdata/sample_data_frame.csv.
You can read this file into a data frame using read.csv or into a tibble
using readr::read_csv.
This function also supports the legacy OTSoft file format. You can use this
format by passing in a file path string to the OTSoft file rather than a
data frame.
For examples of OTSoft format, see inst/extdata/sample_data_file.txt.
- k
The number of folds to use in cross-validation.
- mu_values
A vector or list of mu bias parameters to use in
cross-validation. Parameters may either be scalars, in which case the
same mu parameter will be applied to every constraint, or vectors/lists
containing a separate mu bias parameter for each constraint.
- sigma_values
A vector or list of sigma bias parameters to use in
cross-validation. Parameters may either be scalars, in which case the
same sigma parameter will be applied to every constraint, or vectors/lists
containing a separate sigma bias parameter for each constraint.
- grid_search
(optional) If TRUE, the Cartesian product of the values in mu_values and
sigma_values will be validated. For example, if mu_values = c(0, 1) and
sigma_values = c(0.1, 1), cross-validation will be done on the mu/sigma
pairs (0, 0.1), (0, 1), (1, 0.1), (1, 1). If FALSE (default),
cross-validation will be done on each pair of values at the same indices in
mu_values and sigma_values. For example, if mu_values = c(0, 1) and
sigma_values = c(0.1, 1), cross-validation will be done on the mu/sigma
pairs (0, 0.1), (1, 1).
- output_path
(optional) A string specifying the path to a file to
which the cross-validation results will be saved. If the file exists, it
will be overwritten. If this argument isn't provided, the output will not
be written to a file.
- out_sep
(optional) The delimiter used in the output files.
Defaults to tabs.
- control_params
(optional) A named list of control parameters that will be passed to the
optim function. See the documentation of that function for details. Note
that some parameter settings may interfere with optimization. The parameter
fnscale will be overwritten with -1 if specified, since this must be treated
as a maximization problem.
- upper_bound
(optional) The maximum value for constraint weights.
Defaults to 100.
- encoding
(optional) The character encoding of the input file. Defaults
to "unknown".
- model_name
(optional) A name for the model. If not provided, the file
name will be used if the input is a file path. If the input is a data frame,
the name of the variable will be used.
- allow_negative_weights
(optional) Whether the optimizer should allow
negative weights. Defaults to FALSE.
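
The pairing rules described under grid_search and the fnscale override described under control_params can be sketched in base R. This is only an illustration of the documented behaviour, not the package's own code: the names grid_pairs, index_pairs and toy_objective are hypothetical, the objective is a stand-in for the real log-likelihood, and the choice of the L-BFGS-B method with bounds of 0 and 100 is an assumption made to mirror the upper_bound and allow_negative_weights defaults.

```r
# Candidate bias parameters, taken from the example under grid_search.
mu_values <- c(0, 1)
sigma_values <- c(0.1, 1)

# grid_search = TRUE: every mu is combined with every sigma.
# Sorting by mu, then sigma, reproduces the order listed in the text.
grid_pairs <- expand.grid(mu = mu_values, sigma = sigma_values)
grid_pairs <- grid_pairs[order(grid_pairs$mu, grid_pairs$sigma), ]
rownames(grid_pairs) <- NULL

# grid_search = FALSE: values are paired by position, so both vectors
# must have the same length.
stopifnot(length(mu_values) == length(sigma_values))
index_pairs <- data.frame(mu = mu_values, sigma = sigma_values)

print(grid_pairs)   # four mu/sigma pairs
print(index_pairs)  # two mu/sigma pairs

# Hypothetical control list. optim() minimizes by default, so fnscale is
# forced to -1 to turn the call into a maximization, as the text describes.
control_params <- list(maxit = 500, fnscale = 1)
control_params$fnscale <- -1

# Toy concave objective (assumption): its maximum is at w = (2, 2).
toy_objective <- function(w) -sum((w - 2)^2)

# Assumed setup: box-constrained search with weights kept in [0, 100].
fit <- optim(par = c(0, 0), fn = toy_objective, method = "L-BFGS-B",
             lower = 0, upper = 100, control = control_params)
print(fit$par)
```

With grid_search = FALSE the two vectors are consumed in lockstep, which is why their lengths are checked before pairing.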