- x
Word embeddings from textEmbed (or textEmbedLayerAggregation).
If several word embeddings are provided in a list, they will be concatenated.
- y
Numeric variable to predict.
- sample_percents
(numeric) Numeric vector that specifies the percentages of the total number of
data points to include in each sample (default = c(25,50,75,100), i.e., correlations are evaluated
for 25%, 50%, 75%, and 100% of the data points). The data points in each sample are chosen randomly for
each new sample.
- handle_word_embeddings
Determines whether to use a list of word embeddings or an individual
word embedding (default = "individually", also "concatenate"). If a list of word embeddings is
provided, they will be concatenated.
- n_cross_val
(numeric) Number of times to repeat the cross-validation (i.e., number of tests;
default = 1, i.e., cross-validation is performed only once). Warning: Training time increases in
proportion to the number of cross-validations, so n cross-validations take roughly n times as long
as one.
- sampling_strategy
Either draw a "random" sample from all the data for each subset, or draw each "subset" from the
larger subsets (i.e., each subset contains the same data).
- use_same_penalty_mixture
If TRUE, the penalty and mixture grid is searched only once and the same values are used
thereafter; if FALSE, the grid is searched every time.
- model
Type of model. Default is "regression"; see also "logistic" and "multinomial" for classification.
- penalty
(numeric) Hyperparameter that is tuned (default = 10^seq(-16,16)).
- mixture
A number between 0 and 1 (inclusive) that reflects the proportion of L1 regularization
(i.e., lasso) in the model (for more information, see the linear_reg function in the parsnip package).
When mixture = 1, the model is a pure lasso model, while mixture = 0 indicates that ridge regression
is being used (specific engines only).
- seed
(numeric) Set a different seed (default = 2024).
- ...
Additional parameters from textTrainRegression.
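
The following base-R sketch illustrates how sample_percents, sampling_strategy, and the default penalty grid could be read. It is not the package's internal implementation: n_total is a hypothetical data size, the rounding of sample sizes is an assumption, and the nested draw shown for "subset" is one interpretation of the description above.

```r
# Illustrative sketch only; not the package's internal implementation.
set.seed(2024)                         # mirrors the documented default seed
n_total <- 40                          # hypothetical number of data points
sample_percents <- c(25, 50, 75, 100)  # documented default

# Number of data points in each sample (rounding is an assumption).
sample_sizes <- round(n_total * sample_percents / 100)

# "random": each sample is drawn independently from all data.
random_samples <- lapply(sample_sizes, function(k) sample(seq_len(n_total), k))

# "subset" (assumed reading): each smaller sample is drawn from the next
# larger one, so the samples are nested.
subset_samples <- vector("list", length(sample_sizes))
pool <- seq_len(n_total)
for (i in rev(seq_along(sample_sizes))) {
  pool <- sample(pool, sample_sizes[i])
  subset_samples[[i]] <- pool
}

# Documented default penalty grid: powers of ten from 1e-16 to 1e16.
penalty <- 10^seq(-16, 16)

# A call might then look like this (my_embeddings and my_outcome are
# hypothetical placeholders for the x and y arguments):
# textTrainN(x = my_embeddings, y = my_outcome,
#            sample_percents = sample_percents,
#            sampling_strategy = "subset", n_cross_val = 1, seed = 2024)
```

With the "subset" reading, every smaller sample is contained in the larger ones, so differences between sample sizes are not confounded by entirely different draws; with "random", each sample is independent of the others.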