- text
Character vector or list.
Text in a vector or list data format
- comparison_text
Character vector or list.
Text in a vector or list data format
- transformer
Character.
Sentence similarity transformer to use.
Defaults to "all_minilm_l6" (see Hugging Face).
Any sentence similarity model with a pipeline on Hugging Face
can also be used by passing its name
(e.g., "typeform/distilbert-base-uncased-mnli"; see Examples)
- device
Character.
Whether to use the CPU or GPU for inference.
Defaults to "auto", which uses the GPU
rather than the CPU if a CUDA-capable GPU is set up.
Set to "cpu" to run inference on the CPU
- preprocess
Boolean.
Should basic preprocessing be applied?
Includes converting to lowercase, keeping only alphanumeric characters,
removing escape characters, removing repeated characters,
and removing white space.
Defaults to FALSE.
Transformers generally work well without preprocessing and handle
many of these steps internally, so setting this to TRUE
is unlikely to change performance much
- keep_in_env
Boolean.
Whether the classifier should be kept in your global environment.
Defaults to TRUE.
By keeping the classifier in your environment, you can skip
re-loading the classifier every time you run this function.
TRUE is recommended
- envir
Numeric.
Environment for the classifier to be saved for repeated use.
Defaults to the global environment
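
How keep_in_env and envir work together can be sketched in base R. This is a minimal illustration, not the package's actual implementation: load_model() and get_classifier() are hypothetical names, the placeholder list stands in for a real transformer, and storing the object under the transformer name with a default of envir = 1 is an assumption. The sketch relies only on the base R fact that as.environment(1) returns the global environment, which is how a numeric envir can point to it.

```r
# Hypothetical stand-in for the expensive model-loading step.
load_model <- function(name) {
  message("Loading ", name, " ...")
  list(name = name)  # placeholder object, not a real transformer
}

# Hypothetical helper: reuse a cached classifier if one exists in `envir`.
get_classifier <- function(transformer = "all_minilm_l6",
                           keep_in_env = TRUE,
                           envir = 1) {
  # as.environment(1) is the global environment; an environment
  # object passed as `envir` is returned unchanged.
  target <- as.environment(envir)

  # Skip re-loading when the classifier is already stored.
  if (exists(transformer, envir = target, inherits = FALSE)) {
    return(get(transformer, envir = target))
  }

  classifier <- load_model(transformer)

  # Only store the classifier when the caller asks for it.
  if (keep_in_env) {
    assign(transformer, classifier, envir = target)
  }
  classifier
}
```

Under these assumptions, the first call with keep_in_env = TRUE pays the loading cost and later calls return the stored object. With keep_in_env = FALSE, nothing is stored and every call loads the model again.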