This function extracts entities from text and optionally assigns them to specific semantic categories based on dictionaries.
extract_entities(
  text_data,
  text_column = "abstract",
  dictionary = NULL,
  case_sensitive = FALSE,
  overlap_strategy = c("priority", "all", "longest"),
  sanitize_dict = TRUE
)

Returns a data frame with extracted entities, their types, and positions.

text_data: A data frame containing article text data.
text_column: Name of the column containing text to process.
dictionary: Combined dictionary or list of dictionaries for entity extraction.
case_sensitive: Logical. If TRUE, matching is case-sensitive.
overlap_strategy: How to handle terms that match multiple dictionaries: "priority", "all", or "longest".
sanitize_dict: Logical. If TRUE, sanitizes the dictionary before extraction.
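
To make the case_sensitive and overlap_strategy options more concrete, here is a minimal base R sketch of dictionary matching. It is not the implementation behind extract_entities(). The helpers find_matches() and resolve_overlaps(), the dictionary layout (columns term and type), the output column names, and the reading of each strategy are all assumptions made for illustration: "priority" is taken to mean that the earliest dictionary row wins when several entries match the same span, and "longest" that the longest of several overlapping matches is kept.

```r
# Hypothetical helper: locate every dictionary term in one text string.
# Assumes `dict` is a data frame with character columns `term` and `type`.
find_matches <- function(text, dict, case_sensitive = FALSE) {
  haystack <- if (case_sensitive) text else tolower(text)
  rows <- lapply(seq_len(nrow(dict)), function(i) {
    needle <- if (case_sensitive) dict$term[i] else tolower(dict$term[i])
    pos <- gregexpr(needle, haystack, fixed = TRUE)[[1]]
    if (pos[1] == -1) return(NULL)  # term not found
    data.frame(
      entity = dict$term[i],
      type = dict$type[i],
      dict_row = i,  # row order doubles as priority (assumption)
      start = as.integer(pos),
      end = as.integer(pos) + nchar(needle) - 1L,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, Filter(Negate(is.null), rows))
}

# Hypothetical helper: one possible reading of the three strategies.
resolve_overlaps <- function(hits, strategy = c("priority", "all", "longest")) {
  strategy <- match.arg(strategy)
  if (is.null(hits) || strategy == "all") return(hits)
  if (strategy == "priority") {
    # Same span matched by several entries: keep the earliest dictionary row.
    hits <- hits[order(hits$dict_row), ]
    return(hits[!duplicated(hits[, c("start", "end")]), ])
  }
  # "longest": visit matches from longest to shortest and drop any match
  # that overlaps one already kept.
  hits <- hits[order(-(hits$end - hits$start)), ]
  keep <- logical(nrow(hits))
  for (i in seq_len(nrow(hits))) {
    clash <- keep & hits$start <= hits$end[i] & hits$end >= hits$start[i]
    keep[i] <- !any(clash)
  }
  hits[keep, ]
}

# Toy data, invented for this example.
text <- "Migraine patients received magnesium sulfate."
dict <- data.frame(
  term = c("magnesium", "magnesium sulfate", "Migraine", "migraine"),
  type = c("chemical", "drug", "disease", "symptom"),
  stringsAsFactors = FALSE
)

hits <- find_matches(text, dict)
all_hits <- resolve_overlaps(hits, "all")
priority_hits <- resolve_overlaps(hits, "priority")
longest_hits <- resolve_overlaps(hits, "longest")
strict_hits <- find_matches(text, dict, case_sensitive = TRUE)
```

In this toy example, "all" keeps every match, "priority" collapses the two entries that cover the same span at the start of the sentence, and "longest" additionally drops "magnesium" because it lies inside "magnesium sulfate". With case_sensitive = TRUE, the lowercase "migraine" entry no longer matches the capitalized word.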