Implements an improved latent semantic indexing (LSI) model that filters non-biomedical terms out of the results more rigorously, so that the results stay clinically relevant. NLP-based validation provides an additional layer of filtering.
lsi_model(
  term_doc_matrix,
  a_term,
  n_factors = 100,
  n_results = 100,
  enforce_biomedical_terms = TRUE,
  c_term_types = NULL,
  entity_types = NULL,
  validation_function = is_valid_biomedical_entity,
  min_word_length = 3,
  use_nlp = TRUE,
  nlp_threshold = 0.7
)

Returns a data frame with ranked discovery results.
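term_doc_matrix: A term-document matrix.
a_term: Character string, the source term (A).
n_factors: Number of factors to use in LSI (default 100).
n_results: Maximum number of results to return (default 100).
enforce_biomedical_terms: Logical. If TRUE (the default), enforces strict biomedical term filtering.
c_term_types: Character vector of entity types allowed for C terms (default NULL).
entity_types: Named vector of entity types. If NULL (the default), the function tries to detect them.
validation_function: Function used to validate biomedical terms (default is_valid_biomedical_entity).
min_word_length: Minimum word length to include (default 3).
use_nlp: Logical. If TRUE (the default), uses NLP-based validation for biomedical terms.
nlp_threshold: Numeric between 0 and 1. Minimum confidence for NLP validation (default 0.7).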
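
To make the roles of a_term, n_factors, n_results and min_word_length more concrete, here is a minimal base R sketch of generic LSI ranking: a truncated SVD of the term-document matrix followed by cosine similarity to the source term. It is an illustration under stated assumptions, not the implementation behind lsi_model. The helper name lsi_rank_sketch, the toy matrix, and the assumption that terms are stored in rows are all hypothetical, and the biomedical and NLP validation steps are left out entirely.

```r
# Hypothetical helper, not part of the documented package: ranks terms by
# cosine similarity to a source term in a truncated-SVD (LSI) space.
# Assumption: rows of term_doc_matrix are terms, columns are documents.
lsi_rank_sketch <- function(term_doc_matrix, a_term, n_factors = 2,
                            n_results = 10, min_word_length = 3) {
  stopifnot(a_term %in% rownames(term_doc_matrix))

  # Truncated SVD: keep at most n_factors left singular vectors.
  k <- min(n_factors, nrow(term_doc_matrix), ncol(term_doc_matrix))
  decomposition <- svd(term_doc_matrix, nu = k, nv = 0)

  # Term coordinates in the latent space: columns of U scaled by singular values.
  term_vectors <- decomposition$u %*% diag(decomposition$d[seq_len(k)], nrow = k)
  rownames(term_vectors) <- rownames(term_doc_matrix)

  # Cosine similarity between the source term and every term.
  norms <- sqrt(rowSums(term_vectors^2))
  a_vector <- term_vectors[a_term, ]
  scores <- as.vector(term_vectors %*% a_vector) / (norms * norms[[a_term]])
  scores[!is.finite(scores)] <- 0  # terms with a zero vector get score 0

  results <- data.frame(term = rownames(term_doc_matrix), score = scores,
                        stringsAsFactors = FALSE)

  # Drop the source term itself and terms shorter than min_word_length.
  keep <- results$term != a_term & nchar(results$term) >= min_word_length
  results <- results[keep, ]

  # Highest similarity first, truncated to n_results rows.
  results <- results[order(results$score, decreasing = TRUE), ]
  rownames(results) <- NULL
  head(results, n_results)
}

# Toy term-document matrix, invented for illustration only.
toy_matrix <- matrix(
  c(2, 1, 0, 0,
    1, 2, 0, 0,
    0, 0, 3, 1,
    0, 0, 1, 2,
    1, 1, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("migraine", "serotonin", "insulin", "glucose", "mg"),
                  paste0("doc", 1:4))
)

ranked <- lsi_rank_sketch(toy_matrix, a_term = "migraine", n_factors = 2)
print(ranked)
```

In this toy matrix, "serotonin" shares documents with "migraine" and should rank first, while "mg" is dropped because it is shorter than min_word_length. A real call to lsi_model would additionally apply the entity-type and validation filters described in the arguments above.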