
Substitute features based on vectorized one-to-one matching for lemmatization or user-defined stemming.
dfm_replace(x, pattern, replacement = NULL, case_insensitive = TRUE,
verbose = quanteda_options("verbose"))
x: the dfm whose features will be replaced
pattern: a character vector or dictionary. See pattern for more details.
replacement: if pattern is a character vector, then replacement must be a character vector of equal length, for a 1:1 match. If pattern is a dictionary, then replacement should not be used.
case_insensitive: if TRUE, ignore case when matching
verbose: if TRUE, print status messages
# NOT RUN {
mydfm <- dfm(data_corpus_irishbudget2010)
# lemmatization
infle <- c("foci", "focus", "focused", "focuses", "focusing", "focussed", "focusses")
lemma <- rep("focus", length(infle))
mydfm2 <- dfm_replace(mydfm, infle, lemma)
featnames(dfm_select(mydfm2, infle))
# stemming
feat <- featnames(mydfm)
stem <- char_wordstem(feat, "porter")
mydfm3 <- dfm_replace(mydfm, feat, stem, case_insensitive = FALSE)
identical(mydfm3, dfm_wordstem(mydfm, "porter"))
# }
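
The 1:1 matching described above can be illustrated without quanteda. The following is a minimal base-R sketch, not the package's implementation: replace_features is a hypothetical helper, the feature names and counts are made-up toy data, and summing the counts of features that end up with the same name is an assumption about how merged features behave.

```r
# Hypothetical helper: swap each feature that appears in `pattern` for the
# entry of `replacement` at the same position (a 1:1 match).
replace_features <- function(feat, pattern, replacement, case_insensitive = TRUE) {
  stopifnot(length(pattern) == length(replacement))
  key <- if (case_insensitive) tolower(feat) else feat
  pat <- if (case_insensitive) tolower(pattern) else pattern
  idx <- match(key, pat)        # position of each feature in pattern, NA if absent
  hit <- !is.na(idx)
  out <- feat
  out[hit] <- replacement[idx[hit]]
  out
}

# Toy data (assumed for illustration only)
feat   <- c("focus", "foci", "focused", "budget")
counts <- c(2, 1, 3, 5)

new_feat <- replace_features(feat, c("foci", "focused"), c("focus", "focus"))

# Assumption: features that now share a name are combined by summing counts
merged <- tapply(counts, new_feat, sum)
```

Features that are not listed in pattern, such as "budget" here, pass through unchanged.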