This function uses the Hunspell stemmer to stem a vector of words. It uses the Brazilian Portuguese dictionary by default and, unlike hunspell::hunspell_stem, returns only one stem per word.
stem_modified_hunspell(words, complete = TRUE)
words: character vector of words to be stemmed
complete: whether words must be completed or not (logical; TRUE by default)
It then applies the rslp stemmer to the Hunspell-stemmed result.
Because hunspell_stem can return several stems for each word, the function keeps, for each word, the stem that appears most often in the vector.
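The "most frequent stem" selection can be illustrated with a minimal base R sketch. It is not the package's actual implementation: the candidate stems are hand-written, hypothetical values shaped like the list that hunspell::hunspell_stem returns, the helper name pick_most_frequent is invented for illustration, and the tie-breaking and empty-result handling are assumptions.

```r
# Hypothetical candidate stems, one character vector per input word,
# mimicking the list shape returned by hunspell::hunspell_stem.
# The values are made up for illustration.
candidates <- list(
  gostou   = c("gostar"),
  gosto    = c("gosto", "gostar"),
  gostaram = c("gostar")
)

# Count how often each stem occurs across all words in the vector.
stem_counts <- table(unlist(candidates, use.names = FALSE))

# For one word, keep the candidate with the highest overall count.
# which.max() returns the first maximum, so ties go to the earlier
# candidate (an assumption of this sketch).
pick_most_frequent <- function(stems) {
  # This sketch returns NA when a word has no candidate stems.
  if (length(stems) == 0) return(NA_character_)
  stems[which.max(as.integer(stem_counts[stems]))]
}

# One stem per word, named after the original words.
chosen <- vapply(candidates, pick_most_frequent, character(1))
print(chosen)
```

Here "gosto" has two candidates, and "gostar" is kept because it is the stem that occurs most often across the whole vector.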
# NOT RUN {
words <- c("gostou", "gosto", "gostaram")
ptstem:::stem_modified_hunspell(words)
# }