htmlParse and xpathSApply from the XML package are used to process HTML files, if necessary. textToWords is a helper function that simply breaks down a character vector to a vector of words.
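As a rough illustration of what breaking a character vector down into words can look like, here is a minimal base R sketch. The function name splitIntoWords and its splitting rules (keep letters, digits, and apostrophes; treat everything else as a separator) are assumptions made for this example and are not taken from the package's textToWords.

```r
# Hypothetical helper for illustration; not the package's textToWords.
splitIntoWords <- function(characterVector) {
  # Collapse all elements into one string so element boundaries act as spaces
  text <- paste(characterVector, collapse = " ")
  # Replace every run of characters that are not letters, digits, or
  # apostrophes with a single space (this drops punctuation)
  text <- gsub("[^[:alnum:]']+", " ", text)
  # Split on whitespace and discard any empty strings
  words <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  words[nzchar(words)]
}

words <- splitIntoWords(c("Dit is een tekst,", "om de werking te demonstreren."))
print(words)
```

A function with this shape (character vector in, vector of words out) is the kind of thing the textToWordsFunction argument appears to expect, although the exact contract is not spelled out here.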
detectRareWords(textFile = NULL, wordFrequencyFile = "Dutch", output = c("file", "show", "return"), outputFile = NULL, wordCol = "Word", freqCol = "FREQlemma", textToWordsFunction = "textToWords", encoding = "ASCII", xPathSelector = "/text()", silent = FALSE)
textToWords(characterVector)
output: if "file", the filename to write to should be provided in outputFile. If "show", the output is shown; and if "return", the output is returned invisibly.
wordCol: the column in wordFrequencyFile that contains the words.
freqCol: the column in wordFrequencyFile that contains the frequency with which each word occurs.
xpathSApply is used to extract the content. xPathSelector specifies which content to extract (the default value extracts all text content).
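To make the role of an XPath selector more concrete, the sketch below parses a small HTML string with htmlParse and pulls out text nodes with xpathSApply. The HTML snippet and the "//p//text()" selector are invented for this example (the selector is deliberately narrower than the function's default), and the code assumes the XML package is installed.

```r
# Illustration only: the HTML and the selector are made up for this example.
if (requireNamespace("XML", quietly = TRUE)) {
  html <- paste0("<html><body><h1>Titel</h1>",
                 "<p>Dit is een tekst.</p><p>Nog een zin.</p>",
                 "</body></html>")
  # asText = TRUE tells htmlParse that the input is HTML content, not a file name
  doc <- XML::htmlParse(html, asText = TRUE)
  # Apply xmlValue to every text node inside a <p> element
  paragraphText <- XML::xpathSApply(doc, "//p//text()", XML::xmlValue)
  print(paragraphText)
}
```

Here the heading text is skipped because the selector only matches text nodes inside paragraph elements; a broader selector would return more of the page's text.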
detectRareWords returns a dataframe (invisibly) if output contains "return". Otherwise, NULL is returned (invisibly), but the output is printed and/or written to a file depending on the value of output.
textToWords returns a vector of words.
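The general idea of looking words up in a frequency table and returning the result as a dataframe can be sketched in base R as follows. This is not the package's implementation: the helper name lookupWordFrequencies, the toy table, and its frequency values are all invented for illustration, and listing words that are absent from the table first is an assumption of this sketch. Only the column names Word and FREQlemma mirror the defaults of wordCol and freqCol.

```r
# Toy frequency table; the frequency values are invented for illustration.
wordFrequencies <- data.frame(
  Word = c("de", "een", "is", "tekst", "werking"),
  FREQlemma = c(50000, 30000, 20000, 150, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper; not part of the package.
lookupWordFrequencies <- function(words, freqTable,
                                  wordCol = "Word", freqCol = "FREQlemma") {
  # Compare case-insensitively and look each distinct word up once
  words <- unique(tolower(words))
  # match() yields NA for words that do not occur in the table
  idx <- match(words, tolower(freqTable[[wordCol]]))
  result <- data.frame(word = words,
                       frequency = freqTable[[freqCol]][idx],
                       stringsAsFactors = FALSE)
  # Sort by ascending frequency, with words missing from the table first
  result[order(result$frequency, na.last = FALSE), , drop = FALSE]
}

rare <- lookupWordFrequencies(c("De", "werking", "van", "de", "tekst"),
                              wordFrequencies)
print(rare)
```

Sorting by ascending frequency puts the least common words at the top, which is usually what a reader scanning for rare words wants to see first.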
## Not run:
# detectRareWords(paste('Dit is een tekst om de',
# 'werking van de detectRareWords',
# 'functie te demonstreren.'),
# output='show');
# ## End(Not run)