Looking up word frequencies

This function checks, for each word in a text, how frequently it occurs in a given language. This is useful for identifying and eliminating rare words, making a text more accessible to an audience with a limited vocabulary. htmlParse and xpathSApply from the XML package are used to process HTML files, if necessary. textToWords is a helper function that simply breaks a character vector down into a vector of words.

detectRareWords(textFile = NULL, wordFrequencyFile = "Dutch",
                output = c("file", "show", "return"), outputFile = NULL,
                wordCol = "Word", freqCol = "FREQlemma",
                textToWordsFunction = "textToWords", encoding = "ASCII",
                xPathSelector = "/text()", silent = FALSE)
textToWords(characterVector)

textFile: If NULL, a dialog will be shown that enables users to select a file. If not NULL, this has to be either a filename or a character vector. An HTML file can be provided; it will be parsed before the words are extracted.
wordFrequencyFile: The file with word frequencies to use. If 'Dutch' or 'Polish', files from the Center for Reading Research are downloaded.
output: How to provide the output, as a character vector. If file, the filename to write to should be provided in outputFile. If show, the output is shown; and if return, the output is returned invisibly.
outputFile: The name of the file to store the output in.
wordCol: The name of the column in the wordFrequencyFile that contains the words.
freqCol: The name of the column in the wordFrequencyFile that contains the frequency with which each word occurs.
textToWordsFunction: The function to use to split a character vector, where each element contains one or more words, into a vector where each element is a word.
encoding: The encoding used to read and write files.
xPathSelector: If the file provided is an HTML file, xpathSApply is used to extract the content. xPathSelector specifies which content to extract (the default value extracts all text content).
silent: Whether to suppress detailed feedback about the process.
characterVector: A character vector, the elements of which are to be broken down into words.
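
To make the word-splitting step concrete, here is a minimal base R sketch of what a function passed as textToWordsFunction might look like. The name splitIntoWords and its splitting rules (dropping punctuation, splitting on whitespace) are assumptions made for illustration; this is not the package's textToWords implementation, which may behave differently.

```r
# Hypothetical stand-in for a word-splitting helper; NOT the package's textToWords.
splitIntoWords <- function(characterVector) {
  # Replace every run of characters that are not letters, digits,
  # or apostrophes with a single space
  cleaned <- gsub("[^[:alnum:]']+", " ", characterVector)
  # Split each element on whitespace and flatten the result into one vector
  words <- unlist(strsplit(trimws(cleaned), "[[:space:]]+"))
  # Drop empty strings left behind by elements that contained no words
  words[nzchar(words)]
}

words <- splitIntoWords(c("Dit is een tekst,", "om de functie te demonstreren."))
print(words)
```

Each element of the result is a single word, which is the shape described for the return value of textToWords.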

detectRareWords returns a dataframe (invisibly) if output contains return. Otherwise, NULL is returned (invisibly), but the output is printed and/or written to a file depending on the value of output. textToWords returns a vector of words.
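
The general idea of looking up each word in a frequency table can be sketched in base R as follows. Everything here is illustrative: the helper lookupWordFrequencies, the tiny frequency table with made-up counts, and the threshold of 1000 are assumptions, not the package's implementation or real data. Only the column names Word and FREQlemma are taken from the wordCol and freqCol defaults.

```r
# Hypothetical frequency table with made-up counts; real tables are far larger.
# Column names follow the wordCol and freqCol defaults.
frequencies <- data.frame(Word = c("de", "een", "is", "tekst", "functie"),
                          FREQlemma = c(50000, 30000, 25000, 400, 150),
                          stringsAsFactors = FALSE)

# Hypothetical helper; NOT the package's implementation.
lookupWordFrequencies <- function(words, frequencies,
                                  wordCol = "Word", freqCol = "FREQlemma") {
  # match() gives the row of each word in the table, or NA if it is absent;
  # both sides are lower-cased so the lookup ignores case
  idx <- match(tolower(words), tolower(frequencies[[wordCol]]))
  data.frame(word = words,
             frequency = frequencies[[freqCol]][idx],
             stringsAsFactors = FALSE)
}

result <- lookupWordFrequencies(c("De", "tekst", "demonstreren"), frequencies)
# Flag words below an arbitrary threshold, or missing from the table, as rare
result$rare <- is.na(result$frequency) | result$frequency < 1000
print(result)
```

Words absent from the table get a frequency of NA, so a caller has to decide whether to treat them as rare (as in this sketch) or to report them separately.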

  • detectRareWords
  • textToWords
## Not run: 
detectRareWords(paste('Dit is een tekst om de',
                      'werking van de detectRareWords',
                      'functie te demonstreren.'),
                output='show')
## End(Not run)
Documentation reproduced from package userfriendlyscience, version 0.5-2, License: GPL (>= 2)
