lexicon (version 0.7.4)

hash_lemmas: Lemmatization List

Description

A dataset based on Mechura's (2016) English lemmatization list. It is useful for join-style lemma replacement, which maps inflected token forms to their root lemmas. Although this is not a true morphological analysis, this style of lemma replacement is fast and typically still robust.

Usage

data(hash_lemmas)

Format

A data frame with 41,532 rows and 2 variables.

Details

  • token. An inflected token with affixes

  • lemma. A base form
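
Examples

The join-style replacement mentioned in the Description can be sketched in base R as a lookup on the token column. This is a minimal sketch, not the package's own method: lemma_table is a small stand-in with the same two columns as hash_lemmas (its rows are illustrative and not copied from the dataset), and lemmatize_tokens is a hypothetical helper. It assumes the tokens have already been normalized to the same casing as the table.

```r
# Stand-in table with the same two columns as hash_lemmas (token, lemma).
# The rows are illustrative only. With the lexicon package installed,
# the real table could be loaded instead with:
#   data(hash_lemmas, package = "lexicon")
lemma_table <- data.frame(
  token = c("running", "ran", "mice", "better"),
  lemma = c("run", "run", "mouse", "good"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look each token up in the table and return its lemma.
# Tokens that are not listed are returned unchanged.
lemmatize_tokens <- function(tokens, table) {
  idx <- match(tokens, table$token)  # NA where a token is not in the table
  ifelse(is.na(idx), tokens, table$lemma[idx])
}

tokens <- c("the", "mice", "ran", "home")
lemmas <- lemmatize_tokens(tokens, lemma_table)
print(lemmas)
```

Because the lookup is an exact match on the token string, it stays fast on long token vectors, but it cannot resolve forms that are missing from the table or that need context to disambiguate.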

References

Mechura, M. B. (2016). Lemmatization list: English (en) [Data file]. Retrieved from http://www.lexiconista.com