This is the basic function for the MeCab part-of-speech tagger. It takes a
character vector of any length and runs the loop in C++ for faster
processing.
You can supply a user dictionary through user_dic. It must be compiled with
mecab-dict-index. An explanation of how to compile a user dictionary is
available at https://github.com/junhewk/RcppMeCab.
You can also set a system dictionary in sys_dic, which is especially useful if
you are using multiple dictionaries (for example, both the IPA and Juman
dictionaries for Japanese). Using options(mecabSysDic=), you can set your
preferred system dictionary for the current R session.
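As a minimal sketch of the options(mecabSysDic=) setting described above: the dictionary path below is a hypothetical placeholder, not a location that RcppMeCab guarantees, and the snippet only stores and reads back the option without calling MeCab.

```r
# Hypothetical dictionary location (an assumption for illustration);
# replace it with the directory of a real compiled system dictionary.
sys_dic_path <- "/usr/local/lib/mecab/dic/ipadic"

# Store the preference for the current R session.
options(mecabSysDic = sys_dic_path)

# Read it back to confirm what is currently set.
current_dic <- getOption("mecabSysDic")
print(current_dic)
```

Because options() only lasts for the session, a line like this would typically go in a startup file such as .Rprofile if the preference should persist.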
If you want the morphemes only, use join = FALSE; the tags are then attached
as an attribute instead of being joined to each morpheme.
By default, the function returns a list of character vectors whose elements have the form (morpheme)/(tag).
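The (morpheme)/(tag) form can be separated again with base R. The sketch below uses a hand-written list that only mimics the shape of the return value; the words and tag labels are invented for illustration and are not real MeCab output. The helper split_tagged is hypothetical, and it assumes the tag is everything after the last slash.

```r
# Hand-written stand-in for the returned list; NOT real MeCab output.
tagged <- list(
  c("cats/NOUN", "sleep/VERB"),
  c("either/or/SYMBOL")
)

# Hypothetical helper: split each element on its LAST "/" so that a
# morpheme containing a slash stays intact (assumes tags hold no slash).
split_tagged <- function(x) {
  data.frame(
    morpheme = sub("/[^/]*$", "", x),  # drop the final "/tag" part
    tag = sub("^.*/", "", x),          # keep only what follows the last "/"
    stringsAsFactors = FALSE
  )
}

# One data frame per input element.
result <- lapply(tagged, split_tagged)
print(result)
```

If the tags are needed separately from the start, calling the function with join = FALSE avoids this splitting step.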