jiebaR (version 0.11)

jiebaR: A package for Chinese text segmentation

Description

This package provides Chinese text segmentation, keyword extraction, and part-of-speech tagging, built on Rcpp and cppjieba.

Author

Qin Wenfeng <http://qinwenfeng.com>

Details

You can use a custom dictionary. jiebaR can also identify new words on its own, but adding those words to the dictionary gives higher segmentation accuracy.
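As a minimal sketch of the custom-dictionary workflow, the snippet below writes a small user dictionary to a temporary file and passes it to `worker()` through its `user` argument. The file name, the two dictionary entries, and the sample sentence are illustrative assumptions, not part of the package; the sketch also assumes the dictionary is UTF-8 encoded with one word per line.

```r
library(jiebaR)

# Hypothetical user dictionary: one word per line, UTF-8 encoded.
user_dict <- tempfile(fileext = ".utf8")
writeLines(enc2utf8(c("云计算", "大数据")), user_dict, useBytes = TRUE)

# Build a segmentation worker that loads the user dictionary
# in addition to the default system dictionary.
engine <- worker(user = user_dict)

# Segment a sample sentence; the result is a character vector of tokens.
result <- segment("云计算和大数据", engine)
print(result)
```

Words listed in the user dictionary should be kept as single tokens, whereas without it the segmenter has to rely on the default dictionary and its new-word detection.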

References

CppJieba: https://github.com/aszxqw/cppjieba

Examples

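### Note: Chinese characters cannot be displayed here.
if (FALSE) {
library(jiebaR)

words <- "hello world"

# Default worker (type = "mix")
engine1 <- worker()
segment(words, engine1)

# "./temp.txt" is a file path; an existing file is read and its contents segmented
segment("./temp.txt", engine1)

# Hidden Markov model segmentation
engine2 <- worker("hmm")
segment("./temp.txt", engine2)

# With write = TRUE, results for file input are written to an output file
engine2$write <- TRUE
segment("./temp.txt", engine2)

# Mixed model with a custom dictionary ("dict_path" is a placeholder), keeping symbols
engine3 <- worker(type = "mix", dict = "dict_path", symbol = TRUE)
segment("./temp.txt", engine3)
}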
 
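if (FALSE) {
# `words` is the string defined in the segmentation example above

### Keyword Extraction
# topn = 1 returns only the top-ranked keyword
engine <- worker("keywords", topn = 1)
keywords(words, engine)

### Part-of-Speech Tagging
tagger <- worker("tag")
tagging(words, tagger)

### Simhash
simhasher <- worker("simhash", topn = 1)
simhash(words, simhasher)
# Hamming distance between the simhash values of two strings
distance("hello world", "hello world!", simhasher)

# Show the path of the default dictionaries
show_dictpath()
}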