A function to segment Chinese sentences into words.
segmentCN(strwords, package = c("jiebaR", "Rwordseg"), nature = FALSE,
nosymbol = TRUE, useStopDic = FALSE, returnType = c("vector", "tm"))
insertWords(inswords, package = c("jiebaR", "Rwordseg"))
strwords: A string vector of Chinese sentences in UTF-8.
package: Which package to use, "jiebaR" or "Rwordseg".
nature: Whether to recognise the nature (part of speech) of the words.
nosymbol: Whether to keep symbols in the sentence.
useStopDic: Whether to use the default stop words.
returnType: The default is a string vector; choose 'tm' to output a single
space-separated string so that it can be used by Corpus directly.
inswords: A string vector of words to be added to the dictionary.
A vector of segmented words (a list if the input is a vector), or the path of the output file.
The function segmentCN originates from the 'Rwordseg' package.
If 'Rwordseg' is installed successfully (a JRE and the 'rJava' package
are required), using 'Rwordseg::segmentCN' directly may be the easiest choice.
More details can be found at http://jianl.org/cn/R/Rwordseg.html.
In this package the function segmentCN is a wrapper around 'jiebaR',
which can be easily installed from CRAN. segmentCN provides only
some basic functionality of 'jiebaR'. More details can be
found at http://qinwenfeng.com/jiebaR.
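The following is a minimal sketch of calling segmentCN with the 'jiebaR' backend, using only the arguments shown in the usage above. It assumes this page documents the 'tmcn' package (the page itself does not name it) and that 'jiebaR' is installed. The sample sentences are made up for illustration.

```r
# Hypothetical sample input; the function expects UTF-8 strings.
sentences <- enc2utf8(c("今天天气很好", "我们去公园散步"))

# Assumption: segmentCN is provided by the 'tmcn' package.
if (requireNamespace("tmcn", quietly = TRUE) &&
    requireNamespace("jiebaR", quietly = TRUE)) {
  library(tmcn)

  # Default return type: the segmented words for each sentence.
  words <- segmentCN(sentences, package = "jiebaR")
  print(words)

  # returnType = "tm": space-separated strings suitable for building a Corpus.
  joined <- segmentCN(sentences, package = "jiebaR", returnType = "tm")
  print(joined)
}
```

The guard with requireNamespace simply skips the calls when the packages are not available, so the snippet can be sourced without error on any R installation.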
The function insertWords is used to add new words to the dictionary temporarily.
If you want to manage your own dictionary, please select either 'Rwordseg' or
'jiebaR' package for segmentation.
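As a rough illustration of the temporary dictionary insertion described above, the sketch below segments a sentence, inserts a word, and segments again so the two results can be compared by eye. It again assumes the functions come from the 'tmcn' package (an assumption, not stated on this page). The sentence and the inserted word are hypothetical examples, and whether the segmentation actually changes depends on the dictionary in use.

```r
# Hypothetical sample text and new word, both in UTF-8.
text <- enc2utf8("我们正在研究深度强化学习")
new_words <- enc2utf8(c("深度强化学习"))

# Assumption: segmentCN and insertWords are provided by the 'tmcn' package.
if (requireNamespace("tmcn", quietly = TRUE) &&
    requireNamespace("jiebaR", quietly = TRUE)) {
  library(tmcn)

  # Segmentation before the word is added.
  before <- segmentCN(text, package = "jiebaR")

  # Add the word to the dictionary for the current session only.
  insertWords(new_words, package = "jiebaR")

  # Segmentation after the word is added.
  after <- segmentCN(text, package = "jiebaR")

  print(before)
  print(after)
}
```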