A newer version (0.2-13) of this package is available.

tmcn (version 0.2-12)

A Text Mining Toolkit for Chinese

Description

A text mining toolkit for Chinese. It includes facilities for Chinese string processing, Chinese NLP support, and encoding detection and conversion. It also provides functions that support using the 'tm' package with Chinese text.
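
To make "encoding detection and conversion" concrete, here is a minimal sketch that uses base R only (`iconv`, `validUTF8`, `charToRaw`). It does not call tmcn and is not the package's implementation; the package's own helpers (`toUTF8`, `isUTF8`, `isGBK` and so on) may behave differently. It assumes the platform's iconv supports GBK, which is usually the case.

```r
# Base-R sketch of encoding detection and conversion (not tmcn's own code).
# The string is written with \u escapes so this script stays ASCII-only.
x_utf8 <- "\u4e2d\u6587"   # the two-character word for "Chinese (language)"

# Convert UTF-8 to GBK, then back to UTF-8.
x_gbk  <- iconv(x_utf8, from = "UTF-8", to = "GBK")
x_back <- iconv(x_gbk,  from = "GBK",   to = "UTF-8")

# validUTF8() checks whether the bytes form a valid UTF-8 sequence.
validUTF8(x_utf8)   # TRUE
validUTF8(x_gbk)    # FALSE: GBK bytes for this text are not valid UTF-8

# The same text takes 3 bytes per character in UTF-8 and 2 in GBK.
length(charToRaw(x_utf8))   # 6
length(charToRaw(x_gbk))    # 4

# The round trip restores the original bytes.
identical(charToRaw(x_back), charToRaw(x_utf8))   # TRUE
```

A validity check like this can only say that a byte sequence is or is not well-formed in a given encoding; it cannot prove which encoding the author intended, so detection results are best treated as a strong hint.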

Install

install.packages('tmcn')

Monthly Downloads

1,224

Version

0.2-12

License

LGPL

Maintainer

Jian Li

Last Published

March 3rd, 2018

Functions in tmcn (0.2-12)

isGB2312

Indicate whether the encoding of input string is GB2312.
isGBK

Indicate whether the encoding of input string is GBK.
NTUSD

National Taiwan University Semantic Dictionary
getCharset

Get the current encoding of the locale.
STOPWORDS

Dictionary of Chinese stop words
GBK

GBK character set
left

Extract the left or right substrings in a character vector.
toTrad

Convert a Chinese text from simplified to traditional characters and vice versa.
catUTF8

Print the UTF-8 codes of a string.
revUTF8

Revert a UTF-8 code string to Chinese characters.
isUTF8

Indicate whether the encoding of input string is UTF-8.
SPORT

Sport news.
createWordFreq

Create a word frequency data.frame.
toUTF8

Convert encoding of Chinese string to UTF-8.
strpad

Pad a string to a specified length with a padding character.
isBIG5

Indicate whether the encoding of input string is BIG5.
isGB18030

Indicate whether the encoding of input string is GB18030.
SIMTRA

Dictionary of simplified and traditional Chinese
strextract

Extract matched substrings by regular expression.
strcap

Mixed case capitalizing.
createDTM

Create a Chinese term-document matrix or a document-term matrix.
stopwordsCN

Return Chinese stop words.
segmentCN

Segment a sentence.
strstrip

Trim whitespace from a string.
setchs

Set locale to Simplified Chinese/Traditional Chinese/UK.
toPinyin

Convert Chinese text to pinyin.
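
The string helpers listed above (`left`, `strpad`, `strstrip`, `strextract`) have close analogues in base R. The sketch below illustrates the operations they describe; the function names in it are hypothetical, and it is not tmcn's API or implementation. Argument names and defaults in the package may differ.

```r
# Base-R sketches of the string operations described in the function list.
# All function names here are hypothetical and are not part of tmcn.

# First n characters of each string.
left_chars <- function(x, n) substr(x, 1L, n)

# Last n characters of each string.
right_chars <- function(x, n) {
  len <- nchar(x, type = "chars")
  substr(x, pmax(len - n + 1L, 1L), len)
}

# Pad on the left with a single padding character up to `width` characters.
pad_left <- function(x, width, pad = " ") {
  n_pad <- pmax(width - nchar(x, type = "chars"), 0L)
  paste0(strrep(pad, n_pad), x)
}

# Remove leading and trailing whitespace.
strip_space <- function(x) trimws(x, which = "both")

# All substrings matching a regular expression, one list element per input.
extract_matches <- function(x, pattern) regmatches(x, gregexpr(pattern, x))

left_chars("abcdef", 2)                       # "ab"
right_chars("abcdef", 2)                      # "ef"
pad_left("7", 3, pad = "0")                   # "007"
strip_space("  tm  ")                         # "tm"
extract_matches("a1b22c333", "[0-9]+")[[1]]   # "1" "22" "333"

# substr() and nchar(type = "chars") count characters, not bytes,
# so the helpers also work on Chinese text.
s <- "\u4e2d\u6587\u6587\u672c"   # a four-character Chinese string
nchar(left_chars(s, 2), type = "chars")       # 2
```

Counting in characters rather than bytes matters for Chinese text, where one character occupies several bytes in UTF-8.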