tmcn (version 0.2-13)

A Text Mining Toolkit for Chinese

Description

A text mining toolkit for Chinese, which includes facilities for Chinese string processing, Chinese NLP support, and encoding detection and conversion. It also provides functions that support using the 'tm' package with Chinese text.

Install

install.packages('tmcn')

Monthly Downloads

1,940

Version

0.2-13

License

LGPL

Maintainer

Jian Li

Last Published

August 8th, 2019

Functions in tmcn (0.2-13)

getCharset

Get the current encoding of the locale.
isBIG5

Indicate whether the encoding of the input string is BIG5.
catUTF8

Print the UTF-8 codes of a string.
strextract

Extract matched substrings by regular expression.
strcap

Mixed-case capitalization.
strpad

Pad a string to a specified length with a padding character.
strstrip

Trim whitespace from a string.
stopwordsCN

Return Chinese stop words.
setchs

Set locale to Simplified Chinese/Traditional Chinese/UK.
NTUSD

National Taiwan University Semantic Dictionary
toUTF8

Convert the encoding of a Chinese string to UTF-8.
isGBK

Indicate whether the encoding of the input string is GBK.
GBK

GBK character set
toTrad

Convert a Chinese text from simplified to traditional characters and vice versa.
isUTF8

Indicate whether the encoding of the input string is UTF-8.
isGB2312

Indicate whether the encoding of the input string is GB2312.
isGB18030

Indicate whether the encoding of the input string is GB18030.
STOPWORDS

Dictionary of Chinese stop words
toPinyin

Convert Chinese text to pinyin.
createDTM

Create a Chinese term-document matrix or a document-term matrix.
SIMTRA

Dictionary of simplified and traditional Chinese
SPORT

Sports news.
createWordFreq

Create a word frequency data.frame.
left

Extract the left or right substrings in a character vector.
revUTF8

Convert UTF-8 codes back to Chinese characters.
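Several of the helpers listed above are small string utilities. As a rough illustration of what a few of them do, the sketch below reimplements similar behaviour in base R under hypothetical names (`strip_sketch()`, `pad_left_sketch()`, `left_sketch()`, `right_sketch()`, `utf8_codes_sketch()`, `rev_codes_sketch()`, `is_utf8_sketch()`). It is an assumption-laden illustration, not tmcn's source code: tmcn's own argument names, defaults and output formats (for example, the exact format `catUTF8()` prints) may differ.

```r
# Hypothetical base-R helpers that approximate a few tmcn string functions.
# Illustration only: these are NOT tmcn's implementations, and their
# argument names and output formats are assumptions.

# Roughly like strstrip(): trim leading and trailing whitespace.
strip_sketch <- function(x) trimws(x, which = "both")

# Roughly like strpad(): left-pad each string to `width` with `pad`.
pad_left_sketch <- function(x, width, pad = " ") {
  n <- pmax(width - nchar(x), 0)   # how many pad characters are needed
  paste0(strrep(pad, n), x)
}

# Roughly like left()/right(): first or last n characters of each element.
left_sketch  <- function(x, n) substr(x, 1, n)
right_sketch <- function(x, n) substr(x, pmax(nchar(x) - n + 1, 1), nchar(x))

# Roughly like catUTF8(): show each character's code point as <U+XXXX>.
utf8_codes_sketch <- function(x) {
  vapply(x, function(s) {
    paste0(sprintf("<U+%04X>", utf8ToInt(s)), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# Roughly like revUTF8(): turn <U+XXXX> codes back into characters.
rev_codes_sketch <- function(x) {
  vapply(x, function(s) {
    hex <- regmatches(s, gregexpr("(?<=<U\\+)[0-9A-Fa-f]+(?=>)", s, perl = TRUE))[[1]]
    intToUtf8(strtoi(hex, base = 16L))
  }, character(1), USE.NAMES = FALSE)
}

# Roughly like isUTF8(): is each string valid UTF-8? (validUTF8(), R >= 3.3.0)
# Note that plain ASCII strings are also valid UTF-8.
is_utf8_sketch <- function(x) validUTF8(x)

# Usage: the two Chinese characters for "Chinese" round-trip through codes.
codes <- utf8_codes_sketch("\u4e2d\u6587")
print(codes)                    # "<U+4E2D><U+6587>"
print(rev_codes_sketch(codes))  # the original two characters
```

Using `\u` escapes rather than literal Chinese characters keeps the example independent of the source file's encoding, which is the same class of problem tmcn's `toUTF8()` and `is*()` helpers are meant to address.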