
A newer version (0.2-13) of this package is available.

tmcn (version 0.2-12)

A Text Mining Toolkit for Chinese

Description

A text mining toolkit for Chinese. It includes facilities for Chinese string processing, Chinese NLP support, and encoding detection and conversion. It also provides functions that support the 'tm' package when working with Chinese text.

Install

install.packages('tmcn')

Monthly Downloads

1,224

Version

0.2-12

License

LGPL

Maintainer

Jian Li

Last Published

March 3rd, 2018

Functions in tmcn (0.2-12)

Indicate whether the encoding of input string is GB2312.

Indicate whether the encoding of input string is GBK.

National Taiwan University Semantic Dictionary

Get the current encoding of the locale.

Dictionary of Chinese stop words

GBK character set

Extract the left or right substrings in a character vector.

Convert a Chinese text from simplified to traditional characters and vice versa.

Print the UTF-8 codes of a string.

Revert UTF-8 string to Chinese character.

Indicate whether the encoding of input string is UTF-8.

Create a word frequency data.frame.

Convert encoding of Chinese string to UTF-8.

Pad a string to a specified length with a padding character.

Indicate whether the encoding of input string is BIG5.

Indicate whether the encoding of input string is GB18030.

Dictionary of simplified and traditional Chinese

Extract matched substrings by regular expression.

Mixed case capitalizing.

Create a Chinese term-document matrix or a document-term matrix.

Return Chinese stop words.

Segment a sentence.

Trim whitespace from a string.

Set locale to Simplified Chinese/Traditional Chinese/UK.

Convert a Chinese text to pinyin format.
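
Several of the tasks listed above (encoding checks and conversion, UTF-8 code inspection, trimming, padding, substring and regular-expression extraction, word frequency tables) can be illustrated with base R alone. The following is a minimal sketch of those ideas, not the tmcn implementation: it calls no tmcn functions, all variable names are made up for illustration, and it assumes the platform's iconv recognises the encoding name "GBK".

```r
# Base R only: none of the calls below are tmcn functions.
# Two Chinese characters built from code points, so this file stays ASCII.
x <- intToUtf8(c(0x4E2D, 0x6587))

# Encoding check and round-trip conversion
is_utf8 <- validUTF8(x)                          # TRUE for well-formed UTF-8
gbk     <- iconv(x, from = "UTF-8", to = "GBK")  # assumes iconv knows "GBK"
back    <- iconv(gbk, from = "GBK", to = "UTF-8")

# Code points of a string, and the reverse mapping
codes    <- utf8ToInt(x)
restored <- intToUtf8(codes)

# Trimming, left/right substrings, padding to a fixed width
s       <- "  text mining  "
trimmed <- trimws(s)
left3   <- substr(trimmed, 1, 3)
right3  <- substr(trimmed, nchar(trimmed) - 2, nchar(trimmed))
padded  <- paste0(strrep("0", 5 - nchar("42")), "42")

# Extract every substring that matches a regular expression
versions <- "v0.2-12 and v0.2-13"
nums     <- regmatches(versions, gregexpr("[0-9]+", versions))[[1]]

# Word frequency data.frame from words that are already segmented
words <- c(x, "R", x)
freq  <- as.data.frame(table(word = words), responseName = "freq",
                       stringsAsFactors = FALSE)
freq  <- freq[order(-freq$freq), ]               # most frequent word first
```

Segmentation, simplified/traditional conversion, and pinyin conversion need dictionaries and are not covered by this sketch.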