tmcn (version 0.2-13)

A Text Mining Toolkit for Chinese

Description

A text mining toolkit for Chinese that includes facilities for Chinese string processing, Chinese NLP support, and encoding detection and conversion. It also provides functions that support the 'tm' package for Chinese text.

Install

install.packages('tmcn')

Monthly Downloads

1,940

Version

0.2-13

License

LGPL

Maintainer

Jian Li

Last Published

August 8th, 2019

Functions in tmcn (0.2-13)

Get the current encoding of the locale.

Indicate whether the encoding of the input string is BIG5.

Print the UTF-8 codes of a string.

Extract matched substrings by regular expression.

Mixed case capitalizing.

Pad a string to a specified length with a padding character.

Trim whitespace from a string.

Return Chinese stop words.

Set locale to Simplified Chinese/Traditional Chinese/UK.

National Taiwan University Semantic Dictionary

Convert encoding of Chinese string to UTF-8.

Indicate whether the encoding of the input string is GBK.

GBK character set

Convert Chinese text from simplified to traditional characters, or vice versa.

Indicate whether the encoding of the input string is UTF-8.

Indicate whether the encoding of the input string is GB2312.

Indicate whether the encoding of the input string is GB18030.

Dictionary of Chinese stop words

Convert Chinese text to pinyin format.

Create a Chinese term-document matrix or a document-term matrix.

Dictionary of simplified and traditional Chinese

Create a word frequency data.frame.

Extract the left or right substrings in a character vector.

Convert a string of UTF-8 codes back to Chinese characters.