nuss

<code>nuss</code> returns a data.frame containing the
hashtag, its segmented version, the ids of the dictionary words,
the number of words needed to segment the hashtag,
the total number of points, and the computed score.
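A call might look like the sketch below. It assumes the NUSS package is installed and that `nuss` is called with the two documented arguments, `sequences` and `texts`. The corpus and the sequences are made-up illustration data, and the exact column names of the result are not assumed here, so the example only inspects the structure.

```r
# Hedged usage sketch: runs only if the NUSS package is available.
if (requireNamespace("NUSS", quietly = TRUE)) {
  # Made-up corpus used to build the n-grams and the unigram dictionary.
  texts <- c("this is science",
             "science is everywhere",
             "the beauty of science")

  # Sequences to segment (illustrative values only).
  result <- NUSS::nuss(sequences = c("thisisscience", "scienceiseverywhere"),
                       texts = texts)

  # Inspect the columns of the returned data.frame.
  str(result)
}
```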

Segmentation of short text sequences, such as hashtags, into
sequences of separate words, using a dictionary that may be
built from a custom corpus of texts. A unigram dictionary is used to find the most
probable sequence, and an n-gram approach is used to determine the possible
segmentations given the text corpus.
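To make the unigram part of this idea concrete, here is a minimal base-R sketch. It is not the NUSS implementation: the helper name `segment_unigram`, the toy corpus, and the scoring (sum of log relative frequencies, maximised by dynamic programming) are all assumptions chosen for illustration, and the n-gram step is left out entirely.

```r
# Illustrative only: a tiny unigram segmenter, not the code used by nuss().
# Build a unigram dictionary (log relative frequencies) from a toy corpus.
texts <- c("this is science", "science is everywhere", "the beauty of science")
words <- unlist(strsplit(tolower(texts), "[^a-z]+"))
words <- words[nzchar(words)]
counts <- table(words)
logp <- log(as.numeric(counts) / sum(counts))
names(logp) <- names(counts)

# Hypothetical helper: most probable split of `sequence` into dictionary words.
segment_unigram <- function(sequence, logp) {
  s <- tolower(sub("^#", "", sequence))  # drop a leading '#', ignore case
  n <- nchar(s)
  best <- c(0, rep(-Inf, n))  # best[i + 1]: best log-probability of s[1..i]
  back <- integer(n + 1)      # back[i + 1]: offset where the last word starts
  for (i in seq_len(n)) {
    for (j in 0:(i - 1)) {
      w <- substr(s, j + 1, i)
      if (is.finite(best[j + 1]) && w %in% names(logp)) {
        cand <- best[j + 1] + logp[[w]]
        if (cand > best[i + 1]) {
          best[i + 1] <- cand
          back[i + 1] <- j
        }
      }
    }
  }
  # No full cover of the sequence by dictionary words.
  if (!is.finite(best[n + 1])) return(NA_character_)
  # Walk the back-pointers to recover the words in order.
  out <- character(0)
  i <- n
  while (i > 0) {
    j <- back[i + 1]
    out <- c(substr(s, j + 1, i), out)
    i <- j
  }
  paste(out, collapse = " ")
}

segment_unigram("#thisisscience", logp)  # "this is science"
```

A sequence that cannot be covered by dictionary words yields `NA` in this sketch; how `nuss` itself treats such cases is not stated in this page.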

Oskar Kosch

NUSS

Mixed N-Grams and Unigram Sequence Segmentation

nuss function

Arguments

<dl><dt>sequences</dt>
<dd>character vector of sequences to be segmented
(e.g., hashtags), with or without the leading <code>#</code>. Case-insensitive.</dd>
<dt>texts</dt>
<dd>character vector of texts used to create the n-grams
and the unigram dictionary. Case-insensitive.</dd></dl>

