NUSS (version 0.1.0)

nuss: Mixed N-Grams and Unigram Sequence Segmentation (NUSS) function

Description

nuss returns a data.frame containing the hashtag, its segmented version, the ids of the dictionary words, the number of words it took to segment the hashtag, the total number of points, and the computed score.

Usage

nuss(sequences, texts)

Value

The output will always be a data.frame with sequences, that were

The output is not returned in the input order. If the input order is needed, use lapply to process the sequences one at a time.
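The lapply suggestion above can be sketched as follows. This is a minimal illustration, not part of the package: segment_in_order is a hypothetical helper name, and it assumes that calling nuss on a single sequence is acceptable and that a list of per-sequence data.frames is a suitable result. The call to nuss is guarded so the sketch runs even when NUSS is not installed.

```r
# Hypothetical helper (not part of NUSS): apply a segmentation function to
# each sequence separately so results stay tied to the input positions.
segment_in_order <- function(sequences, texts, segment_fun) {
  results <- lapply(sequences, function(s) segment_fun(s, texts))
  names(results) <- sequences  # label each element with its input sequence
  results
}

texts <- c("this is science",
           "science is everywhere",
           "the beauty of science")
sequences <- c("thisisscience", "scienceisscience")

# Only call nuss when the package is available.
if (requireNamespace("NUSS", quietly = TRUE)) {
  per_sequence <- segment_in_order(sequences, texts, NUSS::nuss)
  print(per_sequence)
}
```

The result is a list with one element per input sequence, in the order given. Because the dictionaries are rebuilt from texts on every call, this is likely slower than a single call to nuss on the whole vector.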

Arguments

sequences

character vector, sequences to be segmented (e.g., hashtags) or without it. Case-insensitive.

texts

character vector, the texts used to create the n-grams and unigram dictionaries. Case-insensitive.

Details

This function is an arbitrary combination of ngrams_dictionary, unigram_dictionary, ngrams_segmentation, and unigram_sequence_segmentation, created to make it easy to segment short texts based on a text corpus.

Examples

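library(NUSS)

# Corpus used to build the n-grams and unigram dictionaries
texts <- c("this is science",
           "science is #fascinatingthing",
           "this is a scientific approach",
           "science is everywhere",
           "the beauty of science")
# Segment two unspaced sequences against the corpus
nuss(c("thisisscience", "scienceisscience"), texts)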