jiebaR (version 0.8.1)

filter_segment: Filter segmentation result

Description

This function helps remove some words from the segmentation result.

Usage

filter_segment(input, filter_words, unit = 50)

Arguments

input
a string vector
filter_words
a string vector of words to be removed.
unit
the number of words in each unit used to build a regular expression; the default is 50. A long list of words forms one big regular expression, which may or may not be accepted: the POSIX standard only requires support for up to 256 bytes. So unit is used to split the words into smaller groups, each of which forms its own regular expression.
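
The splitting described for unit can be sketched in base R. This is only an illustration, not the jiebaR implementation: filter_in_units is a hypothetical helper, it assumes the words contain no regular expression metacharacters, and it assumes that filtering means dropping elements that exactly match a word.

```r
# Hypothetical helper (not part of jiebaR): drop every element of `input`
# that exactly matches one of `filter_words`, building one regular
# expression per group of at most `unit` words.
# Assumption: the words contain no regex metacharacters.
filter_in_units <- function(input, filter_words, unit = 50) {
  filter_words <- unique(filter_words)
  if (length(filter_words) == 0) return(input)

  # Assign each word to a group of at most `unit` words.
  group_id <- ceiling(seq_along(filter_words) / unit)
  groups <- split(filter_words, group_id)

  for (words in groups) {
    # One anchored alternation per group, e.g. "^(abc|ghi)$".
    pattern <- paste0("^(", paste(words, collapse = "|"), ")$")
    input <- input[!grepl(pattern, input)]
  }
  input
}

segments <- c("abc", "def", "ghi", "abcdef")
# unit = 1 forces one regular expression per word, to show the splitting.
result <- filter_in_units(segments, c("abc", "ghi"), unit = 1)
print(result)
```

A smaller unit keeps each pattern short at the cost of more matching passes over the input; a larger unit does fewer passes with longer patterns.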