jiebaR (version 0.10.99)

filter_segment: Filter segmentation result

Description

This function removes specified words from a segmentation result.

Usage

filter_segment(input, filter_words, unit = 50)

Arguments

input

a string vector, typically a segmentation result.

filter_words

a string vector of words to be removed.

unit

the number of words per unit used to build each regular expression; the default is 50. A long list of words forms a large regular expression, which may or may not be accepted: the POSIX standard only requires support for patterns of up to 256 bytes. The words are therefore split into units of this size.
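
The role of unit can be illustrated with a minimal sketch in base R. This is not jiebaR's actual implementation: filter_in_units is a hypothetical helper written for illustration, and it assumes the words in filter_words contain no regular expression metacharacters (they are not escaped here).

```r
# Hypothetical helper, NOT part of jiebaR: shows how a word list can be
# split into units so that no single regular expression grows too large.
# Assumption: filter_words contains no regex metacharacters.
filter_in_units <- function(input, filter_words, unit = 50) {
  filter_words <- unique(filter_words)

  # Split the word list into groups of at most `unit` words.
  groups <- split(filter_words, ceiling(seq_along(filter_words) / unit))

  for (words in groups) {
    # Anchored alternation: matches only elements equal to one of the words.
    pattern <- paste0("^(", paste(words, collapse = "|"), ")$")
    # Drop the matching elements before moving on to the next unit.
    input <- input[!grepl(pattern, input)]
  }
  input
}

# Same input as the example in this page; "abc" is dropped.
filter_in_units(c("abc", "def", " ", "."), c("abc"))
```

A smaller unit yields more, shorter patterns and more passes over input; a larger unit yields fewer passes but longer patterns.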

Examples

# NOT RUN {
filter_segment(c("abc","def"," ","."), c("abc"))
# }
