phrasemachine (version 1.1.2)

extract_ngram_filter: Extract phrase spans

Description

Takes a sequence of POS tags and a regular expression, and returns the spans whose tags match that expression.

Usage

extract_ngram_filter(pos_tags, regex, maximum_ngram_length,
  minimum_ngram_length)

Arguments

pos_tags
A character vector of Penn Treebank or Petrov/Gimpel-style tags.
regex
The regular expression (or vector of regular expressions) used to find phrases.
maximum_ngram_length
The maximum length of the phrases returned.
minimum_ngram_length
The minimum length of the phrases returned.

Value

A numeric matrix with two columns and one row per matched span. The first column is the span start and the second column is the span end.
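
The two-column layout described above can be used to pull phrases out of a token vector. The following is a minimal sketch, not output from the package: the `tokens` vector and the hand-built `spans` matrix are hypothetical, and it assumes the start and end values are 1-based, inclusive positions into the tag sequence.

```r
# Hypothetical tokens, aligned one-to-one with the POS tags.
tokens <- c("run", "fast", "race", "car")

# Hand-built matrix in the documented shape (assumption: 1-based, inclusive).
# Column 1 is the span start, column 2 is the span end.
spans <- matrix(c(2, 4,
                  3, 4), ncol = 2, byrow = TRUE)

# Paste together the tokens covered by each span, one phrase per row.
phrases <- apply(spans, 1, function(s) {
  paste(tokens[s[1]:s[2]], collapse = " ")
})

phrases
```

Under these assumptions, `phrases` holds one string per row of `spans`.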

Examples

pos_tags <- c("VB", "JJ", "NN", "NN")
spans <- extract_ngram_filter(pos_tags,
                              regex = "(A|N)*N(PD*(A|N)*N)*",
                              maximum_ngram_length = 8,
                              minimum_ngram_length = 1)
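
To make the idea of matching a regex against tag spans concrete, here is a rough base-R sketch. It is not the package's implementation: `sketch_spans` and `coarse_map` are hypothetical names, the mapping from Penn Treebank tags to one-letter coarse tags is an assumption made for illustration, and the sketch simply tests every candidate n-gram against an anchored version of the regex.

```r
# Hypothetical coarse-tag map (an assumption, for illustration only).
coarse_map <- c(JJ = "A", NN = "N", NNS = "N", IN = "P", DT = "D", VB = "V")

sketch_spans <- function(pos_tags, regex, max_len, min_len) {
  # Map each tag to a single character; unknown tags become "O".
  coarse <- unname(coarse_map[pos_tags])
  coarse[is.na(coarse)] <- "O"

  # Anchor the pattern so a span counts only if it matches in full.
  anchored <- paste0("^(", regex, ")$")

  spans <- matrix(numeric(0), ncol = 2)
  n <- length(coarse)
  for (start in seq_len(n)) {
    for (end in start:n) {
      len <- end - start + 1
      if (len < min_len || len > max_len) next
      candidate <- paste(coarse[start:end], collapse = "")
      if (grepl(anchored, candidate)) {
        spans <- rbind(spans, c(start, end))  # 1-based, inclusive
      }
    }
  }
  spans
}

result <- sketch_spans(c("VB", "JJ", "NN", "NN"),
                       regex = "(A|N)*N(PD*(A|N)*N)*",
                       max_len = 8, min_len = 1)
result
```

With this sketch, the tags become `V A N N`, and the rows of `result` are (2, 3), (2, 4), (3, 3), (3, 4) and (4, 4): every span that ends in a noun and contains only adjectives and nouns. The package's own output may differ in ordering or indexing.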
