crfsuite (version 0.3.1)

txt_feature: Extract basic text features which are useful for entity recognition

Description

Extract basic text features which are useful for entity recognition

Usage

txt_feature(
  x,
  type = c("is_capitalised", "is_url", "is_email", "is_number", "prefix", "suffix"),
  n = 4
)

Arguments

x

a character vector

type

a character string, which can be one of 'is_capitalised', 'is_url', 'is_email', 'is_number', 'prefix', 'suffix'

n

for type 'prefix' or 'suffix', the number of characters to take from the start (prefix) or end (suffix) of each element of x. Defaults to 4.

Value

For types 'is_capitalised', 'is_url', 'is_email' and 'is_number': a logical vector of the same length as x, indicating whether each element of x is capitalised, a URL, an email address or a number.

For types 'prefix' and 'suffix': a character vector of the same length as x, containing the first or last n characters of each element of x.
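The prefix and suffix features can be pictured with a small base R sketch. The helpers prefix_n and suffix_n below are hypothetical names introduced for illustration only; they are not part of crfsuite, and txt_feature may treat edge cases (such as strings shorter than n) differently.

```r
# Hypothetical helpers, NOT the crfsuite implementation:
# take the first / last n characters of each element of a character vector.
prefix_n <- function(x, n = 4) {
  substr(x, 1, n)
}

suffix_n <- function(x, n = 4) {
  len <- nchar(x)
  # Clamp the start position to 1 so short strings are returned whole
  substr(x, pmax(len - n + 1, 1), len)
}

tokens <- c("abcdefghijklmnopqrstuvwxyz", "hi")

prefix_n(tokens, n = 3)  # "abc" "hi"
suffix_n(tokens, n = 3)  # "xyz" "hi"
```

Both helpers are vectorised, so the result has one element per input element, which matches the shape described for the 'prefix' and 'suffix' types.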

Examples

library(crfsuite)

# Capitalisation check
txt_feature("Red Devils", type = "is_capitalised")
txt_feature("red devils", type = "is_capitalised")

# URL and email detection
txt_feature("http://www.bnosac.be", type = "is_url")
txt_feature("info@google.com", type = "is_email")
txt_feature("hi there", type = "is_email")

# Number detection: plain integer, decimal point, decimal comma, trailing letters
txt_feature("1230000", type = "is_number")
txt_feature("123.15", type = "is_number")
txt_feature("123,15", type = "is_number")
txt_feature("123abc", type = "is_number")

# Prefix and suffix of 3 characters
txt_feature("abcdefghijklmnopqrstuvwxyz", type = "prefix", n = 3)
txt_feature("abcdefghijklmnopqrstuvwxyz", type = "suffix", n = 3)
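Because txt_feature is documented as returning a vector of the same length as x, its results can be collected column by column into a token-level feature table. The following is a minimal sketch, assuming the crfsuite package is installed; the tokens vector and the column names are made up for illustration.

```r
library(crfsuite)

# Illustrative tokens (not taken from the package documentation)
tokens <- c("Red", "Devils", "info@google.com", "1230000")

# One column per feature; each call returns one value per token
features <- data.frame(
  token          = tokens,
  is_capitalised = txt_feature(tokens, type = "is_capitalised"),
  is_email       = txt_feature(tokens, type = "is_email"),
  is_number      = txt_feature(tokens, type = "is_number"),
  prefix3        = txt_feature(tokens, type = "prefix", n = 3),
  suffix3        = txt_feature(tokens, type = "suffix", n = 3),
  stringsAsFactors = FALSE
)

# Inspect the column types and values
str(features)
```

A table of this shape, with one row per token, is a common starting point for building the attributes passed to a sequence model.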
