NLP (version 0.2-1)

String: String objects


Creation and manipulation of string objects.





x: a character vector with the appropriate encoding information for String(); an arbitrary R object otherwise.


For String() and as.String(), a string object (of class "String").

For is.String(), a logical.


String objects are character strings encoded in UTF-8 with class "String". The class currently provides a [ subscript method with two forms. With indices i and j, each of length one, it returns a string object containing the substring that starts at position i and ends at position j. With a single index that is an object inheriting from class "Span", it returns a character vector of the substrings for the respective spans; with a list of such objects, it returns a list of such character vectors.
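The subscripting forms described above can be sketched as follows. This is a minimal illustration that assumes the NLP package is installed and reuses the sample text from this page; Span() is the package's span constructor, and the variable names are illustrative only.

```r
library(NLP)

s <- String("  First sentence.  Second sentence.  ")

## Two length-one indices: the substring from position 3 to position 7,
## returned as a string object.
first <- s[3L, 7L]
is.String(first)

## A single Span index holding two spans: a character vector with one
## substring per span.
words <- s[Span(c(3L, 9L), c(7L, 16L))]

## A list of Span objects: a list with one character vector per element.
by_sentence <- s[list(Span(c(3L, 9L), c(7L, 16L)),
                      Span(c(20L, 27L), c(25L, 34L)))]
```

Annotation objects inherit from class "Span", which is why they can be used as indices in the same way.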

Additional methods may be added in the future.

String() creates a string object from a given character vector, taking the first element of the vector and converting it to UTF-8 encoding.
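A short sketch of this behavior, assuming the NLP package is installed; the input values are made up for illustration.

```r
library(NLP)

## Only the first element of the input vector is kept.
x <- String(c("alpha", "beta"))
length(x)
as.character(x)

## Non-ASCII text ends up marked as UTF-8.
y <- String("na\u00efve")
Encoding(unclass(y))
```

To keep every element of a longer vector, as.String() is likely the better fit, since it joins the elements rather than dropping all but the first.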

as.String() is a generic function to coerce to a string object. The default method converts its argument to character, concatenates the elements into a single string separated by newlines, and calls String() on the result.

is.String() tests whether an object inherits from class "String".
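The coercion and the class test can be illustrated together. This is a minimal sketch assuming the NLP package is installed; the input lines are made up for illustration.

```r
library(NLP)

lines <- c("First line.", "Second line.")

## The default method joins the elements with newlines and then
## builds a string object from the result.
s <- as.String(lines)

## A plain character vector is not a string object; the coerced value is.
is.String(lines)
is.String(s)

## The content matches a newline-separated paste of the input.
joined <- paste(lines, collapse = "\n")
as.character(s) == joined
```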


library(NLP)

## A simple text.
s <- String("  First sentence.  Second sentence.  ")
##           ****5****0****5****0****5****0****5**

## Basic sentence and word token annotation for the text.
a <- c(Annotation(1 : 2,
                  rep.int("sentence", 2L),
                  c( 3L, 20L),
                  c(17L, 35L)),
       Annotation(3 : 6,
                  rep.int("word", 4L),
                  c( 3L,  9L, 20L, 27L),
                  c( 7L, 16L, 25L, 34L)))

## All word tokens (by subscripting with an annotation object):
s[a[a$type == "word"]]
## Word tokens according to sentence (by subscripting with a list of
## annotation objects):
s[annotations_in_spans(a[a$type == "word"], a[a$type == "sentence"])]