corpus (version 0.9.1)

text_sub: Text Subsequences

Description

Extract token subsequences from a set of texts.

Usage

text_sub(x, start = 1L, end = -1L, filter = text_filter(x))

Arguments

x

text vector or corpus object.

start

integer vector giving the starting positions of the subsequences, or a two-column integer matrix giving the starting and ending positions.

end

integer vector giving the ending positions of the subsequences; ignored if start is a two-column matrix.

filter

filter specifying the transformation from text to token sequence.

Value

A text vector with the same length and names as x, with the desired subsequences.

Details

text_sub extracts token subsequences from a set of texts. The start and end arguments specify the positions of the subsequences within the parent texts, as an inclusive range. Negative indices count from the end of the text, with -1L referring to the last token.

Text subsequences include both dropped and non-dropped tokens.
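
The indexing rule described above can be illustrated on a plain character vector of tokens. This is only a sketch in base R, not the implementation used by corpus: the helpers resolve_pos and sub_tokens are hypothetical names introduced here, and clamping out-of-range positions to the ends of the token sequence is an assumption rather than documented behavior.

```r
# Hypothetical helper (not part of corpus): map a possibly negative
# position onto the range 1..n, so that -1 refers to the last token.
resolve_pos <- function(i, n) {
    if (i < 0L) n + i + 1L else i
}

# Hypothetical helper (not part of corpus): take the inclusive range
# start..end from a character vector of tokens.
sub_tokens <- function(tokens, start = 1L, end = -1L) {
    n <- length(tokens)
    s <- max(resolve_pos(start, n), 1L)  # assumption: clamp to the first token
    e <- min(resolve_pos(end, n), n)     # assumption: clamp to the last token
    if (s > e) character(0) else tokens[s:e]
}

# Punctuation is kept as ordinary tokens in this toy sequence.
tokens <- c("A", "man", ",", "a", "plan", ".")

first_three <- sub_tokens(tokens, 1L, 3L)    # positions 1 through 3
last_two    <- sub_tokens(tokens, -2L, -1L)  # positions 5 through 6
everything  <- sub_tokens(tokens)            # defaults: 1 through -1
```

With the defaults start = 1L and end = -1L the whole sequence is returned, which mirrors the defaults in the text_sub usage line.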

See Also

text_tokens, text_length.

Examples

# NOT RUN {
    library(corpus)

    x <- as_text(c("A man, a plan.", "A \"canal\"?", "Panama!"),
                 filter = text_filter(drop_punct = TRUE))

    # entire text
    text_sub(x, 1, -1)

    # first three elements
    text_sub(x, 1, 3)

    # last two elements
    text_sub(x, -2, -1)
# }
