corpus (version 0.6.0)

text: Text Vectors

Description

Create or test for text objects.

Usage

    as_text(x, ...)
    is_text(x)

Arguments

x

object to be coerced or tested.

...

further arguments passed to or from other methods.

Value

as_text attempts to coerce its argument to text type; it strips all attributes except for names.

is_text returns TRUE or FALSE depending on whether its argument is of text type or not.

Details

The text type is a new data type provided by the corpus package suitable for processing Unicode text. Text vectors behave like character vectors (and can be converted to them with the as.character function). They can be created using the read_ndjson function or by converting another object using the as_text function.
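As a minimal sketch of the round trip described above, assuming the corpus package (version 0.6.0, as documented here) is installed; the variable names chr, txt, and back are illustrative only:

```r
library(corpus)

chr <- c("The quick brown fox.", "Jumps over the lazy dog.")

txt <- as_text(chr)            # character -> text
chr_is_text <- is_text(chr)    # a plain character vector is not text
txt_is_text <- is_text(txt)    # the converted object is text

back <- as.character(txt)      # text -> character
```

Here chr_is_text should be FALSE and txt_is_text should be TRUE, and back should hold the same strings as chr.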

The default behavior for as_text is to proceed as follows:

  1. If x is a character vector, then we create a new text vector from x, preserving names(x) if they exist.

  2. If is_text(x) is TRUE, then we drop all attributes from the object except for its names, and we set the object class to text.

  3. Otherwise, if is.data.frame(x) is TRUE, then we look for a column to convert. First, we look for a column named "text". If none exists, we look for a column of type text. If we find such a column, then we call as_text on it and set the names of the result to match x's row names. If no column is named "text" and there is either no column of type text or more than one such column, then we fail with an error message.

  4. Finally, if x is not a character vector, and if is_text(x) and is.data.frame(x) are both FALSE, then we try to use as.character on the object and then we convert the resulting character vector to text.
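The four cases above can be sketched as follows. This assumes corpus 0.6.0 is installed and behaves as documented here; docs is a made-up data frame with explicit row names, and t1 through t4 are illustrative names.

```r
library(corpus)

# Case 1: character input; names(x) are preserved.
t1 <- as_text(c(a = "goodnight", b = "moon"))

# Case 2: the input is already text; the result is still text.
t2 <- as_text(t1)

# Case 3: a data frame with a column named "text"; the result takes
# its names from the row names of the data frame.
docs <- data.frame(text = c("first document", "second document"),
                   row.names = c("doc1", "doc2"),
                   stringsAsFactors = FALSE)
t3 <- as_text(docs)

# Case 4: any other object is passed through as.character first.
t4 <- as_text(1:3)
```

Explicit row names are used in case 3 so that the names of the result are easy to inspect with names(t3).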

This special handling for the names of the object is different from the other R conversion functions (as.numeric, as.character, etc.), which drop the names.
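To illustrate the difference, here is a small comparison, assuming corpus 0.6.0 is installed; x and the result variables are illustrative names:

```r
library(corpus)

x <- c(a = "1", b = "2")

# Base R conversion functions drop the names.
num_names <- names(as.numeric(x))
chr_names <- names(as.character(x))

# as_text keeps them.
txt_names <- names(as_text(x))
```

Both num_names and chr_names are NULL, while txt_names should still contain "a" and "b".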

as_text and is_text are generic: you can write methods to handle specific classes of objects.
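As a hedged sketch of what such a method might look like: the class "memo" and its body field below are hypothetical and not part of the corpus package. The sketch assumes as_text dispatches as an S3 generic, as stated above, and that corpus 0.6.0 is installed.

```r
library(corpus)

# Hypothetical class: a list with a named character field `body`.
memo <- structure(
    list(body = c(m1 = "Budget approved.", m2 = "Meeting moved.")),
    class = "memo"
)

# Method for the as_text generic: convert the `body` field and pass
# any further arguments along.
as_text.memo <- function(x, ...) {
    as_text(x$body, ...)
}

txt <- as_text(memo)   # dispatches to as_text.memo
```

Without this method, the default behavior would fall back to as.character on the whole list, which is rarely what a caller wants for a structured object.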

See Also

read_ndjson.

Examples

    library(corpus)

    as_text("hello, world!")
    as_text(c(a="goodnight", b="moon")) # keeps names

    is_text("hello") # FALSE, "hello" is character, not text