Create or test for text objects.
as_text(x, names = NULL, filter = NULL, ...)
is_text(x)
x: object to be coerced or tested.
names: character vector of names for the converted text.
filter: text filter object for the converted text.
...: further arguments passed to or from other methods.
as_text attempts to coerce its argument to text type and
set its names and text_filter properties; it strips
all other attributes.
is_text returns TRUE or FALSE depending on
whether its argument is of text type or not.
The corpus_text type is a new data type provided by the
corpus package suitable for processing Unicode text. Text
vectors behave like character vectors (and can be converted to them
with the as.character function). They can be created using the
read_ndjson function or by converting another object
using the as_text function.
All text objects have a text_filter property specifying
how to transform the text into tokens or segment it into sentences.
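As a rough illustration of the text_filter property, the sketch below inspects the filter attached to a small text vector and then uses it implicitly when tokenizing and splitting into sentences. The input string is made up for the example, and text_tokens and text_split are other corpus functions that are not described on this page.

```r
library(corpus)

# Made-up input, for illustration only
x <- as_text("The cat sat. The dog barked!")

# Inspect the filter that controls tokenization and sentence segmentation
f <- text_filter(x)
print(f)

# Both calls consult the text_filter property of x
tokens <- text_tokens(x)
sentences <- text_split(x, "sentences")
```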
The default behavior for as_text is to proceed as follows:
If x is a character vector, then we create
a new text vector from x.
If x is a data frame, then we call as_text
on x$text if a column named "text" exists in
the data frame. If the data frame does not have a column
named "text", then we fail with an error message.
If x is a corpus_text object, then we drop all
attributes and we set the class to "corpus_text".
When none of the above conditions is true, the default behavior
is to call as.character on the object first,
preserving the names, and then call as_text on
the returned character object.
In all cases, when names is missing, we set the result
names to names(x) (or rownames(x) for a data frame
argument). When names is not missing, we set the result names
to this value.
Similarly, when filter is missing, we set the result text filter
to text_filter(x). When filter is not missing, we set
the result text filter to this value.
Note that the special handling for the names of the object is different
from the other R conversion functions (as.numeric,
as.character, etc.), which drop the names.
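The conversion rules above can be sketched with a small data frame. This is only an illustration: the data frame, its row names, and the replacement names are invented, and map_case is used as one example of a text_filter option.

```r
library(corpus)

# Hypothetical data frame; as_text() looks for a column named "text"
df <- data.frame(text = c("Good night.", "Moon!"),
                 row.names = c("a", "b"),
                 stringsAsFactors = FALSE)

# names is missing, so the row names of the data frame are used
x <- as_text(df)
names(x)

# names is supplied, so it overrides the row names
y <- as_text(df, names = c("first", "second"))
names(y)

# filter is supplied, so it becomes the text filter of the result
z <- as_text(df, filter = text_filter(map_case = FALSE))
text_filter(z)$map_case
```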
as_text is generic: you can write methods to handle specific
classes of objects.
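A minimal sketch of such a method, assuming a hypothetical S3 class "memo" that stores its text in a body element (neither the class nor the field is part of corpus):

```r
library(corpus)

# Hypothetical method for a made-up "memo" class: convert the body
# field and forward every other argument untouched through `...`
as_text.memo <- function(x, ...) {
    as_text(x$body, ...)
}

m <- structure(list(body = c("First memo.", "Second memo.")),
               class = "memo")

out <- as_text(m)
is_text(out)
```

Forwarding through ... rather than naming names and filter in the method keeps the "missing" semantics described above: if the caller does not supply them, the inner as_text call does not see them either.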
# NOT RUN {
as_text("hello, world!")
as_text(c(a = "goodnight", b = "moon")) # keeps names
as_text(c(a = "goodnight", b = "moon"), names = NULL) # drops names
is_text("hello") # FALSE, "hello" is character, not text
# }