corpus (version 0.10.1)

corpus_frame: Corpus Data Frame

Description

Create or test for corpus objects.

Usage

corpus_frame(..., row.names = NULL, filter = NULL)

as_corpus_frame(x, filter = NULL, ..., row.names = NULL)

is_corpus_frame(x)

Arguments

...

data frame columns for corpus_frame; further arguments passed to as_corpus_text from as_corpus_frame.

row.names

character vector of row names for the corpus object.

filter

text filter object for the "text" column in the corpus object.

x

object to be coerced or tested.

Value

corpus_frame creates a data frame with a column named "text" of type "corpus_text", and a class attribute set to c("corpus_frame", "data.frame").

as_corpus_frame attempts to coerce its argument to a corpus data frame object, setting the row.names and calling as_corpus_text on the "text" column with the filter and arguments.

is_corpus_frame returns TRUE or FALSE depending on whether its argument is a valid corpus object or not.

Details

These functions create a corpus object or convert another object to one. A corpus object is just a data frame with special functions for printing, and a column named "text" of type "corpus_text".

corpus_frame has similar semantics to the data.frame function, except that string columns do not get converted to factors.
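As a minimal sketch of the point about string columns, the following builds a small corpus data frame with an extra string column and checks its type. The column name author and its values are made up for illustration. Note that since R 4.0.0 the base data.frame function also defaults to stringsAsFactors = FALSE, so the difference is mainly visible on older versions of R.

```r
library(corpus)

# "author" is a hypothetical extra column used only for illustration.
x <- corpus_frame(text = c("goodnight", "moon"),
                  author = c("Brown", "Brown"))

# The string column should stay a character vector rather than a factor.
is.character(x$author)
is.factor(x$author)

# The result should also pass the corpus data frame test.
is_corpus_frame(x)
```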

as_corpus_frame converts another object to a corpus data frame object. By default, the method converts x to a data frame with a column named "text" of type "corpus_text", and sets the class attribute of the result to c("corpus_frame", "data.frame").

is_corpus_frame tests whether x is a data frame with a column named "text" of type "corpus_text".
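The test described above can be illustrated with a short sketch, assuming the behavior is exactly as stated: an ordinary data frame whose "text" column is still a plain character vector is not expected to pass until it has been converted. The variable names plain and converted are illustrative only.

```r
library(corpus)

# An ordinary data frame: "text" is a plain character column here,
# not yet of type "corpus_text".
plain <- data.frame(text = c("goodnight", "moon"),
                    stringsAsFactors = FALSE)
is_corpus_frame(plain)

# After conversion, the "text" column has type "corpus_text".
converted <- as_corpus_frame(plain)
is_corpus_frame(converted)

# A data frame with no "text" column at all.
is_corpus_frame(data.frame(n = 1:3))
```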

as_corpus_frame is generic: you can write methods to handle specific classes of objects.
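A method for a user-defined class might look like the sketch below. The class my_notes and its fields id and body are hypothetical and are not part of the corpus package. The method reuses the argument list shown under Usage and delegates to the existing data frame conversion.

```r
library(corpus)

# Hypothetical class: a list holding note identifiers and note bodies.
notes <- structure(list(id = c("n1", "n2"),
                        body = c("goodnight", "moon")),
                   class = "my_notes")

# S3 method for the hypothetical class, using the generic's signature.
as_corpus_frame.my_notes <- function(x, filter = NULL, ..., row.names = NULL)
{
    # Put the note bodies in a column named "text".
    df <- data.frame(text = x$body, stringsAsFactors = FALSE)

    # Design choice for this sketch: fall back to the note ids as row names.
    if (is.null(row.names)) {
        row.names <- x$id
    }

    # Delegate to the data frame conversion for the rest of the work.
    as_corpus_frame(df, filter = filter, ..., row.names = row.names)
}

cf <- as_corpus_frame(notes)
is_corpus_frame(cf)
```

Defining the method at top level is enough for interactive use; inside a package it would need to be registered as an S3 method in the usual way.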

See Also

corpus-package, print.corpus_frame, corpus_text, read_ndjson.

Examples

# NOT RUN {
# convert a data frame:
emoji <- data.frame(text = sapply(0x1f600 + 1:30, intToUtf8),
                    stringsAsFactors = FALSE)
as_corpus_frame(emoji)

# construct directly (no need for stringsAsFactors = FALSE):
corpus_frame(text = sapply(0x1f600 + 1:30, intToUtf8))
    
# convert a character vector:
as_corpus_frame(c(a = "goodnight", b = "moon")) # keeps names
as_corpus_frame(c(a = "goodnight", b = "moon"), row.names = NULL) # drops names
# }
