corpus (version 0.9.1)

corpus: Corpus Objects

Description

Create or test for corpus objects.

Usage

corpus(..., row.names = NULL, filter = NULL)

as_corpus(x, row.names = NULL, filter = NULL, ...)

is_corpus(x)

Arguments

...

for corpus, the data frame columns; for as_corpus, further arguments passed to or from other methods.

row.names

character vector of row names for the corpus object.

filter

text filter object for the corpus object.

x

object to be coerced or tested.

Value

corpus creates a data frame with a column named "text" of type "corpus_text", and a class attribute set to c("corpus_frame", "data.frame").

as_corpus attempts to coerce its argument to a corpus object, setting the row.names and text_filter properties.

is_corpus returns TRUE or FALSE depending on whether its argument is a valid corpus object or not.

Details

These functions create a corpus object or convert another object to one. A corpus object is just a data frame with special functions for printing and a column named "text" of type "corpus_text".

corpus has similar semantics to the data.frame function, except that string columns do not get converted to factors.
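The factor conversion mentioned above can be seen with base R alone. The sketch below does not call the corpus package; it only shows the data.frame behavior that corpus is described as avoiding. The variable names are illustrative. Note that stringsAsFactors defaulted to TRUE in R before version 4.0.0, which is why it is set explicitly here.

```r
# Base R only; the corpus package is not needed for this sketch.
words <- c("goodnight", "moon")

# With stringsAsFactors = TRUE, data.frame() converts strings to a factor.
df_factor <- data.frame(text = words, stringsAsFactors = TRUE)

# With stringsAsFactors = FALSE, the strings stay a character vector.
df_string <- data.frame(text = words, stringsAsFactors = FALSE)

class(df_factor$text)
class(df_string$text)
```

According to the description above, corpus(text = words) would behave like the second call without needing the extra argument.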

as_corpus converts another object to a corpus object. By default, the method converts x to a data frame with a column named "text" of type "corpus_text", and sets the class attribute of the result to c("corpus_frame", "data.frame").

is_corpus tests whether x is a data frame with a column named "text" of type "corpus_text".
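As a rough illustration of that test, the check can be written as a small base R helper. The function looks_like_corpus is hypothetical and is not the package's implementation; in particular, using inherits() to decide whether a column is "of type corpus_text" is an assumption.

```r
# Hypothetical helper mirroring the documented test; not part of the corpus package.
looks_like_corpus <- function(x) {
  is.data.frame(x) &&
    "text" %in% names(x) &&
    inherits(x[["text"]], "corpus_text")  # assumption: "type" means S3 class
}

# A plain data frame has a "text" column, but it is an ordinary character vector.
plain <- data.frame(text = c("goodnight", "moon"), stringsAsFactors = FALSE)
looks_like_corpus(plain)

# A bare character vector is not a data frame at all.
looks_like_corpus(c("goodnight", "moon"))
```

Both calls fail the check: the first because the column lacks the "corpus_text" class, the second because the input is not a data frame. In real code, prefer is_corpus itself.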

as_corpus is generic: you can write methods to handle specific classes of objects.

See Also

corpus-package, corpus_frame, corpus_text, read_ndjson.

Examples

# NOT RUN {
    # convert a data frame:
    emoji <- data.frame(text = sapply(0x1f600 + 1:30, intToUtf8),
                        stringsAsFactors = FALSE)
    as_corpus(emoji)

    # construct directly (no need for stringsAsFactors = FALSE):
    corpus(text = sapply(0x1f600 + 1:30, intToUtf8))
    
    # convert a character vector:
    as_corpus(c(a = "goodnight", b = "moon")) # keeps names
    as_corpus(c(a = "goodnight", b = "moon"), row.names = NULL) # drops names
# }
