quanteda (version 3.3.1)

docnames: Get or set document names

Description

Get or set the document names of a corpus, tokens, or dfm object.

Usage

docnames(x)

docnames(x) <- value

docid(x)

segid(x)

Value

docnames returns a character vector of the document names.

docnames <- assigns new document names to an object. Document names are always character, so any non-character value assigned as a docname is coerced to character.

docid returns an internal variable denoting the original "docname" from which a document came. If an object has been reshaped (e.g. with corpus_reshape()) or segmented (e.g. with corpus_segment()), docid(x) returns the original docnames, while segid(x) returns the serial number of each segment within its original document.
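The coercion rule for docnames <- described above can be checked with a small sketch. The two-document corpus and its names are made up for illustration and are not part of the package; the snippet assumes quanteda is installed.

```r
library(quanteda)

# Hypothetical two-document corpus, used only for illustration
corp <- corpus(c(a = "First document.", b = "Second document."))

# Assign integer values as document names; per the description above,
# non-character values are coerced to character
docnames(corp) <- 1:2

# Inspect the stored names and confirm their type
docnames(corp)
is.character(docnames(corp))
```

Because the names are stored as character, any later comparison or matching against them should use character values as well.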

Arguments

x

the object with docnames

value

a character vector of the same length as x

See Also

featnames()

Examples

# get and set the document names of a corpus
corp <- data_corpus_inaugural
docnames(corp) <- char_tolower(docnames(corp))

# get and set the document names of a tokens object
toks <- tokens(data_corpus_inaugural)
docnames(toks) <- char_tolower(docnames(toks))

# get and set the document names of a dfm
# (tokenize first: passing a corpus directly to dfm() is deprecated in quanteda 3.x)
dfmat <- dfm(tokens(data_corpus_inaugural[1:5]))
docnames(dfmat) <- char_tolower(docnames(dfmat))

# reassign the document names of the inaugural speech corpus
docnames(data_corpus_inaugural) <- paste0("Speech", seq_len(ndoc(data_corpus_inaugural)))


# reshape a corpus into sentences, then inspect the new document names
corp <- corpus(c(textone = "This is a sentence.  Another sentence.  Yet another.",
                 textwo = "Sentence 1. Sentence 2."))
corpsent <- corpus_reshape(corp, to = "sentences")
docnames(corpsent)

# docid
docid(corpsent)
docid(tokens(corpsent))
docid(dfm(tokens(corpsent)))

# segid
segid(corpsent)
segid(tokens(corpsent))
segid(dfm(tokens(corpsent)))
