quanteda (version 4.0.1)

docnames: Get or set document names

Description

Get or set the document names of a corpus, tokens, or dfm object.

Usage

docnames(x)

docnames(x) <- value

docid(x)

segid(x)

Value

docnames returns a character vector of the document names

docnames <- assigns new values to the document names of an object. docnames can only be character, so any non-character value assigned to be a docname will be coerced to mode character.

docid returns an internal variable denoting the original "docname" from which a document came. If an object has been reshaped (e.g. corpus_reshape() or segmented (e.g. corpus_segment()), docid(x) returns the original docnames but segid(x) does the serial number of those segments within the original document.

Arguments

x

the object with docnames

value

a character vector of the same length as x

See Also

featnames()

Examples

Run this code
# get and set doument names to a corpus
corp <- data_corpus_inaugural
docnames(corp) <- char_tolower(docnames(corp))

# get and set doument names to a tokens
toks <- tokens(corp)
docnames(toks) <- char_tolower(docnames(toks))

# get and set doument names to a dfm
dfmat <- dfm(tokens(corp))
docnames(dfmat) <- char_tolower(docnames(dfmat))

# reassign the document names of the inaugural speech corpus
corp <- data_corpus_inaugural
docnames(corp) <- paste0("Speech", seq_len(ndoc(corp)))


corp <- corpus(c(textone = "This is a sentence.  Another sentence.  Yet another.",
                 textwo = "Sentence 1. Sentence 2."))
corp_sent <- corp |>
    corpus_reshape(to = "sentences")
docnames(corp_sent)

# docid
docid(corp_sent)
docid(tokens(corp_sent))
docid(dfm(tokens(corp_sent)))

# segid
segid(corp_sent)
segid(tokens(corp_sent))
segid(dfm(tokens(corp_sent)))

Run the code above in your browser using DataLab