Take a random sample of documents of the specified size from a corpus or
document-feature matrix, with or without replacement. Works just as
sample() works, applied to the documents and their associated
document-level variables.
corpus_sample(x, size = ndoc(x), replace = FALSE, prob = NULL,
by = NULL, ...)
x: a corpus object whose documents will be sampled
size: a positive number, the number of documents to select
replace: Should sampling be with replacement?
prob: A vector of probability weights for obtaining the elements of the vector being sampled.
by: a grouping variable for sampling. Useful for resampling sub-document units such as sentences, for instance by specifying by = "document"
...: unused
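The idea behind the by argument can be sketched in base R: split the positions by group, then draw within each group. The helper sample_within_groups() below is hypothetical and written only for illustration. It is not part of quanteda and is not claimed to match how corpus_sample() is implemented. It also assumes that each group is resampled to its own size.

```r
# Hypothetical helper, for illustration only; not part of quanteda.
# Draws indices separately within each group, so every group keeps
# its original number of units.
sample_within_groups <- function(groups, replace = FALSE) {
  # positions of the units belonging to each group
  idx <- split(seq_along(groups), groups)
  # sample.int() avoids the surprise of sample() on a length-one vector
  drawn <- lapply(idx, function(i) {
    i[sample.int(length(i), replace = replace)]
  })
  unlist(drawn, use.names = FALSE)
}

set.seed(1)  # make the draw reproducible
# three sentences from document "one", two from document "two"
doc_id <- c("one", "one", "one", "two", "two")
res <- sample_within_groups(doc_id, replace = TRUE)
res
```

With replace = TRUE this is a bootstrap of sentences within each document: the first three indices always come from document "one" and the last two from document "two".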
A corpus object with number of documents equal to size, drawn
from the corpus x. The returned corpus object will contain all of
the meta-data of the original corpus, and the same document variables for
the documents selected.
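A short check of the returned value, as a sketch that assumes the quanteda package is installed and that data_corpus_inaugural is available from it. The object name samp is chosen here for illustration.

```r
library(quanteda)

set.seed(42)  # make the draw reproducible
samp <- corpus_sample(data_corpus_inaugural, size = 5)

# number of documents in the sample; should equal `size`
ndoc(samp)

# the sampled corpus carries the same document variables as the original
names(docvars(samp))
names(docvars(data_corpus_inaugural))
```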
# NOT RUN {
# sampling from a corpus
summary(corpus_sample(data_corpus_inaugural, 5))
summary(corpus_sample(data_corpus_inaugural, 10, replace = TRUE))
# sampling sentences within document
doccorpus <- corpus(c(one = "Sentence one. Sentence two. Third sentence.",
                      two = "First sentence, doc2. Second sentence, doc2."))
sentcorpus <- corpus_reshape(doccorpus, to = "sentences")
texts(sentcorpus)
texts(corpus_sample(sentcorpus, replace = TRUE, by = "document"))
# }