quanteda (version 0.9.2-0)

sample.corpus: Randomly sample documents or features

Description

Takes a random sample of documents or features of the specified size from a corpus or document-feature matrix, with or without replacement.

Usage

## S3 method for class 'corpus':
sample(x, size = ndoc(x), replace = FALSE, prob = NULL, ...)

sample(x, size, replace = FALSE, prob = NULL, ...)

## S3 method for class 'default':
sample(x, size, replace = FALSE, prob = NULL, ...)

## S3 method for class 'dfm':
sample(x, size = ndoc(x), replace = FALSE, prob = NULL,
  what = c("documents", "features"), ...)
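
The usage above implies that sample is exposed as an S3 generic with a default method. The following is a minimal base-R sketch of that dispatch pattern, not quanteda's actual source: the class toycorpus, its texts field, and the method sample.toycorpus are hypothetical names introduced only for illustration.

```r
# Sketch only: redefine sample() as an S3 generic in the current session.
sample <- function(x, size, replace = FALSE, prob = NULL, ...) {
  UseMethod("sample")
}

# Default method: fall back to base::sample() for ordinary vectors.
sample.default <- function(x, size, replace = FALSE, prob = NULL, ...) {
  if (missing(size)) {
    base::sample(x, replace = replace, prob = prob)
  } else {
    base::sample(x, size, replace, prob)
  }
}

# Hypothetical container class standing in for a corpus.
toy <- structure(
  list(texts = c(d1 = "first text", d2 = "second text", d3 = "third text")),
  class = "toycorpus"
)

# Method for the hypothetical class: sample document indices, then subset.
sample.toycorpus <- function(x, size = length(x$texts), replace = FALSE,
                             prob = NULL, ...) {
  idx <- sample.int(length(x$texts), size, replace, prob)
  x$texts <- x$texts[idx]
  x
}

set.seed(1)
toy_sub <- sample(toy, 2)      # dispatches to sample.toycorpus
vec_sub <- sample(1:10, 3)     # dispatches to sample.default
```

With this layout, existing calls such as sample(1:10, 3) keep behaving as in base R, while objects of the new class get their own sampling logic.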

Arguments

x
a corpus or dfm object whose documents or features will be sampled
size
a positive number, the number of documents (or features, for a dfm with what = "features") to select
replace
Should sampling be with replacement?
prob
A vector of probability weights for obtaining the elements of the vector being sampled.
...
unused. Note that sample is not defined as a generic method in the base package, which is why a default method is provided here.
what
dimension (of a dfm) to sample: can be documents or features
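
To make the replace and prob arguments concrete, here is a small sketch using base::sample on a plain character vector of made-up document names. It assumes the corpus and dfm methods pass these arguments through with the same meaning as in base R, which the usage suggests but this page does not state outright.

```r
# Made-up document names standing in for the documents of a corpus.
docs <- c("doc1", "doc2", "doc3", "doc4")

set.seed(42)

# Without replacement: each document can be drawn at most once.
no_repl <- base::sample(docs, size = 3, replace = FALSE)

# With replacement: size may exceed the number of documents.
with_repl <- base::sample(docs, size = 6, replace = TRUE)

# Probability weights: here all weight is on the first document,
# so every draw returns "doc1".
weighted <- base::sample(docs, size = 5, replace = TRUE,
                         prob = c(1, 0, 0, 0))

# Asking for more documents than exist without replacement is an error.
too_many <- try(base::sample(docs, size = 5, replace = FALSE), silent = TRUE)
```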

Value

  • A corpus object with number of documents equal to size, drawn from the corpus x. The returned corpus object will contain all of the meta-data of the original corpus, and the same document variables for the documents selected.

  • A dfm object with number of documents (or features, when what = "features") equal to size, drawn from the dfm x.
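
The difference between sampling documents and sampling features can be pictured with an ordinary matrix, where rows play the role of documents and columns the role of features. This is only an analogy in base R under that assumption; the matrix m and its dimnames are invented, and the code is not the dfm method itself.

```r
# Toy count matrix: 3 "documents" (rows) by 4 "features" (columns).
m <- matrix(
  1:12, nrow = 3,
  dimnames = list(docs = c("d1", "d2", "d3"),
                  features = c("a", "b", "c", "d"))
)

set.seed(7)

# Analogue of what = "documents": sample rows, keep every column.
by_docs <- m[sample.int(nrow(m), 2), , drop = FALSE]

# Analogue of what = "features": sample columns, keep every row.
by_feats <- m[, sample.int(ncol(m), 2), drop = FALSE]

dim(by_docs)   # 2 rows, 4 columns
dim(by_feats)  # 3 rows, 2 columns
```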

See Also

sample

Examples

# sampling from a corpus
summary(sample(inaugCorpus, 5)) 
summary(sample(inaugCorpus, 10, replace=TRUE))
# sampling from a dfm
myDfm <- dfm(inaugTexts[1:10], verbose = FALSE)
sample(myDfm)[, 1:10]
sample(myDfm, replace = TRUE)[, 1:10]
sample(myDfm, what = "features")[1:10, ]