
quanteda (version 0.8.4-2)

trim: Trim a dfm using threshold-based or random feature selection

Description

Returns a document-feature matrix reduced in size by applying feature-count and document-frequency thresholds, and/or by randomly subsampling features.

Usage

trim(x, minCount = 1, minDoc = 1, nsample = NULL, verbose = TRUE)

## S3 method for class 'dfm'
trim(x, minCount = 1, minDoc = 1, nsample = NULL, verbose = TRUE)

trimdfm(x, ...)

Arguments

x
document-feature matrix of dfm-class
minCount
minimum feature count
minDoc
minimum number of documents in which a feature appears
nsample
how many features to retain (based on random selection)
verbose
print messages
...
only included to allow legacy trimdfm to pass arguments to trim
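
The thresholding described by the arguments above can be illustrated with a minimal base-R sketch. This is not quanteda's implementation: `trim_sketch` is a hypothetical helper, a plain numeric matrix stands in for a dfm, and it assumes that `minCount` refers to a feature's total count across all documents and that the thresholds are applied before any random sampling.

```r
# Toy document-feature matrix: 3 documents (rows) x 4 features (columns).
# A plain base-R matrix stands in for a dfm here (assumption for illustration).
x <- matrix(c(5, 0, 1, 0,
              4, 2, 0, 0,
              3, 1, 0, 7),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("doc", 1:3),
                            c("the", "of", "rare", "burst")))

# Hypothetical helper, not part of quanteda.
trim_sketch <- function(x, minCount = 1, minDoc = 1, nsample = NULL) {
  total_count <- colSums(x)      # total count of each feature over all documents
  doc_freq    <- colSums(x > 0)  # number of documents in which each feature appears
  # Keep features that meet both thresholds.
  keep <- which(total_count >= minCount & doc_freq >= minDoc)
  # Optionally retain a random subset of the surviving features.
  if (!is.null(nsample) && nsample < length(keep)) {
    keep <- sort(keep[sample.int(length(keep), nsample)])
  }
  x[, keep, drop = FALSE]        # all documents are retained
}

trimmed <- trim_sketch(x, minCount = 3, minDoc = 2)
colnames(trimmed)
```

In this toy matrix, "rare" fails both thresholds and "burst" has a high count but appears in only one document, so only "the" and "of" survive. The number of rows (documents) is unchanged.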

Value

  • A dfm-class object reduced in features (with the same number of documents)

See Also

selectFeatures

Examples

dtm <- dfm(inaugCorpus)
dim(dtm)
dtmReduced <- trim(dtm, minCount = 10, minDoc = 2) # only words occurring >= 10 times and in >= 2 docs
dim(dtmReduced)
topfeatures(dtmReduced, decreasing = FALSE)
dtmSampled <- trim(dtm, minCount = 20, nsample = 50)  # randomly sample 50 words with count >= 20
dtmSampled # 57 x 50 words
topfeatures(dtmSampled)
