
quanteda (version 0.8.0-3)

trim: Trim a dfm using threshold-based or random feature selection

Description

Returns a document-feature matrix reduced in size by removing features that fall below minimum thresholds of term frequency and document frequency, and/or by randomly subsampling features.
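The selection rule can be illustrated with a minimal base-R sketch. It assumes that a feature is kept when its total count is at least minCount and it occurs in at least minDoc documents, and that any random sampling happens afterwards among the surviving features. `trim_sketch` is a hypothetical helper that works on an ordinary count matrix. It is not quanteda's implementation.

```r
# Hypothetical helper (not part of quanteda): threshold-based and random
# feature selection on a plain documents x features count matrix.
trim_sketch <- function(m, minCount = 1, minDoc = 1, nsample = NULL) {
  featCount <- colSums(m)      # total count of each feature across all documents
  docFreq <- colSums(m > 0)    # number of documents in which each feature occurs
  keep <- featCount >= minCount & docFreq >= minDoc
  m <- m[, keep, drop = FALSE]
  # optionally keep a random subset of the features that passed the thresholds
  if (!is.null(nsample)) {
    if (nsample > ncol(m)) stop("nsample exceeds the number of remaining features")
    m <- m[, sample.int(ncol(m), nsample), drop = FALSE]
  }
  m
}

# toy matrix: 3 documents x 4 features
m <- matrix(c(5, 0, 1, 0,
              3, 0, 0, 1,
              4, 6, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("doc", 1:3), c("a", "b", "c", "d")))

# keeps only feature "a": "b" has 6 counts but appears in only 1 document
trim_sketch(m, minCount = 5, minDoc = 2)
```

The toy matrix shows why the two thresholds differ: minCount looks at how often a feature occurs in total, while minDoc looks at how widely it is spread across documents.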

Usage

trim(x, minCount = 1, minDoc = 1, nsample = NULL, verbose = TRUE)

## S3 method for class 'dfm'
trim(x, minCount = 1, minDoc = 1, nsample = NULL, verbose = TRUE)

trimdfm(x, ...)

Arguments

x
document-feature matrix of dfm-class
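minCount
minimum total count of a feature across all documents; features occurring fewer times are removed
minDoc
minimum number of documents in which a feature must appear to be retained
nsample
number of features to retain, chosen by random selection from those that pass the count thresholds
verbose
if TRUE, print messages
...
included only so that the legacy function trimdfm can pass its arguments to trim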
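The `...` argument follows the usual R pattern for a legacy alias that forwards everything it receives to a newer function. The sketch below uses the hypothetical names `new_trim` and `old_trim` to show that pattern in isolation; it makes no claim about how trimdfm is actually written.

```r
# Hypothetical stand-in for the current function: it just reports and
# returns the arguments it received.
new_trim <- function(x, minCount = 1, minDoc = 1, nsample = NULL, verbose = TRUE) {
  if (verbose) message("minCount = ", minCount, ", minDoc = ", minDoc)
  list(minCount = minCount, minDoc = minDoc, nsample = nsample)
}

# Hypothetical legacy alias: every argument other than x is passed
# through unchanged via `...`.
old_trim <- function(x, ...) {
  new_trim(x, ...)
}

str(old_trim(NULL, minCount = 10, minDoc = 2, verbose = FALSE))
```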

Value

A document-feature matrix of dfm-class containing only the retained features.

Examples

library(quanteda)
dtm <- dfm(inaugCorpus)
dim(dtm)
dtmReduced <- trim(dtm, minCount=10, minDoc=2) # only words occurring >=10 times and in >=2 docs
dim(dtmReduced)
topfeatures(dtmReduced, decreasing=FALSE)
set.seed(42)  # make the random feature sample reproducible
dtmSampled <- trim(dtm, minCount=20, nsample=50)  # randomly sample 50 of the words occurring >=20 times
dtmSampled # 57 documents x 50 features
topfeatures(dtmSampled)
