quanteda.textmodels (version 0.9.1)

textmodel_svm: Linear SVM classifier for texts

Description

Fit a fast linear SVM classifier for texts, using the LiblineaR package.

Usage

textmodel_svm(x, y, weight = c("uniform", "docfreq", "termfreq"), ...)

Arguments

x

the dfm on which the model will be fit. It does not need to contain only the training documents; in the examples below, documents without a training label are given NA in y.

y

vector of training labels associated with each document in x. (These will be converted to factors if not already factors.)

weight

weights for the different classes, for use with imbalanced training sets; passed to wi in LiblineaR. "uniform" uses the LiblineaR default; "docfreq" weights each class by its number of training examples; and "termfreq" weights each class by its relative size, measured as the total length in tokens of its training documents.
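To make the two non-default options concrete, the following base-R sketch computes per-class weights from a toy count matrix. It is only one plausible reading of the description above, not the package's actual code: the matrix `counts`, the vector `labels`, and the helper names `w_docfreq` and `w_termfreq` are hypothetical, and any normalisation applied internally before the weights reach wi is an assumption.

```r
# Hypothetical toy data: 5 documents x 3 features, one document unlabelled (NA)
counts <- matrix(
    c(2, 0, 1,
      1, 1, 0,
      0, 3, 1,
      4, 0, 0,
      0, 1, 1),
    nrow = 5, byrow = TRUE,
    dimnames = list(paste0("doc", 1:5), c("tax", "budget", "cuts"))
)
labels <- c("Gov", "Gov", "Opp", NA, "Opp")

# Keep only the labelled (training) documents
is_train <- !is.na(labels)
x_train <- counts[is_train, , drop = FALSE]
y_train <- factor(labels[is_train])

# "docfreq"-style weights (assumed form): share of training documents per class
w_docfreq <- prop.table(table(y_train))

# "termfreq"-style weights (assumed form): share of training tokens per class
tokens_per_class <- tapply(rowSums(x_train), y_train, sum)
w_termfreq <- tokens_per_class / sum(tokens_per_class)

print(w_docfreq)
print(w_termfreq)
```

In this toy case each class has two training documents, so the document-based shares are equal, while the token-based shares differ (5 of 11 tokens for "Gov", 6 of 11 for "Opp").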

...

additional arguments passed to LiblineaR

References

R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin (2008). LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research 9: 1871-1874. http://www.csie.ntu.edu.tw/~cjlin/liblinear

See Also

LiblineaR

Examples

# NOT RUN {
# use party leaders for govt and opposition classes
quanteda::docvars(data_corpus_irishbudget2010, "govtopp") <-
    c(rep(NA, 4), "Gov", "Opp", NA, "Opp", NA, NA, NA, NA, NA, NA)
dfmat <- quanteda::dfm(data_corpus_irishbudget2010)
tmod <- textmodel_svm(dfmat, y = quanteda::docvars(dfmat, "govtopp"))
predict(tmod)
predict(tmod, type = "probability")

# multiclass problem - all party leaders
tmod2 <- textmodel_svm(dfmat,
    y = c(rep(NA, 3), "SF", "FF", "FG", NA, "LAB", NA, NA, "Green", rep(NA, 3)))
predict(tmod2)
predict(tmod2, type = "probability")
# }
