textmodel_svmlin: [experimental] Linear SVM classifier for texts

Description

Fit a fast linear SVM classifier for sparse text matrices, using the svmlin C++ code written by Vikas Sindhwani and S. Sathiya Keerthi. This implements the modified finite Newton L2-SVM method (L2-SVM-MFN) described in Sindhwani and Keerthi (2006). Currently, textmodel_svmlin() only works for two-class problems.

Usage

textmodel_svmlin(
  x,
  y,
  intercept = TRUE,
  lambda = 1,
  cp = 1,
  cn = 1,
  scale = FALSE,
  center = FALSE
)

Value

a fitted model object of class textmodel_svmlin

Arguments

x: the dfm on which the model will be fit. Does not need to contain only the training documents.
y: vector of training labels associated with each document in x. (These will be converted to factors if not already factors.)
intercept: logical; if TRUE, add an intercept to the data
lambda: numeric; regularization parameter lambda (default 1)
cp: numeric; relative cost for "positive" examples (the second factor level)
cn: numeric; relative cost for "negative" examples (the first factor level)
scale: logical; if TRUE, normalize the feature counts
center: logical; if TRUE, centre the feature counts
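
Because cp and cn are tied to factor level order, it can help to check which label ends up as the "positive" class before fitting. The base R sketch below uses made-up labels and assumes the default factor() ordering (sorted levels, NA excluded). The object names are illustrative only and are not part of the package.

```r
# Illustrative labels: two labelled documents, two unlabelled (NA)
y <- c("Govt", "Opp", NA, NA)

# factor() sorts the levels, so "Govt" is the first level and "Opp" the second
y_factor <- factor(y)
levels(y_factor)

# Per the argument descriptions above, the second level ("Opp") would be the
# "positive" class (cost cp) and the first ("Govt") the "negative" one (cn)

# To swap the roles, make "Opp" the first level instead
y_flipped <- relevel(y_factor, ref = "Opp")
levels(y_flipped)

# The classifier only handles two-class problems, so check before fitting
stopifnot(nlevels(y_factor) == 2)
```

Passing a factor with the desired level order as y should then control which class each cost applies to.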

Warning

This function is marked experimental because it does not yet work in a way that maps onto more standard, better-understood SVM parameters. Use it with caution, and only after reading Sindhwani and Keerthi (2006).
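
For orientation, the sketch below evaluates a weighted L2-SVM objective of the general form discussed in Sindhwani and Keerthi (2006): a squared hinge loss with per-example costs plus an L2 penalty scaled by lambda. It is a reading aid under stated assumptions, not the package's implementation. The function l2svm_objective() is hypothetical, and the mapping of cp and cn onto per-example costs is assumed rather than confirmed by this page.

```r
# Hypothetical helper (not part of any package): weighted L2-SVM objective
#   0.5 * sum_i c_i * max(0, 1 - y_i * w'x_i)^2 + 0.5 * lambda * ||w||^2
# Assumption: c_i = cp for positive examples and cn for negative examples.
l2svm_objective <- function(w, X, y, lambda = 1, cp = 1, cn = 1) {
  # y must be coded +1 (positive class) or -1 (negative class)
  stopifnot(all(y %in% c(-1, 1)), ncol(X) == length(w))
  margins <- y * as.vector(X %*% w)      # y_i * w'x_i
  cost <- ifelse(y == 1, cp, cn)         # per-example cost c_i
  sq_hinge <- pmax(0, 1 - margins)^2     # squared hinge loss
  0.5 * sum(cost * sq_hinge) + 0.5 * lambda * sum(w^2)
}

# Toy data: one positive and one negative example, two features
X <- matrix(c(1, 0,
              0, 1), nrow = 2, byrow = TRUE)
y <- c(1, -1)

obj_zero     <- l2svm_objective(c(0, 0), X, y)           # loss only, no penalty
obj_costly   <- l2svm_objective(c(0, 0), X, y, cp = 2)   # positive errors cost more
obj_separate <- l2svm_objective(c(1, -1), X, y)          # zero loss, penalty only
```

In this form, raising cp or cn makes errors on that class more expensive, while raising lambda shrinks the weights. Whether textmodel_svmlin() uses exactly this scaling is not established here.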

References

Vikas Sindhwani and S. Sathiya Keerthi (2006). Large Scale Semi-supervised Linear SVMs. Proceedings of ACM SIGIR. August 6–11, 2006, Seattle.

Vikas Sindhwani and S. Sathiya Keerthi (2006). Newton Methods for Fast Solution of Semi-supervised Linear SVMs. Book chapter in Large Scale Kernel Machines, MIT Press.

Examples

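# use Lenihan for govt class and Bruton for opposition
library("quanteda")
# textmodel_svmlin() and data_corpus_irishbudget2010 come from quanteda.textmodels
library("quanteda.textmodels")

# label the first two documents; the remaining 12 get NA
docvars(data_corpus_irishbudget2010, "govtopp") <- c("Govt", "Opp", rep(NA, 12))
dfmat <- dfm(tokens(data_corpus_irishbudget2010))

# fit the model and predict
tmod <- textmodel_svmlin(dfmat, y = dfmat$govtopp)
predict(tmod)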