
wsvm is used to train a subject weighted support vector machine. It can be used to carry out general regression and classification (of nu- and epsilon-type), as well as density estimation. A formula interface is provided.

Usage
# S3 method for formula
wsvm(formula, weight, data = NULL, ..., subset, na.action =
na.omit, scale = TRUE)
# S3 method for default
wsvm(x, y = NULL, weight, scale = TRUE, type = NULL, kernel =
"radial", degree = 3, gamma = if (is.vector(x)) 1 else 1 / ncol(x),
coef0 = 0, cost = 1, nu = 0.5,
class.weights = NULL, cachesize = 100, tolerance = 0.001, epsilon = 0.1,
shrinking = TRUE, cross = 0, probability = FALSE, fitted = TRUE,
..., subset, na.action = na.omit)
Value

An object of class "wsvm" containing the fitted model, including:

SV: the resulting support vectors (possibly scaled).

index: the index of the resulting support vectors in the data matrix. Note that this index refers to the preprocessed data (after the possible effect of na.omit and subset).

coefs: the corresponding coefficients times the training labels.

rho: the negative intercept.

sigma: in case of a probabilistic regression model, the scale parameter of the hypothesized (zero-mean) Laplace distribution estimated by maximum likelihood.

probA, probB: numeric vectors of length k(k-1)/2, k the number of classes, containing the parameters of the logistic distributions fitted to the decision values of the binary classifiers (1 / (1 + exp(a x + b))).
Arguments

formula: a symbolic description of the model to be fit.

data: an optional data frame containing the variables in the model. By default the variables are taken from the environment from which wsvm is called.
x: a data matrix, a vector, or a sparse 'design matrix' (object of class Matrix provided by the Matrix package, of class matrix.csr provided by the SparseM package, or of class simple_triplet_matrix provided by the slam package), or a kernel matrix of class kernelMatrix provided by the kernlab package.
y: a response vector with one label for each row/component of x. Can be either a factor (for classification tasks) or a numeric vector (for regression).
weight: the weight of each subject. It should have the same length as y.
scale: a logical vector indicating the variables to be scaled. If scale is of length 1, the value is recycled as many times as needed. By default, data are scaled internally (both x and y variables) to zero mean and unit variance. The center and scale values are returned and used for later predictions. If x is a design matrix that contains dummy variables, make sure these variables are NOT scaled.
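For example, a per-column logical vector can exempt a dummy column from scaling. A minimal sketch (the dummy column here is constructed purely for illustration):

x <- cbind(as.matrix(iris[, 1:2]), dummy = as.numeric(iris$Species == "setosa"))
model <- wsvm(x, iris$Petal.Length, weight = rep(1, 150),
              scale = c(TRUE, TRUE, FALSE))  # scale the first two columns, leave the dummy as-is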
type: wsvm can be used as a classification machine, as a regression machine, or for novelty detection. Depending on whether y is a factor or not, the default setting for type is C-classification or eps-regression, respectively, but this may be overridden by setting an explicit value. Valid options are:

C-classification

nu-classification

one-classification (for novelty detection)

eps-regression

nu-regression
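For instance, the default for a factor response (C-classification) can be overridden explicitly. A minimal sketch:

data(iris)
model <- wsvm(Species ~ ., data = iris, weight = rep(1, 150),
              type = "nu-classification")  # instead of the default C-classification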
kernel: the kernel used in training and predicting. You might consider changing some of the following parameters, depending on the kernel type. For kernel = "precomputed", x is a precomputed kernel matrix that contains NO missing values; scale will not work, and subset and na.action cannot be used with this kernel.
degree: parameter needed for kernel of type polynomial (default: 3)
gamma: parameter needed for all kernels except linear (default: 1/(data dimension))
coef0: parameter needed for kernels of type polynomial and sigmoid (default: 0)
cost: cost of constraints violation (default: 1); it is the ‘C’-constant of the regularization term in the Lagrange formulation.
nu: parameter needed for nu-classification, nu-regression, and one-classification
class.weights: a named vector of weights for the different classes, used for asymmetric class sizes. Not all factor levels have to be supplied (default weight: 1). All components have to be named. Specifying "inverse" will choose the weights inversely proportional to the class distribution.
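A minimal sketch of both forms (the specific weight values are hypothetical):

wts <- c(setosa = 1, versicolor = 10, virginica = 10)  # named per-class weights
m1 <- wsvm(Species ~ ., data = iris, weight = rep(1, 150), class.weights = wts)
m2 <- wsvm(Species ~ ., data = iris, weight = rep(1, 150),
           class.weights = "inverse")  # weights inverse to class frequencies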
cachesize: cache memory in MB (default: 100)
tolerance: tolerance of termination criterion (default: 0.001)
epsilon: epsilon in the insensitive-loss function (default: 0.1)
shrinking: option whether to use the shrinking heuristics (default: TRUE)
cross: if an integer value k > 0 is specified, a k-fold cross validation on the training data is performed to assess the quality of the model: the accuracy rate for classification and the Mean Squared Error for regression. Note the result is not weighted; for weighted results, use the tune_wsvm function.
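For example, a 5-fold cross validation (a minimal sketch; the reported accuracies are unweighted):

model <- wsvm(Species ~ ., data = iris, weight = rep(1, 150), cross = 5)
summary(model)  # includes the per-fold and total cross-validation accuracy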
fitted: logical indicating whether the fitted values should be computed and included in the model or not (default: TRUE)
probability: logical indicating whether the model should allow for probability predictions.
...: additional parameters for the low-level fitting function wsvm.default
subset: an index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)
na.action: a function to specify the action to be taken if NAs are found. The default action is na.omit, which leads to rejection of cases with missing values on any required variable. An alternative is na.fail, which causes an error if NA cases are found. (NOTE: If given, this argument must be named.)
Author

David Meyer (based on C/C++ code by Chih-Chung Chang and Chih-Jen Lin)
Modified by Tianchen Xu tx2155@columbia.edu
Details

The original libsvm does not support subject/instance weighted SVM. Based on the 'LIBSVM Tools' page https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#weights_for_data_instances, a modified version of libsvm is used to support subject weights.
For multiclass-classification with k levels, k>2, libsvm
uses the
‘one-against-one’-approach, in which k(k-1)/2 binary classifiers are
trained; the appropriate class is found by a voting scheme.
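For example, for the three iris species:

k <- nlevels(iris$Species)  # k = 3 classes
k * (k - 1) / 2             # 3 binary classifiers are trained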
libsvm internally uses a sparse data representation, for which high-level support is also provided by the package SparseM.

If the predictor variables include factors, the formula interface must be used to get a correct model matrix; alternatively, convert x to a design matrix yourself.
When using the formula interface and na.action is na.omit, any subjects with missing values in x, y (if present), or weight are removed from both the training and the predicting procedure (when fitted = TRUE). When using the x, y interface and na.action is na.omit, any subjects with missing values in x, y (if present), or weight are removed from the training procedure, while subjects with missing values only in weight are retained in the predicting procedure (when fitted = TRUE).
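A minimal sketch of both behaviours with the formula interface (the NA position is arbitrary):

iris_na <- iris
iris_na[1, "Sepal.Length"] <- NA
m <- wsvm(Species ~ ., weight = rep(1, 150), data = iris_na)  # na.omit: subject 1 dropped
try(wsvm(Species ~ ., weight = rep(1, 150), data = iris_na,
         na.action = na.fail))  # na.fail: error on the NA case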
plot.wsvm
allows a simple graphical
visualization of classification models.
The probability model for classification fits a logistic distribution using maximum likelihood to the decision values of all binary classifiers, and computes the a-posteriori class probabilities for the multi-class problem using quadratic optimization. The probabilistic regression model assumes (zero-mean) Laplace-distributed errors for the predictions, and estimates the scale parameter using maximum likelihood.
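For example, assuming predict.wsvm mirrors predict.svm's probability argument (a minimal sketch):

model <- wsvm(Species ~ ., data = iris, weight = rep(1, 150), probability = TRUE)
pred <- predict(model, iris[, 1:4], probability = TRUE)
head(attr(pred, "probabilities"))  # a-posteriori class probabilities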
For the linear kernel, the coefficients of the regression/decision hyperplane can be extracted using the coef method (see examples).
References

Chang, Chih-Chung and Lin, Chih-Jen: LIBSVM: a library for Support Vector Machines. https://www.csie.ntu.edu.tw/~cjlin/libsvm/

Ming-Wei Chang, Hsuan-Tien Lin, Ming-Hen Tsai, Chia-Hua Ho and Hsiang-Fu Yu: Weights for data instances. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#weights_for_data_instances

Exact formulations of models, algorithms, etc. can be found in the document:

Chang, Chih-Chung and Lin, Chih-Jen: LIBSVM: a library for Support Vector Machines. https://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.ps.gz

More implementation details and speed benchmarks can be found at:

Rong-En Fan, Pai-Hsuen Chen and Chih-Jen Lin: Working Set Selection Using the Second Order Information for Training SVM. https://www.csie.ntu.edu.tw/~cjlin/papers/quadworkset.pdf
See Also

predict.wsvm, plot.wsvm, tune_wsvm, matrix.csr (in package SparseM)

Examples
## check what is loaded
dllpath <- getLoadedDLLs()
getDLLRegisteredRoutines(dllpath$WeightSVM[[2]])
## load dataset
data(iris)
## classification mode
# default with factor response:
model1 <- wsvm(Species ~ ., weight = rep(1,150), data = iris) # same weights
model2 <- wsvm(x = iris[,1:4], y = iris[,5],
weight = c(rep(0.08, 50),rep(1,100))) # less weights to setosa
# alternatively the traditional interface:
x <- subset(iris, select = -Species)
y <- iris$Species
model3 <- wsvm(x, y, weight = rep(10,150)) # similar to model 1,
# but larger weights for all subjects
# These models provide error/warning info
try(wsvm(x, y)) # no weight
try(wsvm(x, y, weight = rep(10,100))) # wrong length
try(wsvm(x, y, weight = c(Inf, rep(1,149)))) # contains inf weight
print(model1)
summary(model1)
# test with train data
pred <- predict(model1, iris[,1:4])
# (same as:)
pred <- fitted(model1)
# Check accuracy:
table(pred, y) # model 1, equal weights
# compute decision values and probabilities:
pred <- predict(model1, x, decision.values = TRUE)
attr(pred, "decision.values")[1:4,]
# visualize (classes by color, SV by crosses):
plot(cmdscale(dist(iris[,-5])),
col = as.integer(iris[,5]),
pch = c("o","+")[1:150 %in% model1$index + 1]) # model 1
plot(cmdscale(dist(iris[,-5])),
col = as.integer(iris[,5]),
pch = c("o","+")[1:150 %in% model2$index + 1])
# In model 2, fewer support vectors come from setosa
## try regression mode on two dimensions
# create data
x <- seq(0.1, 5, by = 0.05)
y <- log(x) + rnorm(x, sd = 0.2)
# estimate model and predict input values
model1 <- wsvm(x, y, weight = rep(1,99))
model2 <- wsvm(x, y, weight = seq(99,1,length.out = 99)) # decreasing weights
# library(kernlab)
# model3 <- wsvm(kernlab::kernelMatrix(kernlab::rbfdot(sigma = 1), x), y,
# weight = rep(1,99), kernel = 'precomputed') # try user defined kernel
# visualize
plot(x, y)
lines(x, log(x), col = 2)
points(x, fitted(model1), col = 4)
points(x, fitted(model2), col = 3) # better fit for the first few points
# points(x, fitted(model3), col = 5) # similar to model 1 with user defined kernel
## density-estimation
# create 2-dim. normal with rho=0:
X <- data.frame(a = rnorm(1000), b = rnorm(1000))
attach(X)
# formula interface:
model <- wsvm(~ a + b, gamma = 0.1, weight = c(seq(5000,1,length.out = 500),1:500))
# test:
newdata <- data.frame(a = c(0, 4), b = c(0, 4))
# visualize:
plot(X, col = 1:1000 %in% model$index + 1, xlim = c(-5,5), ylim=c(-5,5))
points(newdata, pch = "+", col = 2, cex = 5)
## class weights:
i2 <- iris
levels(i2$Species)[3] <- "versicolor"
summary(i2$Species)
wts <- 100 / table(i2$Species)
wts
m <- wsvm(Species ~ ., data = i2, class.weights = wts, weight=rep(1,150))
## extract coefficients for linear kernel
# a. regression
x <- 1:100
y <- x + rnorm(100)
m <- wsvm(y ~ x, scale = FALSE, kernel = "linear", weight = rep(1,100))
coef(m)
plot(y ~ x)
abline(m, col = "red")
# b. classification
# transform iris data to binary problem, and scale data
setosa <- as.factor(iris$Species == "setosa")
iris2 <- scale(iris[,-5])
# fit binary C-classification model
model1 <- wsvm(setosa ~ Petal.Width + Petal.Length,
data = iris2, kernel = "linear", weight = rep(1,150))
model2 <- wsvm(setosa ~ Petal.Width + Petal.Length,
data = iris2, kernel = "linear",
weight = c(rep(0.08, 50),rep(1,100))) # less weights to setosa
# plot data and separating hyperplane
plot(Petal.Length ~ Petal.Width, data = iris2, col = setosa)
(cf <- coef(model1))
abline(-cf[1]/cf[3], -cf[2]/cf[3], col = "red")
(cf2 <- coef(model2))
abline(-cf2[1]/cf2[3], -cf2[2]/cf2[3], col = "red", lty = 2)
# plot margin and mark support vectors
abline(-(cf[1] + 1)/cf[3], -cf[2]/cf[3], col = "blue")
abline(-(cf[1] - 1)/cf[3], -cf[2]/cf[3], col = "blue")
points(model1$SV, pch = 5, cex = 2)
abline(-(cf2[1] + 1)/cf2[3], -cf2[2]/cf2[3], col = "blue", lty = 2)
abline(-(cf2[1] - 1)/cf2[3], -cf2[2]/cf2[3], col = "blue", lty = 2)
points(model2$SV, pch = 6, cex = 2)