For a one-classification SVM (OCSVM) model, the PMML has two OutputField elements: anomalyScore and one of anomaly or inlier.
The OutputField anomalyScore is the signed distance to the separating boundary; anomalyScore corresponds to the decision.values attribute of the output of the svm predict function in R.
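As a minimal sketch of where this value comes from on the R side, the following trains a one-class SVM with e1071 and reads the decision.values attribute from the prediction. The use of the first four iris columns as training data is an assumption made only for illustration.

```r
library(e1071)

# Illustrative training data: four numeric (double) columns from iris.
train <- iris[, 1:4]

# Default (non-formula) interface; y is NULL for one-classification.
fit <- svm(x = train, y = NULL, type = "one-classification")

# Request decision values along with the TRUE/FALSE labels.
pred <- predict(fit, train, decision.values = TRUE)

# Signed distances to the boundary, one per observation; this is the
# quantity that anomalyScore is described as corresponding to.
scores <- attr(pred, "decision.values")

head(data.frame(label = as.vector(pred), score = as.vector(scores)))
```

The label column holds R's prediction (TRUE for an inlier), and the score column holds the matching decision value.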
The second OutputField depends on the value of detect_anomaly. By default, detect_anomaly is TRUE, which results in the second OutputField being anomaly.
The anomaly OutputField is TRUE when an anomaly is detected.
This field conforms to the DMG definition of an anomaly detection model. This value is the
opposite of the prediction by the e1071::svm object in R.
Setting detect_anomaly to FALSE results in the second field instead being inlier.
The inlier OutputField is TRUE when an inlier is detected, which conforms to the e1071 definition of one-class SVMs. It is FALSE when an anomaly is detected; that is, like the R svm model, it indicates whether an observation belongs to the class. Use this field when comparing predictions from R and PMML, since it matches R's output.
For example, say that for an observation, the R OCSVM model predicts a positive decision value of 0.4 and a label of TRUE. According to the R object, this means that the observation is an inlier. By default, the PMML export of this model will give the following for the same input: anomalyScore = 0.4, anomaly = "false". According to the PMML, the observation is not an anomaly.
If the same R object is instead exported with detect_anomaly = FALSE, the PMML will then give: anomalyScore = 0.4, inlier = "true", and this result agrees with R.
Note that there is no sign flip for anomalyScore between R and PMML for OCSVM models.
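The correspondence described above can be sketched as a small base R function. The helper expected_pmml_fields is hypothetical and not part of any package; it only restates the mapping from an R prediction to the two PMML output fields and does not evaluate any PMML.

```r
# Hypothetical helper (illustration only): given R's decision value and
# TRUE/FALSE label for one observation, return the output fields the
# exported PMML is described as producing.
expected_pmml_fields <- function(decision_value, r_label, detect_anomaly = TRUE) {
  # anomalyScore carries the decision value unchanged (no sign flip).
  out <- list(anomalyScore = decision_value)
  if (detect_anomaly) {
    # anomaly is the opposite of R's inlier label.
    out$anomaly <- !r_label
  } else {
    # inlier matches R's label directly.
    out$inlier <- r_label
  }
  out
}

# The example from the text: decision value 0.4, R label TRUE.
default_export <- expected_pmml_fields(0.4, TRUE)
inlier_export <- expected_pmml_fields(0.4, TRUE, detect_anomaly = FALSE)
```

Here default_export contains anomalyScore = 0.4 and anomaly = FALSE, while inlier_export contains anomalyScore = 0.4 and inlier = TRUE.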
To export an OCSVM model, an additional argument, dataset, is required by the function. This argument expects a data frame containing the data that was used to train the model. This is necessary because, for a one-class SVM, the R svm object does not contain information about the data types of the features used to train the model. The exporter does not yet support the formula interface for one-classification models, so the default S3 method must be used to train the SVM. The data used to train the one-class SVM must be numeric and not of integer class.
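A minimal sketch of such an export follows. It assumes that the exporting function is the pmml() generic from the pmml package and that the first four iris columns serve as training data; both are illustrative choices rather than requirements stated here.

```r
library(e1071)
library(pmml)

# Training data must be numeric and not of integer class, so check
# that every column is stored as double before fitting.
train <- iris[, 1:4]
stopifnot(all(vapply(train, is.double, logical(1))))

# Default S3 method (x = ..., y = NULL), not the formula interface.
fit <- svm(x = train, y = NULL, type = "one-classification")

# Default export: second OutputField is anomaly.
fit_pmml_anomaly <- pmml(fit, dataset = train)

# Export whose second OutputField is inlier, matching R's labels.
fit_pmml_inlier <- pmml(fit, dataset = train, detect_anomaly = FALSE)
```

If a column is stored as integer, converting it with as.numeric() before training is one way to satisfy the type requirement.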