This function fits a modification of MI-SVM to ordinal outcome data based on the research method proposed by Kent and Yu.
# S3 method for default
omisvm(
x,
y,
bags,
cost = 1,
h = 1,
s = Inf,
method = c("qp-heuristic"),
weights = TRUE,
control = list(kernel = "linear", sigma = if (is.vector(x)) 1 else 1/ncol(x),
max_step = 500, type = "C-classification", scale = TRUE, verbose = FALSE, time_limit = 60),
...
)
# S3 method for formula
omisvm(formula, data, ...)
# S3 method for mi_df
omisvm(x, ...)
An object of class omisvm. The object contains at least the following components:
*_fit
: A fit object depending on the method parameter. If method = 'qp-heuristic', this will be gurobi_fit from a model optimization.
call_type
: A character indicating which method omisvm() was called with.
features
: The names of features used in training.
levels
: The levels of y that are recorded for future prediction.
cost
: The cost parameter from function inputs.
weights
: The calculated weights on the cost parameter.
repr_inst
: The instances from positive bags that are selected to be most representative of the positive instances.
n_step
: If method == 'qp-heuristic', the total steps used in the heuristic algorithm.
x_scale
: If scale = TRUE, the scaling parameters for new predictions.
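The x_scale component exists so that new data can be rescaled exactly as the training data were. The following is a minimal sketch that assumes column-wise centering and scaling in the style of base R's scale(). The package's own internals may differ. The names train_x and new_x, and the list layout of x_scale shown here, are hypothetical.

```r
# Hypothetical training and new covariate matrices (rows are instances)
train_x <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)
new_x <- matrix(c(2.5, 25), ncol = 2)

# At fit time: center and scale each column, keeping the parameters
scaled_train <- scale(train_x)
x_scale <- list(
  center = attr(scaled_train, "scaled:center"),
  scale = attr(scaled_train, "scaled:scale")
)

# At prediction time: reuse the training parameters, not new_x's own statistics
scaled_new <- scale(new_x, center = x_scale$center, scale = x_scale$scale)
```

In this example, new_x sits exactly at the training column means, so it maps to a row of zeros.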
x
: A data.frame, matrix, or similar object of covariates, where each row represents an instance. If a mi_df object is passed, y and bags are automatically extracted, and all other columns are used as predictors.
y
: A numeric, character, or factor vector of bag labels for each instance. Must satisfy length(y) == nrow(x). Ideally, one of the levels is 1, '1', or TRUE, which becomes the positive class; otherwise, a positive class is chosen and a message is supplied.
bags
: A vector specifying which bag each instance belongs to. Can be a string, numeric, or factor vector.
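The length requirements on y and bags can be checked before any fitting happens. The sketch below uses a hypothetical helper that is not part of the package. It also assumes that all instances in one bag carry the same label, which is what the phrase "bag labels for each instance" suggests.

```r
# Hypothetical helper: basic consistency checks on x, y, and bags
check_mi_inputs <- function(x, y, bags) {
  n <- NROW(x)
  if (length(y) != n) stop("length(y) must equal nrow(x)")
  if (length(bags) != n) stop("length(bags) must equal nrow(x)")
  # Count distinct labels within each bag; more than one means a conflict
  labels_per_bag <- tapply(as.character(y), bags, function(v) length(unique(v)))
  if (any(labels_per_bag > 1)) {
    stop("bags with conflicting labels: ",
         paste(names(labels_per_bag)[labels_per_bag > 1], collapse = ", "))
  }
  invisible(TRUE)
}

x <- data.frame(V1 = 1:6, V2 = 6:1)
y <- c(1, 1, 2, 2, 3, 3)
bags <- c("a", "a", "b", "b", "c", "c")
check_mi_inputs(x, y, bags)
```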
cost
: The cost parameter in SVM. If method = 'heuristic', this is fed to kernlab::ksvm(); otherwise, it is used similarly in internal functions.
h
: A scalar that controls the trade-off between maximizing the margin and minimizing the distance between hyperplanes.
s
: An integer for how many replication points to add to the dataset. If k represents the number of labels in y, must have 1 <= s <= k-1. The default, Inf, uses the maximum number of replication points, k-1.
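The rule for s can be written as a small helper. The function resolve_s() is hypothetical and only mirrors the constraint stated above. With k distinct labels, s must lie between 1 and k-1, and the default Inf resolves to k-1.

```r
# Hypothetical helper: turn the user's s into a concrete number of replication points
resolve_s <- function(y, s = Inf) {
  k <- length(unique(y))          # number of distinct ordinal labels
  if (is.infinite(s)) s <- k - 1  # default: the maximum, k - 1
  if (s < 1 || s > k - 1) stop("s must satisfy 1 <= s <= k - 1, with k = ", k)
  as.integer(s)
}

y <- c(1, 1, 2, 2, 3, 3, 4, 4)
resolve_s(y)         # 3, since k = 4
resolve_s(y, s = 2)  # 2
```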
method
: The algorithm to use in fitting (default 'qp-heuristic', the only value listed in the usage). When method = 'heuristic', an algorithm similar to that of Andrews et al. (2003) is employed. When method = 'mip', the novel MIP method is used. When method = 'qp-heuristic', the heuristic algorithm is computed using the dual SVM. See details.
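In the MI-SVM heuristic of Andrews et al. (2003), each iteration refits the SVM using one "witness" instance per positive bag. The witness is the instance with the largest decision value f(x) = w'x + b under the current fit. The base R sketch below shows only that selection step. The weights, intercept, and data are hypothetical, and the sketch does not show how omisvm adapts this step to ordinal labels.

```r
# Hypothetical linear decision function f(x) = w'x + b
w <- c(1, -0.5)
b <- 0.2
x <- matrix(c(0.1, 0.9, 0.4, 1.5, 0.3, 0.8,   # first covariate
              0.0, 0.2, 0.6, 0.1, 1.0, 0.4),  # second covariate
            ncol = 2)
bags <- c("a", "a", "a", "b", "b", "b")

# Decision value for every instance
scores <- drop(x %*% w + b)

# Within each bag, keep the row index of the highest-scoring instance
witness <- vapply(split(seq_along(scores), bags),
                  function(idx) idx[which.max(scores[idx])],
                  integer(1))
```

Here the scores are 0.3, 1.0, 0.3 for bag "a" and 1.65, 0.0, 0.8 for bag "b", so witness selects row 2 for "a" and row 4 for "b".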
weights
: A named vector, or TRUE, to control the weight of the cost parameter for each possible y value. Weights multiply against the cost vector. If TRUE, weights are calculated based on inverse counts of instances with a given label, where only one positive instance is counted per bag. Otherwise, names must match the levels of y.
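The following sketch illustrates one plausible reading of the weights = TRUE rule for ordinal labels. It is not the package's actual computation. It assumes that every instance in a bag shares the bag's label, counts each bag once toward that label, and uses the inverse of that count as the weight. The helper name is hypothetical, and the package may normalize weights differently. The result is a named vector whose names match the levels of y, which is the same form required when weights are supplied by hand.

```r
# Hypothetical helper: inverse-count weights, counting each bag once toward its label
inverse_count_weights <- function(y, bags) {
  y <- factor(y)
  # One label per bag: take the first instance of each bag
  bag_labels <- y[!duplicated(bags)]
  counts <- table(bag_labels)
  w <- 1 / as.numeric(counts)
  names(w) <- names(counts)
  w
}

y <- c(1, 1, 2, 2, 2, 2, 3, 3)
bags <- c("a", "a", "b", "b", "c", "c", "d", "d")
w <- inverse_count_weights(y, bags)  # "1" = 1, "2" = 0.5, "3" = 1

# Names must match the levels of y, whether computed or hand-specified
stopifnot(identical(names(w), levels(factor(y))))
```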
control
: A list of additional parameters passed to the method that control computation, with the following components:
  kernel: Either a character that describes the kernel ('linear' or 'radial') or a kernel matrix at the instance level.
  sigma: Argument needed for the radial basis kernel.
  nystrom_args: A list of parameters to pass to kfm_nystrom(). This is used when method = 'mip' and kernel = 'radial' to generate a Nystrom approximation of the kernel features.
  max_step: Argument used when method = 'heuristic'. Maximum steps of iteration for the heuristic algorithm.
  type: Argument used when method = 'heuristic'. The type argument is passed to e1071::svm().
  scale: Argument used for all methods. A logical for whether to rescale the input before fitting.
  verbose: Argument used when method = 'mip'. Whether to print messages to the console.
  time_limit: Argument used when method = 'mip'. FALSE, or a time limit (in seconds) passed to the gurobi() parameters. If FALSE, no time limit is given.
  start: Argument used when method = 'mip'. If TRUE, the MIP program is warm-started with the solution from method = 'qp-heuristic' to potentially improve speed.
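The kernel component also accepts a kernel matrix computed at the instance level. The sketch below computes such a matrix and the documented default sigma in base R. It assumes that the radial basis kernel uses the same parameterization as kernlab::rbfdot(), k(u, v) = exp(-sigma * ||u - v||^2). This page does not confirm that assumption.

```r
# Hypothetical instance-level covariates (rows are instances)
x <- matrix(c(0, 1, 0, 2,    # first covariate
              0, 0, 1, 1),   # second covariate
            ncol = 2)

# Default bandwidth from the usage: 1 for a plain vector, otherwise 1 / ncol(x)
sigma <- if (is.vector(x)) 1 else 1 / ncol(x)

# Radial basis kernel matrix K[i, j] = exp(-sigma * ||x_i - x_j||^2)
sq_dist <- as.matrix(dist(x))^2
K <- exp(-sigma * sq_dist)
```

Because x is a matrix, is.vector(x) is FALSE, so sigma is 1/2 here. K is symmetric with ones on the diagonal.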
...
: Arguments passed to or from other methods.
formula
: A formula with specification mi(y, bags) ~ x, which uses the mi function to create the bag-instance structure. This argument is an alternative to the x, y, bags arguments, but requires the data argument. See examples.
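R does not evaluate a formula when it is created, so the structure described above can be inspected without fitting anything. In the sketch below, bag_label and bag_name follow the column names used in the example at the end of this page. V1 and V2 are hypothetical covariate names.

```r
# mi(y, bags) on the left-hand side, covariates on the right-hand side
f <- mi(bag_label, bag_name) ~ V1 + V2

lhs <- f[[2]]           # the unevaluated call mi(bag_label, bag_name)
as.character(lhs[[1]])  # "mi"
all.vars(lhs)           # "bag_label" "bag_name": label and bag columns
all.vars(f[[3]])        # "V1" "V2": covariates
```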
data
: If formula is provided, a data.frame or similar from which formula elements will be extracted.
omisvm(default)
: Method for data.frame-like objects
omisvm(formula)
: Method for passing formula
omisvm(mi_df)
: Method for mi_df objects, automatically handling bag names, labels, and all covariates.
Sean Kent
Currently, the only method available is a heuristic algorithm in linear SVM space. Additional methods should be available shortly.
predict.omisvm() for prediction on new data.
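library(mildsvm)
# Fitting requires the gurobi solver, so only run if it is installed
if (require(gurobi)) {
  data("ordmvnorm")
  x <- ordmvnorm[, 3:7]          # covariates used for fitting
  y <- ordmvnorm$bag_label       # bag label for each instance
  bags <- ordmvnorm$bag_name     # bag membership for each instance
  mdl1 <- omisvm(x, y, bags, weights = NULL)
  # Predict on the training data, passing bag membership for the new rows
  predict(mdl1, x, new_bags = bags)
}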