sofia

Description

sofia is used to fit classification and regression models with the learners provided by D. Sculley's sofia-ml library.
Usage

sofia(x, ...)

## S3 method for class 'formula'
sofia(x, data, random_seed = floor(runif(1, 1, 65535)), lambda = 0.1,
  iterations = 1e+05,
  learner_type = c("pegasos", "sgd-svm", "passive-aggressive",
    "margin-perceptron", "romma", "logreg-pegasos"),
  eta_type = c("pegasos", "basic", "constant"),
  loop_type = c("stochastic", "balanced-stochastic", "rank", "roc",
    "query-norm-rank", "combined-ranking", "combined-roc"),
  rank_step_probability = 0.5, passive_aggressive_c = 1e+07,
  passive_aggressive_lambda = 0, perceptron_margin_size = 1,
  training_objective = FALSE, hash_mask_bits = 0, verbose = FALSE,
  reserve = 0, ...)

## S3 method for class 'character'
sofia(x, random_seed = floor(runif(1, 1, 65535)), lambda = 0.1,
  iterations = 1e+05,
  learner_type = c("pegasos", "sgd-svm", "passive-aggressive",
    "margin-perceptron", "romma", "logreg-pegasos"),
  eta_type = c("pegasos", "basic", "constant"),
  loop_type = c("stochastic", "balanced-stochastic", "rank", "roc",
    "query-norm-rank", "combined-ranking", "combined-roc"),
  rank_step_probability = 0.5, passive_aggressive_c = 1e+07,
  passive_aggressive_lambda = 0, perceptron_margin_size = 1,
  training_objective = FALSE, no_bias_term = FALSE,
  dimensionality = 150000, hash_mask_bits = 0, verbose = FALSE,
  buffer_mb = 40, ...)
Arguments

x: a formula object or a character string with a path to a file.

learner_type: "pegasos" (default), "sgd-svm", "passive-aggressive", "margin-perceptron", "romma", or "logreg-pegasos".

eta_type: "pegasos" (default), "basic", or "constant".
"stochastic"
- Perform normal stochastic sampling for stochastic gradient descent, for training binary classifiers. On each iteration, pick a new example uniformly at random from the data set.
"balanced-stochastic"
- Perform a balanced sampling from positives and negatives in data set. For each iteration, samples one positive example uniformly at random from the set of all positives, and samples one negative example uniformly at random from the set of all negatives. This can be useful for training binary classifiers with a minority-class distribution.
"rank"
- Perform indexed sampling of candidate pairs for pairwise learning to rank. Useful when there are examples from several different qid groups.
"roc"
- Perform indexed sampling to optimize ROC Area.
"query-norm-rank"
- Perform sampling of candidate pairs, giving equal weight to each qid group regardless of its size. Currently this is implemented with rejection sampling rather than indexed sampling, so this may run more slowly.
"combined-ranking"
- Performs CRR algorithm for combined regression and ranking. Alternates between pairwise rank-based steps and standard stochastic gradient steps on single examples. Relies on "rank_step_probability"
to balance between these two kinds of updates.
"combined-roc"
- Performs CRR algorithm for combined regression and ROC area optimization. Alternates between pairwise roc-optimization-based steps and standard stochastic gradient steps on single examples. Relies on "rank_step_probability"
to balance between these two kinds of updates. This can be faster than the combined-ranking option when there are exactly two classes.
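As an illustration, a hedged sketch of choosing loop_type for an imbalanced binary problem, reusing the irismod data set from the example below and treating Is.Virginica as the minority class:

# balanced-stochastic sampling draws one positive and one negative example
# per iteration, which can help when one class is under-represented
data(irismod)
fit.bal <- sofia(Is.Virginica ~ ., data = irismod,
                 learner_type = "pegasos",
                 loop_type = "balanced-stochastic")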
training_objective: if TRUE, computes the value of the standard SVM objective function on the training data after training.

hash_mask_bits: the default value of 0 indicates that hash cross products are not used.

References

D. Sculley. Combined Regression and Ranking. Proceedings of the 16th Annual SIGKDD Conference on Knowledge Discovery and Data Mining, 2010.
D. Sculley. Web-Scale K-Means Clustering. Proceedings of the 19th International Conference on World Wide Web, 2010.

D. Sculley. Large Scale Learning to Rank. NIPS Workshop on Advances in Ranking, 2009. Presents the indexed sampling methods used for learning to rank, including the rank and roc loops.
Examples

data(irismod)
model.logreg <- sofia(Is.Virginica ~ ., data=irismod, learner_type="logreg-pegasos")
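A hedged variation on the example above, using only arguments documented on this page (exact results will vary with random_seed):

# fit a linear SVM by stochastic gradient descent and report the value of
# the standard SVM objective on the training data after fitting
model.svm <- sofia(Is.Virginica ~ ., data = irismod,
                   learner_type = "sgd-svm",
                   loop_type = "stochastic",
                   lambda = 0.1,
                   iterations = 1e+05,
                   training_objective = TRUE)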