R Semi-Supervised Learning package

This R package provides implementations of several semi-supervised learning methods, in particular our own work on constraint-based semi-supervised learning.

To cite the package, use either of these two references:

  • Krijthe, J. H. (2016). RSSL: R package for Semi-supervised Learning. In B. Kerautret, M. Colom, & P. Monasse (Eds.), Reproducible Research in Pattern Recognition. RRPR 2016. Lecture Notes in Computer Science, vol 10214 (pp. 104–115). Springer International Publishing. https://doi.org/10.1007/978-3-319-56414-2_8. arXiv: https://arxiv.org/abs/1612.07993
  • Krijthe, J. H., & Loog, M. (2015). Implicitly Constrained Semi-Supervised Least Squares Classification. In E. Fromont, T. de Bie, & M. van Leeuwen (Eds.), 14th International Symposium on Advances in Intelligent Data Analysis XIV. Lecture Notes in Computer Science, vol 9385 (pp. 158–169). Saint Etienne, France.

Installation Instructions

This package is available on CRAN. The easiest way to install it is:

install.packages("RSSL")

To install the latest development version from GitHub, use the devtools package:

library(devtools)
install_github("jkrijthe/RSSL")

Usage

After installation, load the package as usual:

library(RSSL)

The following code generates a simple dataset, removes most of its labels, trains a supervised and a semi-supervised (self-learning) classifier, and evaluates their performance:

library(dplyr,warn.conflicts = FALSE)
library(ggplot2,warn.conflicts = FALSE)

set.seed(2)
df <- generate2ClassGaussian(200, d=2, var = 0.2, expected=TRUE)

# Randomly remove labels
df <- df %>% add_missinglabels_mar(Class~.,prob=0.98) 

# Train a supervised and a self-learning (semi-supervised) classifier
g_nm <- NearestMeanClassifier(Class~.,df,prior=matrix(0.5,2))
g_self <- SelfLearning(Class~.,df,
                       method=NearestMeanClassifier,
                       prior=matrix(0.5,2))

# Plot dataset
df %>% 
  ggplot(aes(x=X1,y=X2,color=Class,size=Class)) +
  geom_point() +
  coord_equal() +
  scale_size_manual(values=c("-1"=3,"1"=3), na.value=1) +
  geom_linearclassifier("Supervised"=g_nm,
                  "Semi-supervised"=g_self)


# Evaluate performance: Squared Loss & Error Rate
mean(loss(g_nm,df))
mean(loss(g_self,df))


mean(predict(g_nm,df)!=df$Class)
mean(predict(g_self,df)!=df$Class)
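
To make the idea behind SelfLearning more concrete, the following is a minimal base-R sketch of self-learning wrapped around a nearest mean classifier. It is an illustration under simplifying assumptions, not the RSSL implementation: the helpers fit_nm, predict_nm and self_learn are hypothetical names that do not exist in the package, and the data are generated by hand rather than with generate2ClassGaussian.

```r
# Illustrative sketch only: NOT the RSSL implementation.
# fit_nm, predict_nm and self_learn are hypothetical helper names.

set.seed(1)

# Two Gaussian classes in 2D, 100 objects each
n <- 100
X <- rbind(matrix(rnorm(2 * n, mean = -1), ncol = 2),
           matrix(rnorm(2 * n, mean = 1), ncol = 2))
y_true <- factor(rep(c("-1", "1"), each = n))

# Keep 5 labels per class and mark all other labels as missing (NA)
y <- y_true
labeled <- c(sample(1:n, 5), sample((n + 1):(2 * n), 5))
y[-labeled] <- NA

# Fit: one mean vector per class, using only objects that carry a label
fit_nm <- function(X, y) {
  t(sapply(levels(y), function(cl) {
    colMeans(X[which(y == cl), , drop = FALSE])
  }))
}

# Predict: assign each object to the class with the closest mean
predict_nm <- function(means, X) {
  # Squared Euclidean distance of every object to every class mean
  d <- sapply(seq_len(nrow(means)), function(k) {
    rowSums(sweep(X, 2, means[k, ])^2)
  })
  factor(rownames(means)[apply(d, 1, which.min)], levels = rownames(means))
}

# Self-learning: label the unlabeled objects with the current classifier,
# refit on all objects, and repeat until the imputed labels stop changing
self_learn <- function(X, y, max_iter = 50) {
  is_unlabeled <- is.na(y)
  means <- fit_nm(X, y)  # start from the supervised solution
  y_imputed <- y
  for (i in seq_len(max_iter)) {
    new_labels <- predict_nm(means, X[is_unlabeled, , drop = FALSE])
    if (i > 1 && all(new_labels == y_imputed[is_unlabeled])) break
    y_imputed[is_unlabeled] <- new_labels
    means <- fit_nm(X, y_imputed)
  }
  means
}

means_sup <- fit_nm(X, y)
means_self <- self_learn(X, y)

# Error rates against the true labels, which are known here by construction
err_sup <- mean(predict_nm(means_sup, X) != y_true)
err_self <- mean(predict_nm(means_self, X) != y_true)
```

Comparing err_sup and err_self shows whether the imputed labels helped on this particular draw. Self-learning can also degrade a classifier, so the comparison is worth repeating over several seeds.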

Acknowledgement

Work on this package was supported by Project 23 of the Dutch national program COMMIT.

Package Details

  • Version: 0.9.7
  • License: GPL (>= 2)
  • Maintainer: Jesse Krijthe
  • Last published: December 7th, 2023
  • Monthly downloads: 262

Functions in RSSL (0.9.7)

  • LinearSVM-class: LinearSVM Class
  • LeastSquaresClassifier: Least Squares Classifier
  • LinearTSVM: Linear CCCP Transductive SVM classifier
  • LinearSVM: Linear SVM Classifier
  • LaplacianSVM: Laplacian SVM classifier
  • LearningCurveSSL: Compute Semi-Supervised Learning Curve
  • KernelLeastSquaresClassifier: Kernelized Least Squares Classifier
  • LinearDiscriminantClassifier: Linear Discriminant Classifier
  • LogisticLossClassifier: Logistic Loss Classifier
  • PreProcessing: Preprocess the input to a classification function
  • LogisticLossClassifier-class: LogisticLossClassifier
  • NearestMeanClassifier: Nearest Mean Classifier
  • MajorityClassClassifier: Majority Class Classifier
  • LogisticRegressionFast: Logistic Regression implementation that uses R's glm
  • LogisticRegression: (Regularized) Logistic Regression implementation
  • SSLDataFrameToMatrices: Convert data.frame to matrices for semi-supervised learners
  • SVM: SVM Classifier
  • MCLinearDiscriminantClassifier: Moment Constrained Semi-supervised Linear Discriminant Analysis
  • MCPLDA: Maximum Contrastive Pessimistic Likelihood Estimation for Linear Discriminant Analysis
  • LaplacianKernelLeastSquaresClassifier: Laplacian Regularized Least Squares Classifier
  • MCNearestMeanClassifier: Moment Constrained Semi-supervised Nearest Mean Classifier
  • S4VM-class: LinearSVM Class
  • PreProcessingPredict: Preprocess the input for a new set of test objects for classifier
  • SelfLearning: Self-Learning approach to Semi-supervised Learning
  • S4VM: Safe Semi-supervised Support Vector Machine (S4VM)
  • USMLeastSquaresClassifier-class: USMLeastSquaresClassifier
  • df_to_matrices: Convert data.frame with missing labels to matrices
  • RSSL-package: RSSL: Implementations of Semi-Supervised Learning Approaches for Classification
  • clapply: Use mclapply conditional on not being in RStudio
  • decisionvalues: Decision values returned by a classifier for a set of objects
  • QuadraticDiscriminantClassifier: Quadratic Discriminant Classifier
  • WellSVM: WellSVM for Semi-supervised Learning
  • generateParallelPlanes: Generate Parallel planes
  • WellSVM_supervised: A degenerated version of WellSVM where the labels are complete, that is, supervised learning
  • localDescent: Local descent
  • cov_ml: Biased (maximum likelihood) estimate of the covariance matrix
  • TSVM: Transductive SVM classifier using the convex concave procedure
  • add_missinglabels_mar: Throw out labels at random
  • USMLeastSquaresClassifier: Updated Second Moment Least Squares Classifier
  • generateCrescentMoon: Generate Crescent Moon dataset
  • diabetes: diabetes data for unit testing
  • measure_accuracy: Performance measures used in classifier evaluation
  • logsumexp: Numerically more stable way to calculate log sum exp
  • WellSVM_SSL: Convex relaxation of S3VM by label generation
  • generate2ClassGaussian: Generate data from 2 Gaussian distributed classes
  • harmonic_function: Direct R Translation of Xiaojin Zhu's Matlab code to determine harmonic solution
  • generateSlicedCookie: Generate Sliced Cookie dataset
  • split_dataset_ssl: Create Train, Test and Unlabeled Set
  • line_coefficients: Loss of a classifier or regression function
  • print.CrossValidation: Print CrossValidation object
  • predict,scaleMatrix-method: Predict for matrix scaling inspired by stdize from the PLS package
  • split_random: Randomly split dataset in multiple parts
  • generateFourClusters: Generate Four Clusters dataset
  • threshold: Refine the prediction to satisfy the balance constraint
  • true_labels: Access the true labels when they are stored as an attribute in a data frame
  • wlda: Implements weighted likelihood estimation for LDA
  • wlda_error: Measures the expected error of the LDA model defined by m, p, and iW on the data set a, where weights w are potentially taken into account
  • plot.LearningCurve: Plot LearningCurve object
  • solve_svm: SVM solve.QP implementation
  • posterior: Class Posteriors of a classifier
  • scaleMatrix: Matrix centering and scaling
  • generateSpirals: Generate Intersecting Spirals
  • adjacency_knn: Calculate knn adjacency matrix
  • summary.CrossValidation: Summary of cross-validation results
  • generateTwoCircles: Generate data from 2 circles
  • wellsvm_direct: wellsvm implements the wellsvm algorithm as shown in [1].
  • wdbc: wdbc data for unit testing
  • svdinv: Inverse of a matrix using the singular value decomposition
  • plot.CrossValidation: Plot CrossValidation object
  • rssl-formatting: Show RSSL classifier
  • svmproblem: Train SVM
  • missing_labels: Access the true labels for the objects with missing labels when they are stored as an attribute in a data frame
  • responsibilities: Responsibilities assigned to the unlabeled objects
  • testdata: Example semi-supervised problem
  • rssl-predict: Predict using RSSL classifier
  • sample_k_per_level: Sample k indices per level from a factor
  • svdinvsqrtm: Taking the inverse of the square root of the matrix using the singular value decomposition
  • minimaxlda: Implements weighted likelihood estimation for LDA
  • losspart: Loss of a classifier or regression function evaluated on partial labels
  • c.CrossValidation: Merge results of cross-validation runs on single datasets into the same object
  • generateABA: Generate data from 2 alternating classes
  • geom_classifier: Plot RSSL classifier boundary (deprecated)
  • projection_simplex: Project an n-dim vector y to the simplex Dn
  • geom_linearclassifier: Plot linear RSSL classifier boundary
  • stat_classifier: Plot RSSL classifier boundaries
  • print.LearningCurve: Print LearningCurve object
  • svmlin: svmlin implementation by Sindhwani & Keerthi (2006)
  • stderror: Calculate the standard error of the mean from a vector of numbers
  • svmlin_example: Test data from the svmlin implementation
  • svdsqrtm: Taking the square root of a matrix using the singular value decomposition
  • gaussian_kernel: Calculate the Gaussian kernel matrix
  • loss: Loss of a classifier or regression function
  • find_a_violated_label: Find a violated label
  • wlda_loglik: Measures the expected log-likelihood of the LDA model defined by m, p, and iW on the data set a, where weights w are potentially taken into account
  • losslogsum: LogsumLoss of a classifier or regression function
  • EMLeastSquaresClassifier: An Expectation Maximization like approach to Semi-Supervised Least Squares Classification
  • KernelICLeastSquaresClassifier: Kernelized Implicitly Constrained Least Squares Classification
  • EMNearestMeanClassifier: Semi-Supervised Nearest Mean Classifier using Expectation Maximization
  • EMLinearDiscriminantClassifier: Semi-Supervised Linear Discriminant Analysis using Expectation Maximization
  • BaseClassifier: Classifier used for enabling shared documenting of parameters
  • ICLinearDiscriminantClassifier: Implicitly Constrained Semi-supervised Linear Discriminant Classifier
  • EntropyRegularizedLogisticRegression: Entropy Regularized Logistic Regression
  • ICLeastSquaresClassifier: Implicitly Constrained Least Squares Classifier
  • GRFClassifier: Label propagation using Gaussian Random Fields and Harmonic functions
  • CrossValidationSSL: Cross-validation in semi-supervised setting
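
The logsumexp entry above is described as a numerically more stable way to calculate log sum exp. As a rough illustration of why such a helper is useful, here is a base-R sketch of the standard max-shift trick. The function name log_sum_exp_sketch is hypothetical and chosen to avoid suggesting that this is the package's code or signature; RSSL's own logsumexp may take different arguments (for example, a matrix).

```r
# Illustrative sketch of the max-shift trick; NOT RSSL's logsumexp.
# log_sum_exp_sketch is a hypothetical name used only for this example.
log_sum_exp_sketch <- function(x) {
  m <- max(x)
  # If the maximum is infinite, shifting by it is not meaningful
  if (!is.finite(m)) return(m)
  # log(sum(exp(x))) == m + log(sum(exp(x - m))), and exp(x - m) <= 1
  m + log(sum(exp(x - m)))
}

x <- c(-1000, -1001, -1002)

# Naive evaluation: every exp() underflows to 0, so the result is -Inf
naive <- log(sum(exp(x)))

# Shifted evaluation stays finite
stable <- log_sum_exp_sketch(x)
```

The shift works because the largest term becomes exp(0) = 1 after subtracting the maximum, so the sum inside the logarithm can neither underflow to zero nor overflow.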