doMIsaul

Overview

The goal of doMIsaul is to provide functions for unsupervised and semisupervised learning on incomplete datasets.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("LilithF/doMIsaul")

Example

This basic example shows how to perform unsupervised learning on an incomplete dataset:

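library(doMIsaul)
data(cancer, package = "survival")
# Recode status from 1/2 (censored/dead) to 0/1
cancer$status <- cancer$status - 1
# Drop the first column (inst, the institution code)
cancer <- cancer[, -1]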

set.seed(1243)
res.unsup <- 
  unsupMI(data = list(cancer), Impute = "MImpute_surv", Impute.m = 10,
          cleanup.partition = TRUE, return.detail = TRUE)

cancer$part_unsup <- res.unsup$Consensus

plot_MIpca(res.unsup$Imputed.data, 1:228, color.var = cancer$part_unsup,
           pca.varsel = c("age", "sex", "ph.ecog", "ph.karno", "pat.karno",
                          "meal.cal",  "wt.loss"))
plot_boxplot(data = cancer, partition.name = "part_unsup",
             vars.cont = c("age", "meal.cal", "wt.loss"),
             unclass.name = "Unclassified", include.unclass = FALSE)
#> Warning: Removed 27 rows containing non-finite values (stat_boxplot).
plot_frequency(data = cancer, partition.name = "part_unsup",
               vars.cat = c("sex", "ph.ecog"))
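
To see why imputation is needed for this dataset, it can help to look at how much data is missing. The following is a minimal sketch using only base R and the survival dataset from the example above; the object names cancer_check, n_missing, pct_missing and missing_summary are illustrative and not part of doMIsaul.

```r
# Load the same data as in the example (assumes the survival package is installed)
data(cancer, package = "survival")
cancer_check <- cancer[, -1]  # drop the institution code, as in the example

# Number and percentage of missing values per variable
n_missing <- colSums(is.na(cancer_check))
pct_missing <- round(100 * n_missing / nrow(cancer_check), 1)
missing_summary <- data.frame(n_missing, pct_missing)

# Variables with the most missing values first
missing_summary[order(-missing_summary$n_missing), ]

# Proportion of rows with at least one missing value
prop_incomplete <- mean(!complete.cases(cancer_check))
prop_incomplete
```

Variables with many missing values, such as those that trigger the "Removed ... rows" warnings in the boxplots, show up at the top of this table.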

This basic example shows how to perform semisupervised learning on an incomplete dataset with a survival outcome:

## With imputation included
set.seed(345)
res.semisup <- 
  seMIsupcox(X = list(cancer[, setdiff(colnames(cancer), "part_unsup")]),
             Y = cancer[, c("time", "status")],
             Impute = TRUE, Impute.m = 10, center.init = TRUE,
             nfolds = 10, center.init.N = 50, 
             cleanup.partition = TRUE, return.detail = TRUE)
# This is only an example; a larger value for center.init.N is recommended.

cancer$part_semisup <- res.semisup$Consensus[[1]]

plot_MIpca(res.semisup$Imputed.data, NULL, color.var = cancer$part_semisup,
           pca.varsel = c("age", "sex", "ph.ecog", "ph.karno", "pat.karno",
                          "meal.cal",  "wt.loss"))
plot_boxplot(data = cancer, partition.name = "part_semisup",
             vars.cont = c("age", "meal.cal", "wt.loss"),
             unclass.name = "Unclassified", include.unclass = TRUE)
#> Warning: Removed 61 rows containing non-finite values (stat_boxplot).
plot_frequency(data = cancer, partition.name = "part_semisup",
               vars.cat = c("sex", "ph.ecog"))
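
Once both partitions are stored in the data frame, a cross-tabulation gives a quick view of how far they agree. The sketch below uses base R only. The vectors part_unsup and part_semisup are small hypothetical stand-ins for cancer$part_unsup and cancer$part_semisup, and it assumes that unclassified observations are coded as NA, which may not match what the package returns.

```r
# Hypothetical stand-ins for cancer$part_unsup and cancer$part_semisup
# (assumption: unclassified observations are coded as NA)
part_unsup   <- factor(c(1, 1, 2, 2, NA, 1))
part_semisup <- factor(c(1, 2, 2, 2, 1, NA))

# Cross-tabulate the two partitions, keeping unclassified observations visible
agreement <- table(unsup = part_unsup, semisup = part_semisup,
                   useNA = "ifany")
agreement
```

With the real results, the same call would take cancer$part_unsup and cancer$part_semisup in place of the stand-in vectors. Cluster labels are arbitrary, so agreement need not appear on the diagonal.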

Reference publications

You may find more details on the methods implemented in this package in the associated publications:

  • Unsupervised MI learning: Faucheux L, Resche-Rigon M, Curis E, Soumelis V, Chevret S. Clustering with missing and left-censored data: A simulation study comparing multiple-imputation-based procedures. Biometrical Journal. 2021; 63: 372–393. https://doi.org/10.1002/bimj.201900366
  • Semisupervised MI learning for a survival outcome: Faucheux L, Soumelis V, Chevret S. Multiobjective semisupervised learning with a right-censored endpoint adapted to the multiple imputation framework. Biometrical Journal. 2021; 1–21. https://doi.org/10.1002/bimj.202000365

Install

install.packages('doMIsaul')

Monthly Downloads: 15
Version: 1.0.1
License: GPL (>= 3)

Maintainer: Lilith Faucheux
Last Published: October 18th, 2021

Functions in doMIsaul (1.0.1)

  • CH: CH index
  • CVE_LP: Cross-validation for Cox regression using the linear predictor estimator, with wrapper for warnings handling
  • MImpute: Wrapper functions for multivariate imputation with survival data or left-censored data
  • Allocation_Distance: Allocation Distance
  • CritCF: CritCF index
  • CritCF.sel: Selection of the number of clusters according to the CritCF index
  • MIclust_mpool: MultiCons wrapper for imputed datasets
  • CH.sel: Selection of the number of clusters according to the CH index
  • Extract_AUC: Wrapper to evaluate time-dependent AUC
  • evaluate_partition_unsup: Comparison of a partition obtained by unsupervised learning to a reference partition
  • objective_clustering: Objective clustering cost
  • MultiCons: MultiCons Consensus Clustering Algorithm
  • pareto: Pareto optimization
  • partition_generation: Unsupervised partition with K selection
  • cleanUp_partition: Remove small clusters (i.e. unclassified observations for which no consensus was obtained)
  • plot_MIpca: Plot a PCA from a multiply imputed dataset
  • evaluate_partition_semisup: Evaluation of a partition obtained by semisupervised learning in comparison to reference partitions
  • cve_LinearPred: Cross-validation for Cox regression using the linear predictor estimator
  • .cens.draw3: Base function for imputing left-censored data with MICE
  • mice.impute.cens: Impute left-censored data with MICE
  • anovatab: Table of ANOVA tests for several explanatory variables
  • exctract_center_position: Extract the cluster centers coordinates
  • chi2tab: Table of chisq.test() tests for several explanatory variables
  • seMIsupcox: Semisupervised learning for a right-censored endpoint
  • formatpv: Apply format to p-values
  • initiate_centers: Initiate centers for clustering algorithm
  • plot_boxplot: ggplot type boxplots for each vars.cont by partition level
  • table_categorical: Display table with comparison of the partition with categorical variables
  • plot_frequency: ggplot type barplots representing frequencies for each vars.cat by partition level
  • unsupMI: Unsupervised learning for incomplete dataset
  • table_continuous: Display table with comparison of the partition with continuous variables
  • my_jack: Partition sorting based on Jaccard index