smartdata

smartdata is an R package that integrates preprocessing algorithms for oversampling, instance and feature selection, normalization, discretization, space transformation, and cleaning of outliers, missing values, and noise.

Installation

You can install the latest smartdata stable release from CRAN with:

# This sets both CRAN and Bioconductor as repositories to resolve dependencies
setRepositories(ind = 1:2)
install.packages("smartdata")

and load it into an R session with:

library("smartdata")

Examples

smartdata provides the following wrappers:

  • oversample
  • instance_selection
  • feature_selection
  • normalize
  • discretize
  • space_transformation
  • clean_outliers
  • impute_missing
  • clean_noise

To list the methods available for a given wrapper:

which_options("instance_selection")
#> Possible methods are: 'CNN', 'ENN', 'multiedit', 'FRIS'

To get information about the parameters available for a method:

which_options("instance_selection", "multiedit")
#> For more information do: ?class::multiedit 
#> Parameters for multiedit are: 
#>   * k: Number of neighbors used in KNN 
#>        Default value: 1 
#>   * num_folds: Number of partitions the train set is split in 
#>                Default value: 3 
#>   * null_passes: Number of null passes to use in the algorithm 
#>                  Default value: 5

First, load some example datasets from the imbalance and mice packages:

data(iris0,  package = "imbalance")
data(ecoli1, package = "imbalance")
data(nhanes, package = "mice")

Oversampling

super_iris <- iris0 %>% oversample(method = "MWMOTE", ratio = 0.8, filtering = TRUE)

Instance selection

super_iris <- iris %>% instance_selection("multiedit", k = 3, num_folds = 2, 
                                          null_passes = 10, class_attr = "Species")
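
The which_options output above points to ?class::multiedit for this method. As a rough sketch of what the wrapper presumably delegates to, the snippet below calls class::multiedit directly. The mapping of num_folds to V and null_passes to I is an assumption based on the parameter descriptions, not something confirmed by smartdata's source, and the seed is only there to make the random partitioning repeatable.

```r
library(class)

set.seed(42)  # multiedit partitions the data at random

# Numeric predictors as a matrix, class labels as a factor
x <- as.matrix(iris[, 1:4])

# Assumed mapping: k = 3, num_folds -> V = 2, null_passes -> I = 10
keep <- multiedit(x, iris$Species, k = 3, V = 2, I = 10, trace = FALSE)

# multiedit returns the indices of the rows to retain
edited_iris <- iris[keep, ]
nrow(edited_iris)
```

The wrapper form is more convenient in a pipeline because it takes the data frame and the class column name directly.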

Feature selection

super_ecoli <- ecoli1 %>% feature_selection("Boruta", class_attr = "Class")

Normalization

super_iris <- iris %>% normalize("min_max", exclude = c("Sepal.Length", "Species"))
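
For intuition, min-max scaling maps each numeric column linearly onto [0, 1]. The base R sketch below illustrates the idea on iris, leaving out the same two columns as the call above. It is not smartdata's implementation, and the helper name min_max is hypothetical.

```r
# Hypothetical helper: rescale a numeric vector linearly onto [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Columns left untouched, mirroring the exclude argument above
excluded <- c("Sepal.Length", "Species")
to_scale <- setdiff(names(iris), excluded)

scaled_iris <- iris
scaled_iris[to_scale] <- lapply(iris[to_scale], min_max)

# Each rescaled column should now span [0, 1]
sapply(scaled_iris[to_scale], range)
```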

Discretization

super_iris <- iris %>% discretize("ameva", class_attr = "Species")

Space transformation

super_ecoli <- ecoli1 %>% space_transformation("lle_knn", k = 3, num_features = 2)

Outliers

super_iris <- iris %>% clean_outliers("multivariate", type = "adj")
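
To give a sense of what multivariate outlier detection involves, here is a base R sketch that uses classical Mahalanobis distances with a chi-squared cutoff. This is a simpler, non-robust stand-in and not the method behind type = "adj"; the 97.5% cutoff and the decision to drop flagged rows are assumptions made for illustration.

```r
# Classical Mahalanobis distances on the four numeric columns of iris
x <- as.matrix(iris[, 1:4])
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Assumed cutoff: 97.5% quantile of a chi-squared with ncol(x) degrees of freedom
cutoff <- qchisq(0.975, df = ncol(x))
is_outlier <- d2 > cutoff

# Drop the flagged rows (the wrapper may treat them differently)
filtered_iris <- iris[!is_outlier, ]
sum(is_outlier)
```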

Missing values

super_nhanes <- nhanes %>% impute_missing("gibbs_sampling")
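
Before imputing, it can help to see how many values are missing in each column. The sketch below uses only base R functions on the nhanes dataset loaded earlier and assumes the mice package is installed; the variable name na_counts is illustrative.

```r
# Guarded so the snippet is skipped when mice is not installed
if (requireNamespace("mice", quietly = TRUE)) {
  data(nhanes, package = "mice")

  # Number of missing values per column
  na_counts <- colSums(is.na(nhanes))
  print(na_counts)

  # Share of rows with at least one missing value
  print(mean(!complete.cases(nhanes)))
}
```

Running the same two checks on the imputed result is a quick way to see whether any missing values remain.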

Noise

super_iris <- iris %>% clean_noise("hybrid", class_attr = "Species", 
                                   consensus = FALSE, action = "repair")

Version: 1.0.3
License: GPL (>= 2) | file LICENSE
Maintainer: Ignacio Cordón
Last Published: December 18th, 2019

Functions in smartdata (1.0.3)

  • instance_selection: Instance selection wrapper
  • smartdata: A package to ease data preprocessing tasks
  • clean_noise: Noise cleaning wrapper
  • clean_outliers: Outliers cleaning wrapper
  • space_transformation: Space transformation wrapper
  • discretize: Discretization wrapper
  • feature_selection: Feature selection wrapper
  • which_options: Prints options for a wrapper or a certain preprocessing method
  • normalize: Normalization wrapper
  • %>%: Pipe operator
  • oversample: Oversampling wrapper
  • impute_missing: Missing values imputation wrapper