shattering (version 1.0.7)

Estimate the Shattering Coefficient for a Particular Dataset

Description

Statistical Learning Theory (SLT) provides the theoretical background to ensure that a supervised algorithm generalizes the mapping f: X -> Y, given that f is selected from its search space bias F. This formal result depends on the shattering coefficient function N(F,2n) to upper bound the divergence between the empirical and expected risks under the empirical risk minimization principle. From this bound one can estimate the training sample size necessary to ensure probabilistic learning convergence and, most importantly, characterize the capacity of F, including its underfitting and overfitting abilities when addressing specific target problems. In this context, we propose a new approach to estimate the maximal number of hyperplanes required to shatter a given sample, i.e., to separate every pair of points from one another, based on the recent contributions by Har-Peled and Jones in the dataset partitioning scenario, and use this foundation to analytically compute the shattering coefficient function for both binary and multi-class problems. As main contributions, one can use our approach to study the complexity of the search space bias F, estimate training sample sizes, and parametrize the number of hyperplanes a learning algorithm needs to address a given supervised task, which is especially appealing for deep neural networks.

References: de Mello, R.F. (2019) "On the Shattering Coefficient of Supervised Learning Algorithms"; de Mello, R.F., Ponti, M.A. (2018, ISBN: 978-3319949888) "Machine Learning: A Practical Approach on the Statistical Learning Theory".
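
To make the link between the shattering coefficient and the training sample size concrete, the sketch below evaluates the standard SLT generalization bound 2 * N(F,2n) * exp(-n * eps^2 / 4) and searches for the smallest n that brings it under a chosen confidence level delta. It uses base R only and does not call the package. The names shatter_coef, slt_bound and min_sample_size are hypothetical, and the growth function n^2 - n + 2 (one affine hyperplane separating points in general position in the plane) is an assumption chosen purely for illustration; in practice one would plug in the function estimated for the data at hand.

```r
# Illustrative growth function (an assumption, not package output):
# number of labelings one affine hyperplane can realize on n points
# in general position in the plane.
shatter_coef <- function(n) n^2 - n + 2

# Right-hand side of the bound: 2 * N(F, 2n) * exp(-n * eps^2 / 4).
slt_bound <- function(n, eps, coef_fun = shatter_coef) {
  2 * coef_fun(2 * n) * exp(-n * eps^2 / 4)
}

# Smallest n after which the bound stays at or below delta, searched on 1..n_max.
# The bound may rise before it falls, so we look for the last violation.
min_sample_size <- function(eps, delta, coef_fun = shatter_coef, n_max = 1e6) {
  n <- seq_len(n_max)
  ok <- slt_bound(n, eps, coef_fun) <= delta
  if (!ok[n_max]) stop("bound not reached within n_max; increase n_max")
  if (all(ok)) return(1L)
  max(which(!ok)) + 1L
}

# Example: divergence of at most 0.1 with the bound held at or below 0.05.
n_needed <- min_sample_size(eps = 0.1, delta = 0.05)
print(n_needed)
```

A vectorized scan is used instead of a bisection because the bound is not monotone for small n; for polynomial growth functions it eventually decays, which is what makes a finite sample size sufficient.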

Install

install.packages('shattering')

Monthly Downloads

214

Version

1.0.7

License

GPL-3

Maintainer

Rodrigo F. de Mello

Last Published

August 21st, 2021

Functions in shattering (1.0.7)

apply_classifier

Apply a classifier induced with the function build_classifier.

complexity_analysis

Produce a PDF report analyzing the lower and upper shattering coefficient functions.

number_regions

Compute the maximal number of space regions.

shattering

shattering: A package to estimate the shattering coefficient for labeled data samples.

equivalence_relation

Compute equivalence relations among input space points.

compress_space

Compress the space based on the equivalence relations.

estimate_number_hyperplanes

Estimate the number of hyperplanes required to classify a given data sample.

build_classifier

Produce a set of SVM classifiers.
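
As background for the "maximal number of space regions" mentioned under number_regions, the sketch below applies the classical counting result that m hyperplanes in general position split a d-dimensional space into at most the sum of choose(m, i) for i from 0 to d regions. It is written in base R and is not the package's implementation; the function name max_regions and its arguments are hypothetical, and the actual signature of number_regions may differ.

```r
# Maximal number of regions that m hyperplanes in general position
# can carve out of a d-dimensional real space (classical counting result).
# choose(m, i) is 0 whenever i > m, so the sum is safe for any d >= 0.
max_regions <- function(m, d) {
  stopifnot(m >= 0, d >= 0)
  sum(choose(m, 0:d))
}

# Three lines in the plane: 1 + 3 + 3 regions.
print(max_regions(3, 2))

# How the region count grows with the number of hyperplanes in 2 dimensions.
print(sapply(1:6, max_regions, d = 2))
```

Because each region can hold at most one class label, a count like this gives a rough upper limit on how many distinct labelings a fixed number of hyperplanes can support, which is why it appears alongside the hyperplane estimate.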