Michael Benesty

7 packages on CRAN

fastrtext

cran
65th

Percentile

Learning text representations and text classifiers may rely on the same simple and efficient approach. 'fastText' is an open-source, free, lightweight library that allows users to perform both tasks. It transforms text into continuous vectors that can later be used on many language-related tasks. It works on standard, generic hardware (no 'GPU' required). It also includes a model size reduction feature. The original 'fastText' source code is available at <https://github.com/facebookresearch/fastText>.

projector

cran
35th

Percentile

Display dense vector representations of texts on a 2D plane to better understand embeddings by observing the neighbors of a selected text. It also includes an interactive application to dynamically change the pivot text.

unine

cran
28th

Percentile

Implementation of "light" stemmers for French, German, Italian, Spanish, Portuguese, Finnish, Swedish. They are based on the same work as the "light" stemmers found in 'SolR' <https://lucene.apache.org/solr/> or 'ElasticSearch' <https://www.elastic.co/fr/products/elasticsearch>. A "light" stemmer removes inflections only from nouns and adjectives. Indexing verbs for these languages is not of primary importance compared to nouns and adjectives. The stemming procedure for French is described in (Savoy, 1999) <doi:10.1002/(SICI)1097-4571(1999)50:10%3C944::AID-ASI9%3E3.3.CO;2-H>.

caret

cran
99th

Percentile

Misc functions for training and plotting classification and regression models.

xgboost

cran
98th

Percentile

Extreme Gradient Boosting, which is an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>. This package is its R interface. The package includes an efficient linear model solver and tree learning algorithms. The package can automatically do parallel computation on a single machine, which could be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is made to be extensible, so that users can also easily define their own objectives.

lime

cran
94th

Percentile

When building complex models, it is often difficult to explain why the model should be trusted. While global measures such as accuracy are useful, they cannot be used for explaining why a model made a specific prediction. 'lime' (a port of the 'lime' 'Python' package) is a method for explaining the outcome of black box models by fitting a local model around the point in question and perturbations of this point. The approach is described in more detail in the article by Ribeiro et al. (2016) <arXiv:1602.04938>.

FeatureHashing

cran
58th

Percentile

Feature hashing, also called the hashing trick, is a method to transform the features of an instance into a vector. Thus, it is a method to transform a real dataset into a matrix. Without looking up the indices in an associative array, it applies a hash function to the features and uses their hash values as indices directly. The method of feature hashing in this package was proposed in Weinberger et al. (2009) <arXiv:0902.2206>. The hashing algorithm is the murmurhash3 from the 'digest' package. Please see the README in <https://github.com/wush978/FeatureHashing> for more information.
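
The idea of using hash values directly as indices can be illustrated with a minimal Python sketch. It is not the package's implementation: the function name `hash_features` and the bucket count are hypothetical, and MD5 from the standard library stands in for murmurhash3 purely to keep the example self-contained.

```python
import hashlib


def hash_features(features, n_buckets=16):
    """Map a dict of feature name -> numeric value to a fixed-length vector.

    Illustrative only: MD5 is an assumed stand-in for murmurhash3.
    """
    vector = [0.0] * n_buckets
    for name, value in features.items():
        # Hash the feature name; no dictionary of known features is kept.
        digest = hashlib.md5(name.encode("utf-8")).digest()
        # The hash value, reduced modulo the vector length, is the index.
        index = int.from_bytes(digest[:8], "big") % n_buckets
        # Features that collide in the same bucket are summed.
        vector[index] += value
    return vector


row = hash_features({"color=red": 1.0, "size=large": 1.0, "price": 12.5}, n_buckets=8)
```

Each instance hashed this way becomes one row of the resulting matrix. The vector length is fixed in advance, so previously unseen feature names need no extra bookkeeping, at the cost of occasional collisions.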