randomUniformForest-package: Random Uniform Forests for Classification, Regression and Unsupervised Learning
Description
Ensemble model for classification, regression and unsupervised learning, based on a forest of unpruned and randomized binary decision trees. Unlike Breiman's Random Forests, each tree is grown by sampling, with replacement, a set of variables at each node. Each cut-point is generated randomly, according to the continuous Uniform distribution between two random points of each candidate variable, or using its whole current support. The optimal random node is then selected among many fully random nodes by maximizing information gain (classification) or by minimizing the 'L2' (or 'L1') distance (regression). Unlike Extremely Randomized Trees, data are either bootstrapped or subsampled for each tree. Random Uniform Forests are designed to lower the correlation between trees, to offer a deep analysis of variable importance and to allow native distributed and incremental learning. The unsupervised mode introduces clustering and dimension reduction, using a three-layer engine (dissimilarity matrix, Multidimensional Scaling, and k-means or hierarchical clustering).
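A minimal supervised sketch (the formula interface and predict() method are exported by the package; the specific arguments shown, such as ntree and the Xtest argument of importance(), are assumptions based on the package defaults):

library(randomUniformForest)
data(iris)
set.seed(2014)
idx <- sample(nrow(iris), 100)
## classification: each optimal random node maximizes information gain
ruf <- randomUniformForest(Species ~ ., data = iris[idx, ], ntree = 100)
pred <- predict(ruf, iris[-idx, ])
mean(pred != iris$Species[-idx])               # test error
## deep analysis of variable importance
imp <- importance(ruf, Xtest = iris[idx, -5])  # Xtest argument assumed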
Details
Package: randomUniformForest
Type: Package
Version: 1.1.2
Date: 2015-01-05
License: BSD_3_clause
Installation: install.packages("randomUniformForest")
Usage: library(randomUniformForest)
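The unsupervised mode follows the three-layer engine described above. A minimal sketch (unsupervised.randomUniformForest() is exported by the package; the clusters argument and the plot method are assumptions based on its documentation):

library(randomUniformForest)
data(iris)
X <- iris[, -5]   # drop the labels: unsupervised setting
## three-layer engine: dissimilarity matrix -> MDS -> k-means
iris.uruf <- unsupervised.randomUniformForest(X, clusters = 3)  # 'clusters' assumed
print(iris.uruf)
plot(iris.uruf)   # assumed plot method: MDS coordinates by cluster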
References
Amit, Y., Geman, D., 1997. Shape Quantization and Recognition with Randomized Trees. Neural Computation 9, 1545-1588.
Biau, G., Devroye, L., Lugosi, G., 2008. Consistency of random forests and other averaging classifiers. The Journal of Machine Learning Research 9, 2015-2033.
Bousquet, O., Boucheron, S., Lugosi, G., 2004. Introduction to Statistical Learning Theory, in: Bousquet, O., von Luxburg, U., Rätsch, G. (Eds.), Advanced Lectures on Machine Learning, Lecture Notes in Computer Science. Springer Berlin Heidelberg, pp. 169-207.
Breiman, L., 1996. Heuristics of instability and stabilization in model selection. The Annals of Statistics 24, no. 6, 2350-2383.
Breiman, L., 1996. Bagging predictors. Machine learning 24, 123-140.
Breiman, L., 2001. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statistical Science 16, no. 3, 199-231.
Breiman, L., 2001. Random Forests, Machine Learning 45(1), 5-32.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C., 1984. Classification and Regression Trees. New York: Chapman and Hall.
Ciss, S., 2014. PhD thesis: Forêts uniformément aléatoires et détection des irrégularités aux cotisations sociales. Université Paris Ouest Nanterre, France. In French.
English title: Random Uniform Forests and Irregularity Detection in Social Security Contributions.
Link: https://www.dropbox.com/s/q7hbgeafrdd8qtc/Saip_Ciss_These.pdf?dl=0
Ciss, S., 2014a. Random Uniform Forests. Pre-print.
Ciss, S., 2014b. Variable Importance in Random Uniform Forests. Pre-print.
Cox, T. F., Cox, M. A. A., 2001. Multidimensional Scaling. Second edition. Chapman and Hall.
Devroye, L., Györfi, L., Lugosi, G., 1996. A probabilistic theory of pattern recognition. New York: Springer.
Dietterich, T.G., 2000. Ensemble Methods in Machine Learning, in: Multiple Classifier Systems, Lecture Notes in Computer Science. Springer Berlin Heidelberg, pp. 1-15.
Efron, B., 1979. Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7, 1-26.
Gower, J. C., 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53, 325-328.
Hastie, T., Tibshirani, R., Friedman, J.H., 2001. The elements of statistical learning. New York: Springer.
Ho, T.K., 1998. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 832-844.
Lin, Y., Jeon, Y., 2002. Random Forests and Adaptive Nearest Neighbors. Journal of the American Statistical Association 101(474).
Vapnik, V.N., 1995. The nature of statistical learning theory. Springer-Verlag New York, Inc., New York, NY, USA.