Charlie Wusuo Liu
3 packages on CRAN
Specialized solvers for combinatorial optimization problems in the Subset Sum family. These solvers differ from mainstream ones in offering options for (i) restricting subset size, (ii) bounding subset elements, (iii) mining real-valued sets with predefined subset sum errors, and (iv) finding one or more subsets in limited time. A novel algorithm for mining the one-dimensional Subset Sum led to algorithms for the multi-Subset Sum and the multidimensional Subset Sum. The latter decomposes the problem in a novel way, and its multi-threaded framework offers exact algorithms for the multidimensional Knapsack and the Generalized Assignment problems. Package updates include (a) a renewed implementation of the multi-Subset Sum, multidimensional Knapsack and Generalized Assignment solvers; (b) the option of bounding the solution space in the multidimensional Subset Sum; (c) fundamental data structure and architectural changes for better cache locality and a better chance of SIMD vectorization; (d) an option to map real-domain problems to the integer domain with user-controlled precision loss, with the resulting integers further compressed non-uniformly into 64-bit buffers. Arithmetic on the compressed integers is done by bit manipulation, and the design has virtually no speed penalty relative to ordinary integer arithmetic. The consequent reduction in dimensionality may yield substantial acceleration. Compilation with g++ '-Ofast' is recommended. See the package vignette (<arXiv:1612.04484v3>) for details. Functions prefixed with 'aux' (auxiliary) are, or will be, implementations of existing foundational or cutting-edge algorithms for solving optimization problems of interest.
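To make options (i), (iii) and (iv) concrete, here is a minimal sketch of a fixed-size Subset Sum search over real values with an error tolerance, a cap on the number of solutions, and a time limit. It is a plain depth-first search with bound pruning on sorted input, chosen for illustration only; it is not the package's algorithm, and the function name `fixed_size_subset_sum` and its parameters are hypothetical. Option (ii), bounding subset elements, is not modeled.

```python
import time
from typing import List


def fixed_size_subset_sum(values: List[float], k: int, target: float,
                          err: float, max_solutions: int = 1,
                          time_limit: float = 5.0) -> List[List[int]]:
    """Hypothetical helper: find up to `max_solutions` subsets of exactly `k`
    elements whose sum lies in [target - err, target + err].
    Returns each subset as a sorted list of original indices."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    v = [values[i] for i in order]  # values sorted ascending
    n = len(v)
    lo, hi = target - err, target + err
    deadline = time.monotonic() + time_limit
    found: List[List[int]] = []

    def done() -> bool:
        # Stop once enough subsets are found or the time budget is spent.
        return len(found) >= max_solutions or time.monotonic() > deadline

    def dfs(start: int, remaining: int, acc: float, chosen: List[int]) -> None:
        if remaining == 0:
            if lo <= acc <= hi:
                found.append(sorted(order[i] for i in chosen))
            return
        for i in range(start, n - remaining + 1):
            if done():
                return
            # Smallest completion starting with v[i]: v[i] and the next values.
            if acc + sum(v[i:i + remaining]) > hi:
                break  # a larger start index only raises this lower bound
            # Largest completion starting with v[i]: v[i] plus the top values.
            if acc + v[i] + sum(v[n - remaining + 1:]) < lo:
                continue  # a larger v[i] may still reach the window
            chosen.append(i)
            dfs(i + 1, remaining - 1, acc + v[i], chosen)
            chosen.pop()

    dfs(0, k, 0.0, [])
    return found


values = [3.1, 7.4, 2.2, 5.5, 1.0, 4.8]
print(fixed_size_subset_sum(values, 3, 10.0, 0.15, max_solutions=5))
```

Here the only 3-element subset with a sum within 0.15 of 10.0 is {3.1, 2.2, 4.8}, i.e. indices `[0, 2, 5]`. Sorting first is what makes the two pruning bounds cheap to compute.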
High-performance trainers for parameterizing and clustering weighted data. The Gaussian mixture (GM) module includes the conventional EM (expectation maximization) trainer, the component-wise EM trainer, and the minimum-message-length EM trainer by Figueiredo and Jain (2002) <doi:10.1109/34.990138>. These trainers accept additional constraints on mixture weights and covariance eigen ratios. The K-means (KM) module offers clustering with options for (i) deterministic and stochastic K-means++ initializations, (ii) upper bounds on cluster weights (sizes), (iii) Minkowski distances, (iv) cosine dissimilarity, and (v) dense and sparse representations of input data. The package improves on the usual implementations of GM and KM training algorithms in various respects. It is written in multithreaded C++ for processing large data sets in industrial use.
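As an illustration of K-means++ seeding on weighted data with a Minkowski distance, the sketch below picks each new seed with probability proportional to weight times squared distance to the nearest existing seed. The "deterministic" branch, which simply takes the point with the largest weighted score, is an assumption about what a deterministic variant could look like and may differ from the package's definition. Cluster-size bounds, cosine dissimilarity and sparse input are not modeled, and the names `weighted_kmeanspp` and `minkowski` are hypothetical.

```python
import random
from typing import List, Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p (p = 2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def weighted_kmeanspp(X: Sequence[Sequence[float]], w: Sequence[float], k: int,
                      p: float = 2.0, deterministic: bool = False,
                      seed: int = 0) -> List[int]:
    """Hypothetical helper: return indices of up to k seeds chosen by
    weighted K-means++ (D^2) seeding."""
    rng = random.Random(seed)
    idx = range(len(X))
    # First seed: heaviest point (deterministic) or a draw proportional to weight.
    if deterministic:
        first = max(idx, key=lambda i: w[i])
    else:
        first = rng.choices(idx, weights=w, k=1)[0]
    centers = [first]
    # Squared distance from each point to its nearest seed so far.
    d2 = [minkowski(X[i], X[first], p) ** 2 for i in idx]
    while len(centers) < k:
        score = [w[i] * d2[i] for i in idx]  # weight times D^2
        if sum(score) <= 0.0:
            break  # every remaining point coincides with an existing seed
        if deterministic:
            nxt = max(idx, key=lambda i: score[i])
        else:
            nxt = rng.choices(idx, weights=score, k=1)[0]
        centers.append(nxt)
        d2 = [min(d2[i], minkowski(X[i], X[nxt], p) ** 2) for i in idx]
    return centers


X = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]]
print(weighted_kmeanspp(X, [1, 1, 1, 1], 2, deterministic=True))
```

With equal weights the deterministic variant seeds at point 0 and then at the point farthest from it, point 3, so the two seeds land in different clusters. Raising one point's weight moves the first seed to that point.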
Simulate multivariate correlated data given nonparametric marginals and their covariance structure characterized by a Pearson or Spearman correlation matrix. The simulator approaches the problem from a purely computational perspective. It assumes no statistical models such as copulas or parametric distributions, and can approximate the target correlations regardless of theoretical feasibility. The algorithm integrates and advances the Iman-Conover (1982) approach <doi:10.1080/03610918208812265> and the Ruscio-Kaczetow iteration (2008) <doi:10.1080/00273170802285693>. Package functions are carefully implemented in C++ to maximize computing speed and are suited to large inputs in a manycore environment. Both the precision of the approximation and the computing speed exceed those of various CRAN packages to date by substantial margins. Benchmarks are detailed in function examples. A simple heuristic algorithm is additionally designed to optimize the joint distribution in the post-simulation stage. This heuristic has demonstrated both a strong ability to reduce cost and good potential to reach the same precision of approximation without the enhanced Iman-Conover-Ruscio-Kaczetow procedure.
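The core idea of the Iman-Conover approach is to reorder each marginal so that its ranks follow a set of reference scores that carry the target correlation. Below is a minimal bivariate sketch of that basic step, assuming van der Waerden normal scores and a 2x2 Cholesky-style correction that makes the Pearson correlation of the scores exactly equal to the target. It omits the Ruscio-Kaczetow iteration and the package's enhancements, so the resulting Spearman correlation is only approximately on target. The names `iman_conover_2d`, `pearson` and `ranks` are hypothetical.

```python
import random
from statistics import NormalDist
from typing import List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample Pearson correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def ranks(v: Sequence[float]) -> List[int]:
    """0-based ranks (assumes no ties)."""
    r = [0] * len(v)
    for rank, i in enumerate(sorted(range(len(v)), key=lambda i: v[i])):
        r[i] = rank
    return r


def iman_conover_2d(x: Sequence[float], y: Sequence[float], rho: float,
                    seed: int = 0) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: reorder x and y so their rank correlation
    approximates rho; the marginal values themselves are unchanged."""
    n = len(x)
    if n != len(y) or n < 3 or not -1.0 < rho < 1.0:
        raise ValueError("need equal lengths >= 3 and -1 < rho < 1")
    rng = random.Random(seed)
    nd = NormalDist()
    scores = [nd.inv_cdf((i + 1) / (n + 1)) for i in range(n)]  # van der Waerden
    s1 = scores[:]
    while True:  # random permutation of the scores for the second column
        s2 = scores[:]
        rng.shuffle(s2)
        e = pearson(s1, s2)
        if abs(e) < 1.0 - 1e-9:
            break
    # Remove the sample correlation e between s1 and s2 ...
    z2 = [(b - e * a) / (1.0 - e * e) ** 0.5 for a, b in zip(s1, s2)]
    # ... then impose the target: Pearson(s1, t2) equals rho.
    t2 = [rho * a + (1.0 - rho * rho) ** 0.5 * z for a, z in zip(s1, z2)]
    xs, ys = sorted(x), sorted(y)
    # Each marginal takes on the rank order of its score column.
    return [xs[r] for r in ranks(s1)], [ys[r] for r in ranks(t2)]


x = [float(i * i) for i in range(200)]
y = [float(3 * i + 1) for i in range(200)]
ox, oy = iman_conover_2d(x, y, 0.7, seed=1)
print(pearson(ranks(ox), ranks(oy)))
```

Because only the order of values changes, both marginals are preserved exactly; only their pairing is altered. The gap between the achieved and target rank correlation is the kind of residual error that iterative refinement such as Ruscio-Kaczetow is meant to shrink.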