DoE.MIParray-package: Package to Create a MIP Based Array

Description

'CRAN' packages 'DoE.base' and 'Rmosek' and non-'CRAN' package 'gurobi' are enhanced with functionality for the creation of optimized arrays for experimentation, where optimization is in terms of generalized minimum aberration. It is also possible to optimally extend existing arrays to larger run size. The package writes 'MPS' (Mathematical Programming System) files for use with any mixed integer optimization software that can process such files. If at least one of the commercial products 'Gurobi' or 'Mosek' (free academic licenses available for both) is available, the package also creates arrays by optimization. For installing 'Gurobi' and its R package 'gurobi', follow instructions at <https://www.gurobi.com/products/gurobi-optimizer/> and <https://www.gurobi.com/documentation/7.5/refman/r_api_overview.html> (or higher version). For installing 'Mosek' and its R package 'Rmosek', follow instructions at <https://www.mosek.com/downloads/> and <https://docs.mosek.com/8.1/rmosek/install-interface.html>, or use the functionality in the stump CRAN R package 'Rmosek'.

Arguments

Warning

Escaping from a Gurobi run will most likely be unsuccessful, might leave R in an unstable state, and usually fails to release the entire CPU usage; thus, one should think carefully about the affordable run times.
Mosek can usually be escaped using the <ESC> key, and one can even hope to get a valid output. However, after such escapes, it is also advisable to use RMosek's internal clean function (Rmosek:::mosek_clean()), since computer instabilities after repeated escapes from Mosek have been observed (the package functions execute the mosek_clean command after conducting Mosek runs).

Installation

Gurobi and Mosek need to be separately installed; please follow vendors' instructions; it is necessary to obtain a license; for academic use, free academic licenses are available in both cases.

Both Gurobi and Mosek provide R packages (gurobi and Rmosek) for accessing the software, and they also provide instructions on their installation.

For Gurobi, the R package gurobi is provided with the software installation files and can simply be installed like usual for a non-Web package.

For Mosek, there is a CRAN package Rmosek, which is a stump only and can be used for installing the suitable package Rmosek from the Mosek website. Its installation requires compilation from source. Several prerequisites are needed, basically the same ones needed for compiling packages. Make sure to set up the R environment for this purpose (for Windows, see https://cran.r-project.org/bin/windows/Rtools/; you may have to set some paths yourself; I am not familiar with the proper process for other platforms). If this is accomplished and package Matrix is at the latest level (as recommended in the Rmosek online documentation), install the CRAN package Rmosek, whose purpose it is to support installation of the appropriate Rmosek package for your platform and Mosek version. Once this CRAN package is available, run function mosek_attachbuilder, specifying the appropriate path as pointed out in the function's documentation, and subsequently run the custom-made function install.rmosek. After this activity (if all went well), the CRAN package Rmosek will have been replaced by a working version of package Rmosek whose version number depends on your version of Mosek. In case of problems, running the install.packages command provided in the Rmosek online documentation for your version of Mosek (Mosek ApS 2017b) may also be worth a try; if all else fails, you may have to contact Rmosek support.

Don't be confused by Mosek ApS's somewhat strange communication regarding the package Rmosek: the CRAN version (stump for the purpose of supporting installation of the working version) has its own separate numbering that currently starts with “1”. The working Rmosek packages have version numbers whose first digit is kept in sync with the Mosek major version number; apart from that, version numbers of Rmosek versions and Mosek versions are not kept in sync; for example, the current version (July 13 2019) of package Rmosek for Mosek version 8 is 8.0.69, while the Mosek version is 8.1.0.81 (the revision versions of Mosek change quite frequently). Whenever Mosek itself is re-installed (at least with a change of minor version like 8.0 to 8.1), the Rmosek sources must be re-compiled, even if their version has not changed; this is presumably why Mosek ApS speak of a version number for the binary that corresponds to the Mosek version number, while the sources keep their version number (somewhat confusing for the R community). From within R, version numbers can be queried by packageVersion("Rmosek") and Rmosek::mosek_version(), respectively.

Author

Ulrike Groemping

Details

The package creates an array with specified minimum resolution and optimized word length pattern, using mixed integer programming (MIP). This is a step in generalized minimum aberration (GMA) according to Xu and Wu (2001), which is a way to minimize the confounding potential in a factorial design, as measured by the generalized word length pattern (GWLP). Reduction of short words has priority over reduction of long words, because it pertains to confounding of lower order interactions, which is assumed to be more severe than confounding of higher order interactions.

There is always one word of length zero. An array is said to have strength t (resolution R=t+1), if it has no words of lengths 1,...,t, but has words of length R=t+1; in that case, t-factor interactions are not confounded with the overall mean, (t-1)-factor interactions are not confounded with main effects, and so forth. Groemping and Xu (2014) provided an interpretation of the number of shortest words (i.e. words of length R) in terms of coefficients of determination of linear models with main effects model matrix columns on the LHS and full factorial t-factor models on the right-hand side; for example, in an array of strength 2 with three 3-level factors, the number of words of length 3 is the sum of the two R^2 values obtained from regressing the two main effects model matrix columns of one of the factors on a full model in the other two factors, provided main effect model matrix columns are coded orthogonally (to each other and the overall mean). GMA considers a design better than another one, if it has larger strength or resolution. In case of ties, a design is better if it has fewer shortest words; further ties are resolved by comparing words of lengths t+2, t+3, ...

Any array found by a function of this package will have the requested resolution (if not possible, an error will be thrown). If desired (default, but can be suppressed), optimization of the number of shortest words will be attempted. If kmax is chosen larger than the resolution, optimization of longer words will also be attempted. Mixed integer optimization is very resource-intensive and often fails to provide a confirmed optimum (see also below). Choosing kmax larger than the resolution is therefore advisable for very small problems only; in most cases, one should attempt an optimization of the number of shortest words only, or even suppress that by find.only=TRUE. Only after this has been achieved, possibly with several sequential attempts, a subsequent attempt to improve the number of longer words should be undertaken.

Functions gurobi_MIParray and mosek_MIParray create an array from scratch (start=NULL and forced=NULL), by trying to improve a starting array specified with the argument start, or by restricting a portion of the array to correspond to a pre-existing array with the argument forced. If no starting array is given, the functions initially obtain one by solving a linear optimization problem for obtaining a design with the requested resolution. Subsequently, the number of shortest words of the starting array is optimized, followed by the numbers of words of length up to kmax.

Where functions gurobi_MIParray and mosek_MIParray do not easily find an optimal array, it may be worthwhile to consider using functions gurobi_MIPsearch and mosek_MIPsearch, which can search over different orderings of the factor levels; these have been provided because a brief search for a fortunate level orderings may be successful where a very long search for an unfortunate level ordering fails.

The algorithm implemented in the package is explained in Groemping and Fontana (2019); it is a modification of Fontana (2017). Modifications include enforcing the requested resolution via a linear optimization step (much faster than sequentially optimizing all those word lengths until they are zeroes), and using a result on coding invariance from Groemping (2018) for a more parsimonious formulation of constraints. Mixed integer programming can use a lot of time and resources; in particular, the confirmation of the optimality of a solution that has been found can take prohibitively long, even if the optimal solution itself has been found fast (which is also not necessarily so). Functions gurobi_MIPcontinue and mosek_MIPcontinue can be used to continue optimization for larger problems, where optimization was previously aborted, e.g. due to a time limit (however, a lot of effort is lost and has to be repeated when continuing a previous attempt). Note that it is possible to continue an optimization effort started with Gurobi using Mosek and vice versa, because the functions gurobi_MIPcontinue and mosek_MIPcontinue can convert problems using the internal functions mosek2gurobi and gurobi2mosek. Groemping (2020) explains the package itself.

The solver functions in the package use the commercial solvers Mosek and/or Gurobi, which provide free academic licenses. These solvers and their corresponding vendor-provided R packages have to be installed for using the package's solver functions. For users who do not have access to these but to other solvers, the package provides export functions to mps files that can be read by other solvers, for example by IBM CPLEX (write_MPSMIQP, write_MPSILPlist).

References

Fontana, R. (2017). Generalized Minimum Aberration mixed-level orthogonal arrays: a general approach based on sequential integer quadratically constrained quadratic programming. Communications in Statistics – Theory Methods 46, 4275-4284.

Groemping, U. and Xu, H. (2014). Generalized resolution for orthogonal arrays. The Annals of Statistics 42, 918-939.

Groemping, U. (2018). Coding Invariance in Factorial Linear Models and a New Tool for Assessing Combinatorial Equivalence of Factorial Designs. Journal of Statistical Planning and Inference 193, 1-14.

Groemping, U. and Fontana R. (2019). An Algorithm for Generating Good Mixed Level Factorial Designs. Computational Statistics & Data Analysis 137, 101-114.

Groemping, U. (2020). DoE.MIParray: an R package for algorithmic creation of orthogonal arrays. Journal of Open Research Software, 8: 24. DOI: https://doi.org/10.5334/jors.286

Gurobi Optimization Inc. (2018). Gurobi Optimizer Reference Manual. https://www.gurobi.com:443/documentation/.

Mosek ApS (2017a). MOSEK version w.x.y.z documentation. Accessible at: https://www.mosek.com/documentation/. This package has been developed using version 8.1.0.23 (accessed August 29 2017).

Mosek ApS (2017b). MOSEK Rmosek Package 8.1.y.z. https://docs.mosek.com/8.1/rmosek/index.html. !!! In normal R speak, this is the documentation of the Rmosek package version 8.0.69, when applied on top of the Mosek version 8.1.y.z (this package has been devoloped with Mosek version 8.1.0.23 and will likely not work for Mosek versions before 8.1). !!! (accessed August 29 2017)

Xu, H. and Wu, C.F.J. (2001). Generalized minimum aberration for asymmetrical fractional factorial designs. Annals of Statistics 29, 549-560.

Examples

Run this code

if (FALSE) {
## ideal sequence of optimization problems
## shown here for Mosek,
## for Gurobi analogous, if necessary increasing maxtime to e.g. 600 or 3600 or ...

## very small problem
plan <- mosek_MIParray(16, rep(2,6), resolution=4, kmax=6)

## an example approach for a larger problem
## optimize shortest word length
plan3 <- mosek_MIParray(24, c(2,4,3,2,2,2,2), resolution=3, maxtime=20)
## feasible solution was found, no confirmed optimum, 7/3 words of length 3
## try to optimize further or confirm optimality (improve=TRUE does this),
##                      give it 10 minutes
plan3b <- mosek_MIPcontinue(plan3, improve=TRUE, maxtime=600)
##     no improvement has been found, and the gap is still very large
##     (the time limit makes the result non-deterministic, of course,
##      because it depends on the computer's power and availability of its resources)

## For large problems, it cannot be expected that a *confirmed* optimum is found.
## Of course, one can put more effort into the optimization, e.g. by running overnight.
## It is also advisable to compare the outcome to other ways for obtaining a good array,
##    e.g. function oa.design from package DoE.base with optimized column allocation.
require(DoE.base)
show.oas(nruns=24, nlevels=c(2,4,3,2,2,2,2), show=Inf)
GWLP(plan_oad <- oa.design(nruns=24, nlevels=c(2,4,3,2,2,2,2), col="min34"))
## here, plan3b has a better A3 than plan_oad

## one might also try to confirm optimality by switching to the other optimizer
plan3c <- gurobi_MIPcontinue(plan3b, improve=TRUE, maxtime=600, MIPFocus=3)
   ## focus on improved bound with option MIPFocus
   ## still same value with very large gap after running this
   ## thus, now assume this as best practically feasible value

## one might now try to improve words of length 4 (improve=FALSE turns to the next word length)
plan4 <- mosek_MIPcontinue(plan3b, improve=FALSE, maxtime=600)
   ## this does not yield any improvement
   ## working on longer words is not considered worthwhile
   ## thus, plan3 or plan3b are used for pragmatic reasons,
   ## without confirmed optimality
}

Run the code above in your browser using DataLab