Function for comfortably creating a D-optimal design with or without blocking based on functions optFederov or optBlock from package AlgDesign; this functionality is still somewhat experimental.
Dopt.design(nruns, data=NULL, formula=~., factor.names=NULL, nlevels=NULL,
digits=NULL, constraint=NULL, center=FALSE, nRepeats=5, seed=NULL, randomize=TRUE,
blocks=1, block.name="Blocks", wholeBlockData=NULL, qual=NULL, ...)
The function returns a data frame of S3 class design
with attributes attached.
The data frame contains the experimental settings.
The matrix desnum attached as attribute desnum contains the
model matrix of the design, using the formula as specified in the call.
Function Dopt.augment preserves additional variables (e.g. responses) that
have been added to the design before augmenting. Note, however, that
the response data are NOT used in deciding which points to augment the design with.
The attribute run.order provides the run number in standard order (as returned from
function optFederov in package AlgDesign) as well
as the randomized actual run order. The third column is always identical to the first.
The attribute design.info is a list of various design properties, with type resolving to “Dopt”,
“Dopt.blocked” or “Dopt.splitplot”.
In addition to the standard list elements (cf. design), the element
quantitative is a vector of nfactor logical values or NAs,
and the optional digits element indicates the number of digits to
which the data were rounded.
For blocked and splitplot designs, the list contains additional information on numbers and sizes of blocks or plots,
as well as the number of whole plot factors (which are always the first few factors) and split-plot factors.
The list also contains a list of optimality criteria as calculated by function optFederov
(see documentation there),
with elements D, Dea, A and G.
(Note that replications is always 1 and repeat.only is always FALSE;
these elements are only present to fulfill the formal requirements for class design.
Note however, that blocked designs do in fact repeat experimental runs if nruns and blocks
imply this.)
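As a hedged illustration of the return value described above, the following sketch reads the attributes by name with base R's attr(). It assumes that package DoE.wrapper is installed; the factor names x1 and x2, the run size and the seed are made up for the example.

```r
# Sketch only: assumes DoE.wrapper (and AlgDesign) are installed.
if (requireNamespace("DoE.wrapper", quietly = TRUE)) {
  library(DoE.wrapper)
  # hypothetical example: two quantitative factors, full quadratic model
  plan <- Dopt.design(12, factor.names = list(x1 = c(0, 10), x2 = c(1, 5)),
                      nlevels = c(5, 5), formula = ~quad(.), seed = 4711)
  class(plan)                       # S3 class design on top of data.frame
  head(attr(plan, "desnum"))        # model matrix for the formula
  head(attr(plan, "run.order"))     # standard order and actual run order
  attr(plan, "design.info")$type    # documented as "Dopt" if unblocked
}
```

The attribute names are taken from the description above; everything else in the call is illustrative.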
nruns: number of runs in the requested design
data: data frame or matrix of candidate design points;
if data is specified, factor.names and levels are ignored
formula: a model formula (starting with a tilde),
for the estimation of which a D-optimal design is sought;
it can contain all column names from data
or elements or element names from factor.names, respectively;
usage of the “.”-notation for “all variables” from data or factor.names
is possible.
The default formula linearly includes all main effects for columns of data or factors from
factor.names respectively, by using the “.”-notation.
Note that the variables from wholeBlockData must be explicitly included in the formula
and are not covered by the “.”-notation for “all variables”. (Thus, the default formula
does not work if wholeBlockData is used.)
For quantitative factors, functions quad() and cubic() describe the
full quadratic or full cubic model in the listed variables (cf. examples
and the expand.formula function from package AlgDesign).
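To make the size of such a model concrete, here is a minimal base R sketch that writes a full quadratic model in three numeric variables by hand and counts its terms. The variables A, B and C are hypothetical, and the hand-written formula is assumed to correspond to what quad(A, B, C) describes (linear terms, two-factor interactions and squares).

```r
# Base R sketch: full quadratic model in three numeric variables,
# written out by hand (assumed equivalent to quad(A, B, C)).
cand <- expand.grid(A = 1:3, B = 1:3, C = 1:3)   # 27 candidate points
full_quad <- ~ (A + B + C)^2 + I(A^2) + I(B^2) + I(C^2)
mm <- model.matrix(full_quad, data = cand)
colnames(mm)   # intercept, 3 linear, 3 squared, 3 interaction columns
ncol(mm)       # 10
```

With 10 model columns, a design needs at least 10 distinct runs for all coefficients to be estimable.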
factor.names: used for creating a candidate set (for the within-block factors)
with the help of function
fac.design, if data is not specified. It is a
list of vectors which contain
- individual levels
- or (in case of numerical values combined with nlevels) lower and upper scale end values
for each factor.
The element names are used as variable names;
if the list is not named, the variable names are A, B and so forth (from function
fac.design).
factor.names can also be a character vector.
In this case, nlevels must be specified, and levels are automatically assigned
as integers starting with 1, which implies quantitative factors,
unless qual=TRUE is specified.
nlevels: can be omitted if the list factor.names explicitly
lists all factor levels (which of course defines the number of levels).
For numeric factors for which factor.names only specifies the
two scale ends, these are filled with equally-spaced intermediate points,
using the nlevels entry as the length.out argument to function
seq.
If factor.names is a character vector of factor names only,
nlevels is required, and default levels are created.
digits: used for creating a candidate set if data is not specified.
It specifies the digits to which numeric design columns are rounded in case of
automatic creation of intermediate values. It can consist of one single value
(the same for all such factors) or a numeric vector of the same length
as factor.names with integer entries.
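The interplay of scale ends, nlevels and digits can be sketched in base R as follows. This is an illustration of the documented behavior, not the package's internal code: the use of round() is an assumption, and the values are hypothetical.

```r
# Base R sketch: expand two scale ends into nlevels equally spaced points
ends    <- c(10, 30)   # hypothetical scale ends of one numeric factor
nlevels <- 4
digits  <- 1
raw_levels     <- seq(ends[1], ends[2], length.out = nlevels)
rounded_levels <- round(raw_levels, digits)   # assumed rounding step
rounded_levels   # 10.0 16.7 23.3 30.0
```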
constraint: a condition (character string!) used for reducing the candidate
set to admissible points only.
constraint is evaluated on the specified data set or after automatic creation
of a full factorial candidate data set.
The variable names from data or factor.names can be used by the constraint.
The variable names from wholeBlockData can NOT be used.
See Syntax and Logic
for an explanation of the syntax of general and especially logical
R expressions.
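The effect of a constraint string on a full factorial candidate set can be sketched in base R. Evaluating the string with eval(parse()) is an assumption made for illustration and need not match how Dopt.design handles it internally; the factors and the constraint are those of the first example in this help page.

```r
# Base R sketch: reduce a full factorial candidate set by a constraint string
cand <- expand.grid(eins = seq(100, 250, length.out = 4),
                    zwei = seq(10, 30, length.out = 3),
                    drei = seq(-25, 25, length.out = 6))
constraint <- "!(eins>=200 & zwei==30 & drei==25)"
# evaluate the string as a logical R expression on the candidate columns
keep <- eval(parse(text = constraint), envir = cand)
admissible <- cand[keep, ]
c(all = nrow(cand), admissible = nrow(admissible))   # 72 and 70
```

Two of the 72 candidate points (eins equal to 200 or 250, combined with zwei=30 and drei=25) violate the condition and are dropped.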
center: requests that optimization is run for the centered model; the design is nevertheless output in non-centered coordinates
nRepeats: number of independent repeats of the design optimization process; increasing this number may improve the chance of finding a global optimum, but will also increase search time
seed: seed for generation and randomization of the design (integer number);
here, the seed is needed even if the design is not randomized, because the
generation process for the optimum design involves random numbers, even if the
order of the final design is not randomized;
if a reproducible design is needed, it is therefore recommended to specify a seed.
In R version 3.6.0 and later, the default behavior of function sample
has changed. If you work in a new (i.e., >= 3.6.0) R version and want to reproduce
a randomized design from an earlier R version (before 3.6.0),
you have to change the RNGkind setting by
RNGkind(sample.kind="Rounding")
before running function Dopt.design.
It is recommended to change the setting back to the new recommended way afterwards:
RNGkind(sample.kind="default")
For an example, see the documentation of the example data set VSGFS.
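The switch between the two sampling behaviors can be tried out in base R without creating a design. This sketch requires R 3.6.0 or later; the seed value is arbitrary.

```r
# Base R sketch: temporarily use the pre-3.6.0 sampling behaviour.
# RNGkind() warns about the non-uniform "Rounding" sampler; this is expected.
suppressWarnings(RNGkind(sample.kind = "Rounding"))
set.seed(1234)
old_style <- sample(10)
RNGkind(sample.kind = "default")   # back to the current default
set.seed(1234)
new_style <- sample(10)
identical(old_style, new_style)    # usually FALSE: the two samplers differ
```

The same pattern applies when the call between the two RNGkind statements is Dopt.design with a seed instead of sample.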
randomize: logical deciding whether or not the design should be randomized;
if it is TRUE, the design (or the additional portion of the design) returned by the
workhorse function optFederov is brought
into random order after generation. Note that the generation process
itself contains a random element per default; if exact repeatability for the
returned design is desired, it is necessary to specify a seed (option seed)
even in the case randomize=FALSE.
blocks: a single integer giving the number of blocks (default 1, if no blocking is needed)
OR
a vector of block sizes which enable blocks of different sizes;
for a scalar value, nruns must be divisible by blocks, so that all blocks are equally sized;
for a vector value, the block sizes must add up to nruns.
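The two compatibility rules for nruns and blocks amount to simple arithmetic, sketched here in base R. The helper check_blocks is hypothetical and not part of DoE.wrapper.

```r
# Base R sketch: hypothetical helper (not part of DoE.wrapper) that checks
# whether nruns and blocks are compatible in the sense described above.
check_blocks <- function(nruns, blocks) {
  if (length(blocks) == 1) {
    nruns %% blocks == 0          # scalar: number of equally sized blocks
  } else {
    sum(blocks) == nruns          # vector: block sizes must add up to nruns
  }
}
check_blocks(480, 96)             # TRUE: 96 blocks of 5 runs each
check_blocks(12, c(4, 4, 3))      # FALSE: sizes add up to 11, not 12
check_blocks(12, c(5, 4, 3))      # TRUE
```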
If blocking is requested, the following two options are potentially important.
block.name: character string: name of the blocking variable (used only if blocks are requested)
wholeBlockData: optional matrix or data frame that specifies the whole block characteristics;
can only be used if blocks are requested; if used, it must have one row per block.
If this is specified, the resulting design is a split-plot design with the whole-plot
factors specified in wholeBlockData, the split-plot factors specified in data.
Note that usage of this option makes it necessary to explicitly specify a formula.
Since wholeBlockData must be completely specified by the user, optimization is for the split-plot portion of the design only. The rationale is (presumably) that the characteristics of the available blocks are known. If this is not the case, users may want to try out various possible whole block setups, or to proceed sequentially by first optimizing a whole block design for a model with the whole block factors only and subsequently using this model for adding split-plot factors.
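A whole block data set with one row per block can be prepared in base R before calling Dopt.design. The sketch below mirrors the split-plot example of this help page (96 blocks, whole-plot factor gender) but uses a data frame; the object names are illustrative.

```r
# Base R sketch: whole block characteristics for 96 blocks, one row per block
n_blocks <- 96
whole <- data.frame(gender = rep(c("Female", "Male"), each = n_blocks / 2))
nrow(whole) == n_blocks   # TRUE: row count matches the number of blocks
table(whole$gender)       # 48 blocks per level
```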
qual: optional logical (length 1 or same as number of factors); ignored if data
is specified; overrides automatic determination of whether or not factors are quantitative;
if neither qual nor data are specified, factors are per default quantitative,
unless they have non-numeric levels in a list-valued factor.names
...: additional arguments to functions optFederov
or optBlock (if blocking is requested)
from package AlgDesign;
interesting arguments for optFederov: maxIteration,
nullify (calculates a good starting design; in particular, it can be set to 1,
in which case nRepeats is set to 1);
arguments criterion and augment are not available, neither
are evaluateI, space, or rows, and args
does not have an effect.
Since R version 3.6.0, the behavior of function sample has changed
(correction of a biased previous behavior that should not be relevant for the randomization of designs).
For reproducing a design that was produced with an earlier R version,
please follow the steps described with the argument seed.
Ulrike Groemping
Function Dopt.design creates a D-optimal design, optionally with blocking,
and even as a split-plot design. If no blocks are required, calculations are carried
out through function optFederov from package AlgDesign.
In case of blocked designs, function optBlock from package AlgDesign
is behind the calculations. By specifying wholeBlockData, a blocked design becomes
a split-plot design. The model formula can refer to both the within block data (only those
are referred to by the “.” notation) and the whole block data and interactions between both.
In comparison to direct usage of package AlgDesign, the function adds the possibility
of automatically creating the candidate points on the fly, with or without constraints.
Furthermore, it embeds the D-optimal designs into the class design.
On the other hand, it sacrifices some of AlgDesign's flexibility; of course, users
can still use AlgDesign directly.
D-optimal designs are particularly useful if the classical regular designs are too demanding in run size requirements, or if constraints preclude automatic generation of orthogonal designs. Note, however, that the best design in few runs can still be very bad in absolute terms!
When specifying the design without the data option, a full factorial in the
requested factors is the default candidate set of design points. For some situations - especially
with many factors - it may be better to start from a restricted candidate set. Such a candidate set
can be produced with another R function, e.g. oa.design or FrF2,
or can be manually created.
If there are doubts whether the process has delivered a design close to the absolute optimum,
nRepeats can be increased.
For unblocked designs, it is additionally possible to increase maxIteration.
Also, improving the starting
value by nullify=1 or nullify=2 may lead to an improved design.
These options are handed through to function optFederov
from package AlgDesign and are documented there.
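A hedged sketch of these tuning options follows. It assumes that DoE.wrapper is installed; the factors, run size, seed and the particular option values are illustrative only, and whether they improve the design depends on the problem.

```r
# Sketch only: assumes DoE.wrapper (and AlgDesign) are installed.
if (requireNamespace("DoE.wrapper", quietly = TRUE)) {
  library(DoE.wrapper)
  facs <- list(x1 = c(0, 10), x2 = c(1, 5))   # hypothetical factors
  # more independent repeats of the search
  plan_a <- Dopt.design(12, factor.names = facs, nlevels = c(5, 5),
                        formula = ~quad(.), nRepeats = 20, seed = 4711)
  # options handed through ... to optFederov
  plan_b <- Dopt.design(12, factor.names = facs, nlevels = c(5, 5),
                        formula = ~quad(.), maxIteration = 500,
                        nullify = 1, seed = 4711)
}
```

The optimality criteria stored in the design.info attribute of both results can then be compared to judge whether the extra effort paid off.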
Atkinson, A.C. and Donev, A.N. (1992). Optimum experimental designs. Clarendon Press, Oxford.
Fedorov, V.V. (1972). Theory of optimal experiments. Academic Press, New York.
Wheeler, R.E. (2004). Comments on algorithmic design. Vignette accompanying package AlgDesign. ../../AlgDesign/doc/AlgDesign.pdf.
See also optFederov, fac.design,
quad, cubic,
Dopt.augment. Furthermore, unrelated to function Dopt.design,
see also function gen_design from package skpr
for a new general R package for creating D-optimal or other letter optimal designs.
## a full quadratic model with constraint in three quantitative factors
library(DoE.wrapper)
plan <- Dopt.design(36, factor.names=list(eins=c(100,250), zwei=c(10,30), drei=c(-25,25)),
    nlevels=c(4,3,6),
    formula=~quad(.),
    constraint="!(eins>=200 & zwei==30 & drei==25)")
plan
cor(plan)
## add a (random) response; it is preserved but not used by Dopt.augment
y <- rnorm(36)
r.plan <- add.response(plan, y)
## augment the design by 10 further runs
plan2 <- Dopt.augment(r.plan, m=10)
plot(plan2)
cor(plan2)
## designs with qualitative factors and blocks for
## an experiment on assessing stories of social situations
## where each subject is a block and receives a deck of 5 stories
plan.v <- Dopt.design(480, factor.names=list(cause=c("sick","bad luck","fault"),
        consequences=c("alone","children","sick spouse"),
        gender=c("Female","Male"),
        Age=c("young","medium","old")),
    blocks=96,
    constraint="!(Age==\"young\" & consequences==\"children\")",
    formula=~.+cause:consequences+gender:consequences+Age:cause)
## an experiment on assessing stories of social situations
## with the whole block (=whole plot) factor gender of the assessor
## not run for saving test time on CRAN
if (FALSE) plan.v.splitplot <- Dopt.design(480, factor.names=list(cause=c("sick","bad luck","fault"),
        consequences=c("alone","children","sick spouse"),
        gender.story=c("Female","Male"),
        Age=c("young","medium","old")),
    blocks=96,
    wholeBlockData=cbind(gender=rep(c("Female","Male"),each=48)),
    constraint="!(Age==\"young\" & consequences==\"children\")",
    formula=~.+gender+cause:consequences+gender.story:consequences+
        gender:consequences+Age:cause+gender:gender.story)