Takes one or more sets of "MEDseq" models fitted by MEDseq_fit and ranks them according to a specified model selection criterion. It is possible to respect the internal ranking within each set of models, or to discard models within each set which were already deemed suboptimal. This function can help with model selection via exhaustive or stepwise searches.
MEDseq_compare(...,
criterion = c("bic", "icl", "aic",
"dbs", "asw", "cv", "nec"),
pick = 10L,
optimal.only = FALSE)

# S3 method for MEDseqCompare
print(x,
index = seq_len(x$pick),
rerank = FALSE,
digits = 3L,
maxi = length(index),
...)
A list of class "MEDseqCompare", for which a dedicated print function exists, containing the following elements (each of length pick, and ranked according to criterion, where appropriate):
data
The name of the data set to which the models were fitted.
optimal
The single optimal model (an object of class "MEDseq") among those supplied, according to the chosen criterion.
pick
The final number of ranked models. May differ from (i.e. be less than) the supplied pick value.
MEDNames
The names of the supplied "MEDseq" objects.
modelNames
The MEDseq model names (denoting the constraints or lack thereof on the precision parameters).
G
The optimal numbers of components.
df
The numbers of estimated parameters.
iters
The numbers of EM/CEM iterations.
bic
BIC values, ranked according to criterion (not necessarily "bic").
icl
ICL values, ranked according to criterion (not necessarily "icl").
aic
AIC values, ranked according to criterion (not necessarily "aic").
dbs
(Weighted) mean/median DBS values, ranked according to criterion (not necessarily "dbs").
asw
(Weighted) mean/median ASW values, ranked according to criterion (not necessarily "asw").
cv
Cross-validated log-likelihood values, ranked according to criterion (not necessarily "cv").
nec
NEC values, ranked according to criterion (not necessarily "nec").
loglik
Maximal log-likelihood values.
gating
The gating formulas.
algo
The algorithm used for fitting the model: one of "EM", "CEM", or "cemEM".
equalPro
Logical indicating whether mixing proportions were constrained to be equal across components.
opti
The method used for estimating the central sequence(s).
weights
Logical indicating whether the given model was fitted with sampling weights.
noise
Logical indicating the presence/absence of a noise component. Only displayed if at least one of the compared models has a noise component.
noise.gate
Logical indicating whether gating covariates were allowed to influence the noise component's mixing proportion. Only printed for models with a noise component, when at least one of the compared models has gating covariates.
equalNoise
Logical indicating whether the mixing proportion of the noise component for equalPro models is also equal (TRUE) or estimated (FALSE).
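Because the ranked elements share the same length (pick), they can be lined up side by side and filtered together. The following is a minimal sketch using a mock plain list rather than a real "MEDseqCompare" object: every value in it is invented for illustration, and only a handful of the documented element names are included.

```r
# Mock stand-in for a "MEDseqCompare" result: a plain list with a few of the
# documented element names. All values below are invented for illustration.
comp <- list(
  pick       = 3L,
  MEDNames   = c("m2", "m1", "m2"),
  modelNames = c("UU", "CU", "CC"),
  G          = c(10L, 9L, 9L),
  bic        = c(-1000, -1010, -1020),
  gating     = c("~sex", "None", "~sex")
)

# Bind the equal-length ranked elements into one table (one row per model)
fields <- c("MEDNames", "modelNames", "G", "bic", "gating")
tab <- as.data.frame(comp[fields], stringsAsFactors = FALSE)

# Keep only the rows for models fitted with gating covariates
with_covs <- tab[tab$gating != "None", ]
print(with_covs)
```

With a real result, the same element names would be accessed via `$` in the same way, though the dedicated print method is the usual way to view the table.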
...
One or more objects of class "MEDseq" outputted by MEDseq_fit. All models must have been fit to the same data set. A single named list of such objects can also be supplied. Additionally, objects of class "MEDseqCompare" outputted by this very function can also be supplied here.
This argument is only relevant for the MEDseq_compare function and will be ignored for the associated print function.
criterion
The criterion used to determine the ranking. Defaults to "bic".
pick
The (integer) number of models to be ranked and compared. Defaults to 10L. Will be constrained by the number of models within the "MEDseq" objects supplied via ... if optimal.only is FALSE, otherwise constrained simply by the number of "MEDseq" objects supplied. Setting pick=Inf is a valid way to select all models.
optimal.only
Logical indicating whether to only rank models already deemed optimal within each "MEDseq" object (TRUE), or to allow models which were deemed suboptimal to enter the final ranking (FALSE, the default). See Details.
Arguments required for the associated print function:
x
An object of class "MEDseqCompare" resulting from a call to MEDseq_compare.
index
A logical or numeric vector giving the indices of the rows of the table of ranked models to print. This defaults to the full set of ranked models. When the table of ranked models is large, it can be useful to examine a subset via this index argument, for display purposes. See rerank.
rerank
A logical indicating whether the ranks should be recomputed when subsetting using index. Defaults to FALSE.
digits
The number of decimal places to round model selection criteria to (defaults to 3).
maxi
A number specifying the maximum number of rows/models to print. Defaults to length(index).
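The interplay of index, rerank, and maxi can be sketched on a toy table in base R. This is an illustration of the documented roles of these arguments only, not the package's print method: the helper `show_subset`, the table `ranked`, and all of its values are hypothetical.

```r
# Toy table of ranked models (invented values, for illustration only)
ranked <- data.frame(
  rank   = 1:5,
  G      = c(10, 9, 10, 9, 10),
  gating = c("~sex", "None", "None", "~age", "~age"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the documented roles of index, rerank, maxi
show_subset <- function(tab, index = seq_len(nrow(tab)), rerank = FALSE,
                        maxi = NULL) {
  sub <- tab[index, , drop = FALSE]            # rows selected for display
  if (is.null(maxi)) maxi <- nrow(sub)         # by default, show them all
  if (rerank) sub$rank <- seq_len(nrow(sub))   # recompute ranks in the subset
  head(sub, maxi)                              # cap the number of rows shown
}

# Logical index: models with gating covariates and 10 components
keep <- ranked$gating != "None" & ranked$G == 10

orig_ranks <- show_subset(ranked, index = keep)                 # ranks kept
new_ranks  <- show_subset(ranked, index = keep, rerank = TRUE)  # ranks reset
```

Here `orig_ranks` retains each selected model's position in the full ranking, whereas `new_ranks` numbers the selected models consecutively.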
Keefe Murphy - <keefe.murphy@mu.ie>
The purpose of this function is to conduct model selection on "MEDseq" objects, fit to the same data set, with different combinations of gating network covariates or different initialisation settings.
Model selection will have already been performed in terms of choosing the optimal number of components and MEDseq model type within each supplied set of results, but MEDseq_compare will respect the internal ranking of models when producing the final ranking if optimal.only is FALSE: otherwise only those models already deemed optimal within each "MEDseq" object will be ranked.
As such, if two sets of results are supplied when optimal.only is FALSE, the 1st, 2nd, and 3rd best models could all belong to the first set of results, meaning a model deemed suboptimal according to one set of covariates could be superior to one deemed optimal under another set of covariates.
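The difference between the two settings of optimal.only can be illustrated with a toy sketch in base R. It does not call MEDseq and is not the package's implementation: the helper `rank_models`, the set and model labels, and the criterion values are all invented, and the sketch assumes a criterion for which larger values are better.

```r
# Hypothetical criterion values for two sets of fitted models
# (invented numbers; larger is assumed to be better)
sets <- list(
  m1 = c(G9 = -1000, G10 = -1010, G11 = -1020),
  m2 = c(G9 = -1050, G10 = -1030)
)

# Toy ranking helper (hypothetical, not part of MEDseq)
rank_models <- function(sets, pick = 10L, optimal.only = FALSE) {
  if (optimal.only) {
    # Keep only the best model within each set
    sets <- lapply(sets, function(s) s[which.max(s)])
  }
  # Pool all remaining models into one table
  tab <- data.frame(
    set   = rep(names(sets), lengths(sets)),
    model = unlist(lapply(sets, names), use.names = FALSE),
    value = unlist(sets, use.names = FALSE),
    stringsAsFactors = FALSE
  )
  # Rank from best to worst and cap at the number of available models
  tab <- tab[order(tab$value, decreasing = TRUE), ]
  rownames(tab) <- NULL
  head(tab, min(pick, nrow(tab)))
}

all_ranked <- rank_models(sets, pick = Inf)
best_only  <- rank_models(sets, pick = Inf, optimal.only = TRUE)
```

In `all_ranked`, the top three rows all come from the first set, so two models that were suboptimal within `m1` still outrank the optimal model from `m2`; `best_only` contains just one row per set.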
Murphy, K., Murphy, T. B., Piccarreta, R., and Gormley, I. C. (2021). Clustering longitudinal life-course sequences using mixtures of exponential-distance models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(4): 1414-1451. <doi:10.1111/rssa.12712>.
MEDseq_fit, plot.MEDseq
data(biofam)
seqs <- seqdef(biofam[10:25] + 1L,
states = c("P", "L", "M", "L+M", "C",
"L+C", "L+M+C", "D"))
covs <- cbind(biofam[2:3], age=2002 - biofam$birthyr)
# \donttest{
# Fit a range of models
# m1 <- MEDseq_fit(seqs, G=9:10)
# m2 <- MEDseq_fit(seqs, G=9:10, gating=~sex, covars=covs, noise.gate=FALSE)
# m3 <- MEDseq_fit(seqs, G=9:10, gating=~age, covars=covs, noise.gate=FALSE)
# m4 <- MEDseq_fit(seqs, G=9:10, gating=~sex + age, covars=covs, noise.gate=FALSE)
# Rank only the optimal models (according to the dbs criterion)
# Examine the best model in more detail
# (comp <- MEDseq_compare(m1, m2, m3, m4, criterion="dbs", optimal.only=TRUE))
# (best <- comp$optimal)
# (summ <- summary(best, parameters=TRUE))
# Examine all models visited, including those already deemed suboptimal
# Only print models with gating covariates & 10 components
# comp2 <- MEDseq_compare(comp, m1, m2, m3, m4, criterion="dbs", pick=Inf)
# print(comp2, index=comp2$gating != "None" & comp2$G == 10)
# }