PST (version 0.84.1)

tune: AIC, AICc or BIC based model selection

Description

Prune a probabilistic suffix tree with a series of cut-offs and select the model having the lowest value of the selected information criterion. The available information criteria are the Akaike information criterion (AIC), the AIC with a correction for finite sample sizes (AICc) and the Bayesian information criterion (BIC).

Usage

## S3 method for class 'PSTf':
tune(object, gain="G2", C, criterion = "AIC", output = "PST")

Arguments

object
a probabilistic suffix tree, i.e., an object of class "PSTf" as returned by the pstree, prune or
gain
character. The gain function used for pruning decisions. See prune for details.
C
numeric. A vector of cutoff values. See prune for details.
criterion
The criterion used to select the model, either "AIC", "AICc" or "BIC". AICc should be used when the ratio between the number of observations and the number of estimated parameters is low, which is often the case with VLMC models (Burnham & Anderson, 2004).
output
If output='PST', the PST (an object of class "PSTf") having the lowest AIC, AICc or BIC value is returned. If output='stats', a table with the statistics for each model obtained by pruning object is returned.
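
As a rough illustration of how a vector for C might be built, the sketch below converts a few significance levels into cutoffs in the same way the Examples section does for the G2 gain, that is, half the chi-squared quantile with degrees of freedom equal to the alphabet size minus one. The names alpha, n.states and C.grid are illustrative and not part of the package.

```r
## Illustrative only: alpha, n.states and C.grid are hypothetical names
n.states <- 4                       # alphabet size, as in the actcal example
alpha <- c(0.05, 0.01, 0.001)       # significance levels to try

## G2 cutoff: half the chi-squared quantile with (n.states - 1) df
C.grid <- qchisq(1 - alpha, df = n.states - 1) / 2
names(C.grid) <- paste0("alpha=", alpha)
print(round(C.grid, 3))
```

A vector built this way could then be passed as C=C.grid in a call to tune.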

Value

  • If output="PST", a PST, that is, an object of class PSTf. If output="stats", a matrix with the results of the tuning procedure. The selected model is tagged with ***, models with $IC < min(IC)+2$ are tagged with **, and models with $IC < min(IC)+10$ are tagged with *.
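
The tagging rule can be sketched in base R as follows. The IC values are made up for illustration and are not output from tune, and the sketch assumes that a single * marks the widest band ($IC < min(IC)+10$).

```r
## Hypothetical IC values for four pruned models (not real output)
ic <- c(M1 = 1520.3, M2 = 1512.8, M3 = 1514.1, M4 = 1519.9)

delta <- ic - min(ic)               # distance from the best model
tag <- ifelse(delta == 0, "***",
       ifelse(delta < 2,  "**",
       ifelse(delta < 10, "*", "")))

print(data.frame(IC = ic, delta = delta, tag = tag))
```

Here M2 is the selected model, M3 falls within 2 units of the minimum, and M1 and M4 fall within 10 units.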

Details

The tune function prunes the PST with each value of the cutoff $C$ and selects, among the resulting models, the one having the lowest $AIC$, $AIC_{c}$ or $BIC$ value, depending on criterion. The function can return either the selected PST or a table containing the statistics for each model.
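
For reference, the three criteria are commonly defined as below, where $L$ is the maximized likelihood, $k$ the number of estimated parameters and $n$ the number of observations. These are the textbook forms; how the package counts $k$ and $n$ for a PST is not stated on this page.

$$
\begin{aligned}
AIC &= -2\log L + 2k \\
AIC_{c} &= AIC + \frac{2k(k+1)}{n-k-1} \\
BIC &= -2\log L + k\log n
\end{aligned}
$$

The correction term in $AIC_{c}$ grows as $k$ approaches $n$, which is why it matters when the ratio of observations to parameters is low.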

References

Burnham, K. P. & Anderson, D. R. (2004). Multimodel Inference: Understanding AIC and BIC in Model Selection. Sociological Methods & Research, 33, pp. 261-304.

See Also

prune

Examples

## activity calendar for year 2000
## from the Swiss Household Panel
## see ?actcal
data(actcal)

## selecting individuals aged 20 to 59
actcal <- actcal[actcal$age00>=20 & actcal$age00 <60,]

## defining a sequence object
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal,13:24,labels=actcal.lab)

## building a PST
actcal.pst <- pstree(actcal.seq, nmin=2, ymin=0.001)

## Cut-offs for 5% and 1% (see ?prune)
C95 <- qchisq(0.95,4-1)/2
C99 <- qchisq(0.99,4-1)/2

## selecting the optimal PST using AIC criterion
actcal.pst.opt <- tune(actcal.pst, gain="G2", C=c(C95,C99))

## plotting the tree
plot(actcal.pst.opt)
