bootcomp: Perform a bootstrap test for the number of components in a mixture of regressions.

Description

Produces nboot bootstrap realizations of the likelihood ratio statistic, either parametrically or semi-parametrically, and calculates the corresponding p-value of the test.

Usage

bootcomp(x, y, ncomp=2, ncincr=1, intercept=TRUE, nboot=1000,
         ts1=NULL, ts2=NULL, sem.par=FALSE, verb=FALSE,
         print.prog=TRUE, ...)

Arguments

x

A matrix of predictors for each of the regression models in the mixture. It should NOT include an initial column of 1s. If there is only one predictor, x may be a vector.

y

The vector of responses for the regression models.

ncomp

The null-hypothesized number of components in the mixture.

ncincr

The increment from the null-hypothesized number of components in the mixture to the number under the alternative hypothesis; i.e. the number of components under the alternative hypothesis is ncomp + ncincr.

intercept

Logical argument indicating whether the regression models in the mixture should have intercept terms.

nboot

The number of bootstrap replicates of the log likelihood ratio statistic to be produced.

ts1

Starting values for fitting the ncomp component model. If ts1 is NULL, random starting values are used. (This is not recommended.)

ts2

Starting values for fitting the ncomp+ncincr component model. If ts2 is NULL, random starting values are used. (This is not recommended.)

sem.par

Logical argument indicating whether semi-parametric bootstrapping should be used.

verb

Logical argument indicating whether the fitting processes should be verbose (i.e. whether details should be printed out at each step of the EM algorithm). If TRUE, a huge amount of screen output is produced.

print.prog

Logical argument indicating whether the index of the bootstrap replicate just completed should be printed out, to give an idea of how the process is progressing.

...

Further arguments to be passed to mixreg to control the fitting procedure.
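The layout of the starting values is not spelled out above. The sketch below assumes the layout used in the Examples section (one list per component, with elements beta, sigsq and lambda) and runs a few basic sanity checks on it. The helper check.start is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of mixreg): basic sanity checks on a list of
# starting values laid out as list(beta = ..., sigsq = ..., lambda = ...).
check.start <- function(ts, K) {
  lambda <- sapply(ts, function(th) th$lambda)
  sigsq  <- sapply(ts, function(th) th$sigsq)
  # One entry per component, positive variances, mixing proportions sum to 1.
  length(ts) == K && all(sigsq > 0) && isTRUE(all.equal(sum(lambda), 1))
}

ncomp  <- 2
ncincr <- 1
ts1 <- list(list(beta = c(3.0, 0.1), sigsq = 16, lambda = 0.5),
            list(beta = c(0.0, 0.0), sigsq = 16, lambda = 0.5))
ts2 <- list(list(beta = c(3.0, 0.1),  sigsq = 9, lambda = 1/3),
            list(beta = c(1.5, 0.05), sigsq = 9, lambda = 1/3),
            list(beta = c(0.0, 0.0),  sigsq = 9, lambda = 1/3))

ok1 <- check.start(ts1, ncomp)            # null model: ncomp components
ok2 <- check.start(ts2, ncomp + ncincr)   # alternative: ncomp + ncincr components
```

Running a check like this before a long bootstrap run is cheap compared with discovering a mismatched ts2 after many replicates.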

Value

A list (of class "mixreg") with components

lrs

The log likelihood ratio statistic for testing that the number of components is ncomp versus that it is ncomp + ncincr.

aic.ncomp

The vector (of length nboot) of Akaike Information Criterion values for the ncomp component models fitted to the bootstrap data sets. The value of ncomp is substituted in the name; e.g. if ncomp = 2 then the name of this component of the returned list is "aic.2".

aic.ncomp+ncincr

The vector (of length nboot) of Akaike Information Criterion values for the ncomp+ncincr component models fitted to the bootstrap data sets. The value of ncomp+ncincr is substituted in the name; e.g. if ncomp = 2 and ncincr = 1, then the name of this component of the returned list is "aic.3".

pval.boot

The p-value of the hypothesis test from the bootstrapping procedure. It is calculated as sum(lrs <= lrs.boot)/nboot.

lrs.boot

The vector of bootstrap replicates of the log likelihood ratio statistic.

screw.ups

A list giving information about the screw-ups that have occurred in the bootstrapping procedure; it includes the values of .Random.seed that led to the data causing the screw-up, so that the difficulty may be reproduced and examined if so desired. See the comments in the code for the meaning of the various "types" of screw-up. The "times" component of the screw.ups list gives the index of the bootstrap replicate that was being worked on when the screw-up occurred. Note that if a screw-up does occur, the replicate is redone completely.
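The following sketch illustrates how the components described above might be used. It works on a mock list with made-up values rather than on real bootcomp output, and the variable saved.seed merely stands in for a seed stored in screw.ups, whose exact element names are not documented here.

```r
# Mock list standing in for a bootcomp() result; all values are made up.
ncomp  <- 2
ncincr <- 1
res <- list(lrs      = 7.2,
            aic.2    = c(410.2, 398.7, 405.1, 402.4),
            aic.3    = c(408.9, 401.3, 399.6, 400.0),
            lrs.boot = c(1.3, 9.8, 4.1, 7.2))
nboot <- length(res$lrs.boot)

# The AIC components are named after the actual numbers of components.
nm0 <- paste("aic", ncomp, sep = ".")
nm1 <- paste("aic", ncomp + ncincr, sep = ".")
# Proportion of replicates in which the larger model has the smaller AIC.
prop.larger <- mean(res[[nm1]] < res[[nm0]])

# Bootstrap p-value, computed as described for pval.boot.
pval <- sum(res$lrs <= res$lrs.boot) / nboot

# Reproducing a data set from a saved random number generator state.
set.seed(42)                       # makes sure .Random.seed exists
saved.seed <- .Random.seed         # stand-in for a seed stored in screw.ups
y.star <- rnorm(5)
assign(".Random.seed", saved.seed, envir = globalenv())
y.again <- rnorm(5)                # same draws as y.star
```

Restoring .Random.seed in the global environment before regenerating the data is what makes a problematic replicate repeatable.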

Details

In parametric bootstrapping, the bootstrap data sets are generated by simulating from the fitted ncomp component model, using Gaussian errors. In semi-parametric bootstrapping, the errors are generated by resampling from the residuals. Since there are ncomp residuals at each predictor vector, one for each component of the model, the errors are selected from these ncomp possibilities. The selection probabilities at this step are the conditional probabilities of the observation having been generated by each component of the model, given that observation. These probabilities depend on the parameters of the model, hence the procedure is semi-parametric.
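The two schemes can be sketched in base R as follows. This is one reading of the description above, not the package's own code: the data, the fitted parameters in fit, and all variable names are made up for illustration, and the final step of the semi-parametric branch (pooling the selected errors and attaching them to simulated component means) is an assumption.

```r
set.seed(1)

# Hypothetical data and fitted two-component parameters (illustration only).
x   <- seq(1, 20, length.out = 40)
fit <- list(list(beta = c(3.0, 0.1), sigsq = 1.0, lambda = 0.5),
            list(beta = c(0.0, 0.0), sigsq = 1.0, lambda = 0.5))
y   <- ifelse(runif(40) < 0.5, 3.0 + 0.1 * x, 0) + rnorm(40)

n      <- length(y)
lambda <- sapply(fit, function(th) th$lambda)
sigma  <- sqrt(sapply(fit, function(th) th$sigsq))
# n x ncomp matrix of component means at each x.
mu     <- sapply(fit, function(th) th$beta[1] + th$beta[2] * x)

# Parametric: draw a component label per observation, then add Gaussian error.
k.par <- sample(seq_along(fit), n, replace = TRUE, prob = lambda)
y.par <- mu[cbind(1:n, k.par)] + rnorm(n, sd = sigma[k.par])

# Semi-parametric: ncomp residuals per observation, one per component.
res   <- y - mu
# Conditional probability that observation i came from component k.
dens  <- sapply(seq_along(fit),
                function(k) lambda[k] * dnorm(y, mu[, k], sigma[k]))
gamma <- dens / rowSums(dens)
# Select one residual per observation using those probabilities.
k.sel <- apply(gamma, 1, function(p) sample(seq_along(p), 1, prob = p))
err   <- res[cbind(1:n, k.sel)]
# Assumption: resample the selected errors and add them to simulated means.
k.sp  <- sample(seq_along(fit), n, replace = TRUE, prob = lambda)
y.sp  <- mu[cbind(1:n, k.sp)] + sample(err, n, replace = TRUE)
```

Because gamma is computed from the fitted parameters, the resampling step still depends on the model, which is the sense in which the second scheme is only semi-parametric.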

References

Turner, T. R. (2000) Estimating the rate of spread of a viral infection of potato plants via mixtures of regressions. Appl. Statist., vol. 49, Part 3, pp. 371-384.

Examples

# NOT RUN {
library(mixreg)  # provides bootcomp() and the aphids data
# Starting values for the 2-component (null) and 3-component (alternative) fits.
TS1 <- list(list(beta=c(3.0,0.1),sigsq=16,lambda=0.5),
            list(beta=c(0.0,0.0),sigsq=16,lambda=0.5))
TS2 <- list(list(beta=c(3.0,0.1),sigsq=9,lambda=1/3),
            list(beta=c(1.5,0.05),sigsq=9,lambda=1/3),
            list(beta=c(0.0,0.0),sigsq=9,lambda=1/3))
data(aphids)
x <- aphids$n.aphids
y <- aphids$n.inf
nboot <- 1000
# Test 2 components against 3 (ncomp=2 and ncincr=1 are the defaults).
boot.23 <- bootcomp(x,y,nboot=nboot,ts1=TS1,ts2=TS2)
# }
