tclustregIC: Computes `tclustreg` for different number of groups `k` and restriction factors `c`.

Description

(the last two letters stand for 'Information Criterion') computes the values of BIC (MIXMIX), ICL (MIXCLA) or CLA (CLACLA), for different values of k (number of groups) and different values of c (restriction factor for the variances of the residuals), for a prespecified level of trimming. In order to minimize randomness, given k, the same subsets are used for each value of c.

Usage

tclustregIC(
  y,
  x,
  alphaLik,
  alphaX,
  intercept = TRUE,
  plot = FALSE,
  nsamp,
  refsteps = 10,
  reftol = 1e-13,
  equalweights = FALSE,
  wtrim = 0,
  we,
  msg = TRUE,
  RandNumbForNini,
  trace = FALSE,
  ...
)

Value

An S3 object of class tclustreg.object

Arguments

y

Response variable. A vector with n elements that contains the response variable.

x

An n x p data matrix (n observations and p variables). Rows of x represent observations, and columns represent variables.

Missing values (NA's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.

alphaLik

Trimming level, a scalar between 0 and 0.5 or an integer specifying the number of observations which have to be trimmed. If alphaLik=0, there is no trimming. More in detail, if 0 < alphaLik < 1 clustering is based on h = floor(n * (1 - alphaLik)) observations. If alphaLik is an integer greater than 1 clustering is based on h = n - floor(alphaLik). More in detail, likelihood contributions are sorted and the units associated with the smallest n - h contributions are trimmed.

alphaX

Second-level trimming or constrained weighted model for x.

intercept

wheather to use constant term (default is intercept=TRUE

plot

If plot=FALSE (default) or plot=0 no plot is produced. If plot=TRUE a plot with the final allocation is shown (using the spmplot function). If X is 2-dimensional, the lines associated to the groups are shown too.

nsamp

If a scalar, it contains the number of subsamples which will be extracted. If nsamp = 0 all subsets will be extracted. Remark - if the number of all possible subset is greater than 300 the default is to extract all subsets, otherwise just 300. If nsamp is a matrix it contains in the rows the indexes of the subsets which have to be extracted. nsamp in this case can be conveniently generated by function subsets(). nsamp must have k * p columns. The first p columns are used to estimate the regression coefficient of group 1, ..., the last p columns are used to estimate the regression coefficient of group k.

refsteps

Number of refining iterations in each subsample. Default is refsteps=10. refsteps = 0 means "raw-subsampling" without iterations.

reftol

Tolerance of the refining steps. The default value is 1e-14

equalweights

A logical specifying wheather cluster weights in the concentration and assignment steps shall be considered. If equalweights=TRUE we are (ideally) assuming equally sized groups, else if equalweights = false (default) we allow for different group weights. Please, check in the given references which functions are maximized in both cases.

wtrim

How to apply the weights on the observations - a flag taking values in c(0, 1, 2, 3, 4).

If wtrim==0 (no weights), the algorithm reduces to the standard tclustreg algorithm.
If wtrim==1, trimming is done by weighting the observations using values specified in vector we. In this case, vector we must be supplied by the user.
If wtrim==2, trimming is again done by weighting the observations using values specified in vector we. In this case, vector we is computed from the data as a function of the density estimate pdfe. Specifically, the weight of each observation is the probability of retaining the observation, computed as $$pretain_{ig} = 1-pdfe_{ig}/max_{ig}(pdfe_{ig})$$
If wtrim==3, trimming is again done by weighting the observations using values specified in vector we. In this case, each element wei of vector we is a Bernoulli random variable with probability of success $pdfe_{ig}$. In the clustering framework this is done under the constraint that no group is empty.
If wtrim==4, trimming is done with the tandem approach of Cerioli and Perrotta (2014).

we

Weights. A vector of size n-by-1 containing application-specific weights Default is a vector of ones.

msg

Controls whether to display or not messages on the screen If msg==TRUE (default) messages are displayed on the screen. If msg=2, detailed messages are displayed, for example the information at iteration level.

RandNumbForNini

pre-extracted random numbers to initialize proportions. Matrix of size k-by-nrow(nsamp) containing the random numbers which are used to initialize the proportions of the groups. This option is effective only if nsamp is a matrix which contains pre-extracted subsamples. The purpose of this option is to enable the user to replicate the results when the function tclustreg() is called using a parfor instruction (as it happens for example in routine IC, where tclustreg() is called through a parfor for different values of the restriction factor). The default is that RandNumbForNini is empty - then uniform random numbers are used.

trace

Whether to print intermediate results. Default is trace=FALSE.

...

potential further arguments passed to lower level functions.

Author

FSDA team, valentin.todorov@chello.at

References

Torti F., Perrotta D., Riani, M. and Cerioli A. (2019). Assessing Robust Methodologies for Clustering Linear Regression Data, Advances in Data Analysis and Classification, Vol. 13, pp 227-257.