Uses an alternating nonnegative least squares algorithm combined with a k-means-type algorithm to optimize the constrained group dual scaling criterion outlined in the reference. Parallel computation over random starts of the grouping matrix is supported via the parallel package.
cds(x, K = 4, q = NULL, eps.ALS = 0.001, eps.G = 1e-07,
nr.starts.G = 20, nr.starts.a = 5, maxit.ALS = 20, maxit = 50,
Gstarts = NULL, astarts = NULL, parallel = FALSE, random.G = FALSE,
times.a.multistart = 1, info.level = 1, mc.preschedule = TRUE,
seed = NULL, LB = FALSE, reorder.grps = TRUE, rescale.a = TRUE,
tol = sqrt(.Machine$double.eps), update.G = TRUE)
Object of class ds with elements:
Grouping indicator matrix.
Number of groups K.
Optimum value of the criterion.
The 2n-vector of row scores.
The m-vector of object scores.
The matrix of group-specific boundary scores for the ratings.
The estimated spline coefficients for each group.
The number of iterations used for the optimal random start wrt the grouping matrix.
The number of seconds it took for the algorithm to converge for this optimal random start.
The grouping of the individuals as obtained by the algorithm.
Loss value from G update (not equivalent to that of ALS updates).
Confusion matrix and hit rates, if the original data object contained a grouping vector.
Optimality criterion values for the random starts of G.
The number of ratings in the Likert scale 1:q.
Total time taken for the algorithm over all random starts
The function call.
The input data object.
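The confusion matrix and hit rates listed among the elements above, together with the relabelling controlled by reorder.grps, can be illustrated with a small base R sketch. It is not the package's implementation: the label vectors are made up, the helper perms() is hypothetical, and a brute-force search over permutations stands in for the Hungarian algorithm (reasonable only for small K). The hit rate is computed here as the overall proportion of matching labels, which is an assumption about how the package defines it.

```r
# Hypothetical helper: all permutations of a vector (fine for small K only)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Made-up labels: the same partition, but with the group names permuted
true_grp  <- c(1, 1, 2, 2, 3, 3)
found_grp <- c(2, 2, 3, 3, 1, 1)
K <- 3

# Confusion matrix: rows are true groups, columns are recovered groups
conf <- table(factor(true_grp, levels = 1:K),
              factor(found_grp, levels = 1:K))

# Brute force (not Hungarian): pick the column order with the largest trace
best <- NULL
best_trace <- -Inf
for (p in perms(1:K)) {
  tr <- sum(diag(conf[, p, drop = FALSE]))
  if (tr > best_trace) {
    best_trace <- tr
    best <- p
  }
}

# Rename recovered group best[j] to j, then compute the overall hit rate
relabelled <- match(found_grp, best)
hit_rate <- mean(relabelled == true_grp)
```

With more than a handful of groups the number of permutations grows factorially, which is why an assignment method such as the Hungarian algorithm is the practical choice.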
an object of class "dsdata" (see cds.sim()), or a matrix (or object coercible to a matrix) containing the data for n individuals on m objects. The data does not yet contain any additional columns for the rating scale.
The number of response style groups to look for. If a vector of length greater than one is given, the algorithm is run for each element and a list of class cdslist is returned.
The maximum rating (the scale is assumed to be 1:q).
Numerical convergence criterion for the alternating least squares part of the algorithm (updates for row and column scores).
Numerical convergence criterion for the k-means part of the algorithm.
Number of random starts for the grouping matrix.
Number of random starts for the row scores.
Maximum number of iterations for the ALS part of the algorithm. A warning is given if this maximum is reached. Often it is not a concern if this maximum is reached.
Maximum number of iterations for the k-means part of the algorithm.
Facility to supply a list of explicit starting values for the grouping matrix G. Each start consists of a two-element list: i, giving an integer numbering the start, and G, giving the starting configuration as an indicator matrix.
Supply explicit starts for the a vectors, as a list.
logical. Should parallelization over starts for the grouping matrix be used?
logical. Should the k-means part consider the individuals in a random order?
The number of times that random starts for the row scores are used. If == 1, then random starts are only used once for each start of the grouping matrix.
Verbosity of the output. Options are 1, 2, 3 and 4.
Argument to mclapply under Unix.
Random seed for random number generators. Only partially implemented.
logical. Should load balancing be used in parallelization? Windows only.
logical. Use the Hungarian algorithm to reorder group names so that the trace of the confusion matrix is maximized.
logical. Rescale the row scores to length sqrt(2n) if TRUE (after the algorithm has converged).
tolerance tol passed to lsei of the limSolve package. Defaults to sqrt(.Machine$double.eps).
Logical indicating whether or not to update the G matrix from its starting configuration. Useful when the clustering is known a priori or not desired.
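The structure described for Gstarts (a list of starts, each a two-element list with an integer i and an indicator matrix G) could be built along the following lines. This is a minimal base R sketch under stated assumptions: the sizes n and K are made up, make_indicator() is a hypothetical helper rather than a function of the package, and G is assumed to be an n x K matrix with exactly one 1 per row.

```r
# Made-up sizes, chosen only for illustration
n <- 6   # number of individuals
K <- 3   # number of response style groups

# Hypothetical helper: n x K indicator matrix from a vector of group labels
make_indicator <- function(grp, K) {
  G <- matrix(0, nrow = length(grp), ncol = K)
  G[cbind(seq_along(grp), grp)] <- 1   # one 1 per row, in the group's column
  G
}

set.seed(1)
Gstarts <- lapply(1:2, function(i) {
  # Shuffle a balanced label vector so that no group starts empty
  grp <- sample(rep_len(seq_len(K), n))
  list(i = i, G = make_indicator(grp, K))
})
```

A list like this would then be supplied through the Gstarts argument, with n matching the number of individuals in the data and K matching the K argument; combining it with update.G = FALSE would presumably keep each supplied grouping fixed.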
Pieter C. Schoonees
See the reference for more details.
Schoonees, P.C., Velden, M. van de & Groenen, P.J.F. (2013). Constrained Dual Scaling for Detecting Response Styles in Categorical Data. (EI report series EI 2013-10). Rotterdam: Econometric Institute.
set.seed(1234)
dat <- cds.sim()
out <- cds(dat)