LSMselect: Selecting the Latent Space Dimensionality using K-fold Cross-Validation

Description

This function performs K-fold cross-validation to select the number of dimensions, R, of the latent space in the Latent Space Item Response Model (LSIRM). Model performance is evaluated using metrics based on out-of-sample prediction accuracy, the area under the ROC curve, and the mean squared error.

Usage

LSMselect(X, maxDims=3, nfolds=5, penalty=NULL, C=NULL,
          starts=NULL, tol=.1, silent=TRUE)

Value

An object of class LSMselect with the following components:

tot_metrics: The overall metrics (summed over folds).
fold_metrics: A list with a separate entry for each metric, each containing the results for every fold separately.

Arguments

X: A matrix of size N by n containing the binary or ordinal item scores, where N is the number of subjects and n is the number of items. The number of score categories may differ across items, as long as the lowest score is coded 0 for every item. Missing values (NA) are allowed.
maxDims: The maximum number of dimensions R of the latent space to be considered. Should be at least 1, so that at least R=0 and R=1 are compared.
nfolds: The number of folds K.
penalty: The weight of the L2 penalty in penalized joint maximum likelihood (pJML) estimation. If penalty is NULL (the default), pJML is used with a weight of 1 (i.e., a standard normal prior on all parameters).
C: The maximum norm of the person parameter vectors. Not yet available for cross-validation.
starts: Either a list containing starting values for the model parameters or a character string indicating the method used to compute starting values; see LSMfit.
tol: Convergence criterion: iterations stop when the difference in log-likelihood between two subsequent iterations is smaller than this value. Default is .1.
silent: Logical. If FALSE, iteration details are printed to the screen during estimation.
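
Because X requires the lowest score of every item to be coded 0, data scored on another scale (for example 1 to 5) must be shifted first. The sketch below is one way to do this in base R; the helper name recode_from_zero is hypothetical and not part of the package, and it assumes that the lowest possible category of each item is actually observed in the data.

```r
# Hypothetical helper (not part of the package): shift every item so that its
# lowest observed score becomes 0, as required for X. NA values are kept.
# Assumes the lowest possible category of each item occurs in the data.
recode_from_zero <- function(X) {
  item_min <- apply(X, 2, min, na.rm = TRUE)  # lowest observed score per item
  sweep(X, 2, item_min, FUN = "-")            # subtract it column by column
}

# Example: item 1 scored 1 to 3, item 2 already scored 0 to 1, one missing response.
X_raw <- cbind(c(1, 2, 3, NA), c(0, 1, 1, 0))
X0 <- recode_from_zero(X_raw)
apply(X0, 2, min, na.rm = TRUE)  # lowest score is now 0 for both items
```

If a category is never observed for some item, subtracting the known scale minimum instead of the observed minimum keeps the category coding intact.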

Author

Dylan Molenaar d.molenaar@uva.nl

Details

LSMselect randomly assigns the non-missing elements of the N by n matrix X to one of the K folds, making sure that the folds are (close to) equally sized. Then, maxDims+1 models (i.e., R=0, R=1, ..., R=maxDims) are fit leaving out the data of the first fold. Next, using the parameter estimates of each model, the data in the first fold are predicted. From these predictions and the actual observations in the fold, the three metrics below are calculated. This scheme is repeated for all folds, so that each fold is held out of estimation once.
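
The fold assignment and hold-out step can be sketched in base R as follows. The helper names assign_folds and mask_fold are hypothetical and not part of the package. The sketch only reproduces what is stated above (observed cells are split at random into K close to equally sized folds); treating a held-out fold as missing (NA) during estimation is an assumption about how "leaving out" is implemented.

```r
# Hypothetical helpers (not from the package) illustrating the fold scheme.

# Assign every non-missing cell of X to one of nfolds folds at random,
# so that fold sizes differ by at most one. Missing cells stay NA.
assign_folds <- function(X, nfolds) {
  obs <- which(!is.na(X))                          # linear indices of observed cells
  labels <- rep_len(seq_len(nfolds), length(obs))  # balanced fold labels
  folds <- matrix(NA_integer_, nrow(X), ncol(X))
  folds[obs] <- sample(labels)                     # shuffle labels across cells
  folds
}

# Training data for fold k: cells in fold k are set to NA (assumed hold-out
# mechanism), so estimation only uses the remaining folds.
mask_fold <- function(X, folds, k) {
  X_train <- X
  X_train[which(folds == k)] <- NA
  X_train
}

# Usage on a small random binary matrix with one missing response.
set.seed(42)
X <- matrix(rbinom(50, 1, 0.5), nrow = 10)
X[3, 2] <- NA
folds <- assign_folds(X, nfolds = 3)
table(folds)                      # fold sizes differ by at most one
X_train <- mask_fold(X, folds, k = 1)
held_out <- X[which(folds == 1)]  # observations to be predicted for fold 1
```

Each of the maxDims+1 models would then be fit to X_train and used to predict held_out, and the loop repeated for k = 2, ..., K.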

The metrics are based on the well-known prediction accuracy, area under the ROC curve, and mean squared error, respectively. However, the metrics are unnormalized and, for accuracy and the ROC curve, taken as complements, so that lower values indicate better model fit for all metrics. This results in the following metrics:

Unnormalized Classification Error (UCE): The UCE is the number of incorrectly predicted item scores, summed over folds.
Unnormalized ROC Error (URE): The URE is the complement of the area under the ROC curve, multiplied by the fold size and summed over folds.
Residual Sum of Squares (RSS): The RSS is the sum of the squared residuals, summed over folds.
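
To make the three definitions concrete, the sketch below computes them for one held-out fold of binary items and sums them over folds. It is plain R written for illustration, not the package's internal code: the helper name fold_metrics, the 0.5 cut-off for the classification error, and the rank-based (Mann-Whitney) AUC are assumptions. Ordinal items would need a different prediction rule and AUC definition, which this sketch does not cover.

```r
# Hypothetical helper (not the package's internal code): the three
# unnormalized metrics for one held-out fold of binary item scores.
# y: observed 0/1 scores in the fold; p: predicted probabilities P(X = 1).
fold_metrics <- function(y, p) {
  stopifnot(length(y) == length(p), all(y %in% c(0, 1)))
  n  <- length(y)                     # fold size
  n1 <- sum(y == 1)
  n0 <- n - n1
  # UCE: number of misclassified scores, predicting 1 when p > 0.5 (assumed cut-off)
  uce <- sum((p > 0.5) != (y == 1))
  # AUC from the Mann-Whitney rank statistic; tied predictions get average ranks
  auc <- (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  # URE: complement of the AUC multiplied by the fold size
  ure <- (1 - auc) * n
  # RSS: squared residuals between observed scores and predicted probabilities
  rss <- sum((y - p)^2)
  c(UCE = uce, URE = ure, RSS = rss)
}

# Usage with two small illustrative folds; totals are sums over folds.
per_fold <- rbind(
  fold_metrics(y = c(1, 0, 1, 1, 0), p = c(0.9, 0.2, 0.4, 0.7, 0.6)),
  fold_metrics(y = c(0, 1, 0, 1),    p = c(0.1, 0.8, 0.3, 0.6))
)
colSums(per_fold)
```

A fold in which all held-out scores are equal leaves the AUC undefined (division by zero), so a full implementation would need to handle that case separately.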

References

Molenaar, D., & Jeon, M.J. (in press). Joint maximum likelihood estimation of latent space item response models. Psychometrika.

Examples



 # Toy example: compare R=0 and R=1 for data generated from a one-dimensional
 # latent space model (R=1), using only 2 folds

 set.seed(1111)
 N=1000
 nit=20
 ndim_z=1
 dat_obj=LSMsim(N,nit,ndim_z)
 X=dat_obj$X
 LSMselect(X, maxDims=1, nfolds=2)
