This function performs a K-fold cross-validation to select the number of dimensions, R, of the latent space in the Latent Space Item Response Model (LSIRM). Model performance is evaluated using metrics based on the out-of-sample prediction accuracy, the area under the ROC curve, and the mean squared error.
LSMselect(X, maxDims=3, nfolds=5, penalty=NULL, C=NULL,
          starts=NULL, tol=.1, silent=TRUE)
An object of class LSMselect with values:
The overall metrics (summed over folds).
A list with a separate entry for each metric, containing the results for each fold separately.
A matrix of size N by n containing the binary or ordinal item scores, where N is the number of subjects and n is the number of items. The number of item score categories can differ across items as long as the lowest score is coded 0 for all items. NAs are allowed.
The maximum number of dimensions R for the latent space to be considered. Should be at least 1 so that at least R=0 and R=1 are considered.
The number of folds K.
The weight for the L2 penalty of pJML. If penalty is NULL (the default), a pJML is used with a weight of 1 (i.e., standard normal prior on all parameters).
The maximum size of the norm of the person parameter vectors. Not available for cross-validation yet.
Either a list containing starting values for the model parameters or a character string indicating the method of starting value calculation; see LSMfit.
Convergence criterion: Iterations stop if the difference in loglikelihoods between two subsequent iterations is smaller than this number. Default is .1.
Logical. If FALSE, iteration details are printed to the screen during estimation.
Dylan Molenaar d.molenaar@uva.nl
LSMselect randomly assigns the non-missing elements of the N by n matrix X to one of the K folds, making sure that the folds are (close to) equally sized. Then, maxDims+1 models (i.e., R=0, R=1, ..., R=maxDims) are fit leaving out the data of the first fold. Next, using the parameter estimates for each of the models, the data in the first fold are predicted. Using these predictions and the actual observations in the fold, the three metrics below are calculated. This scheme is repeated for all folds, so that each fold is held out of estimation once.
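The cell-wise fold assignment described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the toy matrix and the object names (obs, fold_id, folds, X_train) are hypothetical, and holding out a fold is represented here by setting its cells to NA.

```r
set.seed(1)

# Hypothetical toy data: 12 subjects, 5 binary items, 6 missing cells
X <- matrix(rbinom(12 * 5, 1, 0.5), nrow = 12, ncol = 5)
X[sample(length(X), 6)] <- NA

K <- 3
obs <- which(!is.na(X))  # linear indices of the observed cells

# Balanced fold labels in random order (fold sizes differ by at most one)
fold_id <- sample(rep_len(seq_len(K), length(obs)))

# Fold membership per cell; missing cells belong to no fold
folds <- matrix(NA_integer_, nrow(X), ncol(X))
folds[obs] <- fold_id

# Training data when fold k is held out: its cells are treated as missing
k <- 1
X_train <- X
X_train[which(folds == k)] <- NA

table(fold_id)
```

Because folds are formed over individual cells rather than whole rows, every subject and item typically still contributes observed scores to each training set.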
The metrics calculated are based on, respectively, the well-known prediction accuracy, area under the ROC curve, and mean squared error. However, the metrics are unnormalized and, for accuracy and the ROC curve, are complements, so that for all metrics lower values indicate a better model fit. This results in the following metrics:
The UCE is the number of incorrectly predicted item scores summed over folds.
The URE is the complement of the area under the ROC curve multiplied by the fold size and summed over folds.
The RSS is the sum of the squared residuals over folds.
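For a single fold with binary scores, the three metrics can be sketched as follows. This is a rough illustration, not the package's implementation: the vectors y and p are made-up values, the 0.5 classification threshold, the rank-based AUC helper auc(), and the residual definition y - p are all assumptions, and the handling of ordinal scores is not covered.

```r
# Hypothetical held-out observations and predicted probabilities for one fold
y <- c(0, 0, 1, 1, 1, 0, 1, 0)
p <- c(0.2, 0.6, 0.7, 0.4, 0.9, 0.1, 0.8, 0.3)

# UCE contribution: number of misclassified scores (0.5 threshold is an assumption)
uce <- sum((p > 0.5) != y)

# AUC via the rank (Mann-Whitney) formula; auc() is a hypothetical helper
auc <- function(y, p) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# URE contribution: complement of the AUC multiplied by the fold size
ure <- (1 - auc(y, p)) * length(y)

# RSS contribution: sum of squared residuals (observed minus predicted)
rss <- sum((y - p)^2)

c(UCE = uce, URE = ure, RSS = rss)
```

Summing these per-fold contributions over all K folds gives the overall values, and the model with the lowest values is preferred.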
For more details, see Molenaar and Jeon (in press).
Molenaar, D., & Jeon, M.J. (in press). Joint maximum likelihood estimation of latent space item response models. Psychometrika.
LSMfit for fitting LSIRM models and for details about the model.
# Toy example: compare R=0 and R=1 for data that follow a one-dimensional
# latent space model (R=1), using only 2 folds
set.seed(1111)
N=1000
nit=20
ndim_z=1
dat_obj=LSMsim(N,nit,ndim_z)
X=dat_obj$X
LSMselect(X,1,nfolds=2)