tetrachoric(x,correct=TRUE)
polychoric(x,polycor=FALSE, ML = FALSE, std.err=FALSE)
biserial(x,y)
polyserial(x,y)
poly.mat(x, short = TRUE, std.err = FALSE, ML = FALSE) #deprecated use polychoric instead
The data typically will be a raw data matrix of responses to a questionnaire scored either true/false (tetrachoric) or with a limited number of responses (polychoric). In both cases, the marginal frequencies are converted to normal theory thresholds and the resulting table for each item pair is converted to the (inferred) latent Pearson correlation that would produce the observed cell frequencies with the observed marginals. (See draw.tetra
for an illustration.)
The tetrachoric correlation is used in a variety of contexts, one important one being in Item Response Theory (IRT) analyses of test scores, a second in the conversion of comorbity statistics to correlation coefficients. It is in this second context that examples of the sensitivity of the coefficient to the cell frequencies becomes apparent:
Consider the test data set from Kirk (1973) who reports the effectiveness of a ML algorithm for the tetrachoric correlation (see examples).
Examples include the lsat6 and lsat7 data sets in the bock
data.
The polychoric function forms matrices of polychoric correlations by either using John Fox's polychor function or by an local function (polyc) and will also report the tau values for each alternative.
polychoric replaces poly.mat and is recommended. poly.mat is an alternative wrapper to the polycor function.
biserial and polyserial correlations are the inferred latent correlations equivalent to the observed point-biserial and point-polyserial correlations (which are themselves just Pearson correlations).
The polyserial function is meant to work with matrix or dataframe input and treats missing data by finding the pairwise Pearson r corrected by the overall (all observed cases) probability of response frequency. This is particularly useful for SAPA procedures with large amounts of missing data and no complete cases.
Ability tests and personality test matrices will typically have a cleaner structure when using tetrachoric or polychoric correlations than when using the normal Pearson correlation.
A biserial correlation (not to be confused with the point-biserial correlation which is just a Pearson correlation) is the latent correlation between x and y where y is continuous and x is dichotomous but assumed to represent an (unobserved) continuous normal variable. Let p = probability of x level 1, and q = 1 - p. Let zp = the normal ordinate of the z score associated with p. Then, $rbi = r s* \sqrt(pq)/zp$.
The 'ad hoc' polyserial correlation, rps is just $r = r * sqrt(n-1)/n) \sigma y /\sum(zpi)$ where zpi are the ordinates of the normal curve at the normal equivalent of the cut point boundaries between the item responses. (Olsson, 1982)
All of these were inspired by (and adapted from) John Fox's polychor package which should be used for precise ML estimates of the correlations. See, in particular, the hetcor function in the polychor package.
David Kirk (1973) On the numerical approximation of the bivariate normal (tetrachoric) correlation coefficient. Psychometrika, 38, 259-268.
U.Olsson, F.Drasgow, and N.Dorans (1982). The polyserial correlation coefficient. Psychometrika, 47:337-347.
irt.fa
uses the tetrachoric function to do item analysis with the fa
factor analysis function.
draw.tetra
shows the logic behind a tetrachoric correlation (for teaching purpuses.)if(require(mvtnorm)) {
data(bock)
tetrachoric(lsat6)
polychoric(lsat6) #values should be the same
tetrachoric(matrix(c(44268,193,14,0),2,2)) #MPLUS reports.24
tetrachoric(matrix(c(44268,193,14,0),2,2),FALSE) #Do not apply continuity correction -- compare with previous analysis!
tetrachoric(matrix(c(61661,1610,85,20),2,2)) #Mplus reports .35
tetrachoric(matrix(c(62503,105,768,0),2,2)) #Mplus reports -.10
tetrachoric(matrix(c(24875,265,47,0),2,2)) #Mplus reports 0
tetrachoric(matrix(c(24875,265,47,0),2,2),FALSE) #Do not apply continuity correction- compare with previous analysis
tetrachoric(c(0.02275000, 0.0227501320, 0.500000000))
tetrachoric(c(0.0227501320, 0.0227501320, 0.500000000)) } else {message("Sorry, you must have mvtnorm installed")}
# 4 plots comparing biserial to
set.seed(42)
x.4 <- sim.congeneric(loads =c(.9,.6,.3,0),N=1000,short=FALSE)
y <- x.4$latent[,1]
for(i in 1:4) {
x <- x.4$observed[,i]
r <- round(cor(x,y),1)
ylow <- y[x<= 0]
yhigh <- y[x > 0]
yc <- c(ylow,yhigh)
rpb <- round(cor((x>=0),y),2)
rbis <- round(biserial(y,(x>=0)),2)
ellipses(x,y,ylim=c(-3,3),xlim=c(-4,3),pch=21 - (x>0),main =paste("r = ",r,"rpb = ",rpb,"rbis =",rbis))
dlow <- density(ylow)
dhigh <- density(yhigh)
points(dlow$y*5-4,dlow$x,typ="l",lty="dashed")
lines(dhigh$y*5-4,dhigh$x,typ="l")
}
Run the code above in your browser using DataLab