Computes the point-biserial correlation between a dichotomous and a continuous variable.
biserial.cor(x, y, use = c("all.obs", "complete.obs"), level = 1)
x: a numeric vector representing the continuous variable.
y: a factor or a numeric vector (that will be converted to a factor) representing the dichotomous variable.
use: if "all.obs", the presence of missing observations produces an error; if "complete.obs", missing values are handled by casewise deletion.
level: which level of y to use.
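As a rough illustration of what casewise deletion means for use = "complete.obs", the sketch below drops every pair in which either value is missing. It uses only base R; the data are made up, and it is an assumption that the function's own handling matches complete.cases() exactly.

```r
# Made-up data with one missing value in each vector
x <- c(2.1, NA, 1.9, 4.8, 5.2)
y <- c(0, 1, NA, 1, 0)

# Casewise deletion: keep a pair only if both x and y are observed
keep <- complete.cases(x, y)
x_cc <- x[keep]
y_cc <- y[keep]
```

Only the complete pairs (here the first, fourth, and fifth) would then enter the correlation.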
the (numeric) value of the point-biserial correlation.
The point-biserial correlation computed by biserial.cor() is defined as
$$r = \frac{(\overline{X}_1 - \overline{X}_0)\sqrt{\pi (1 - \pi)}}{S_x},$$
where \(\overline{X}_1\) and \(\overline{X}_0\) denote the sample means of the \(X\)-values corresponding to the first and second level of \(Y\), respectively, \(S_x\) is the sample standard deviation of \(X\), and \(\pi\) is the sample proportion for \(Y = 1\). The first level of \(Y\) is defined by the level argument; see Examples.
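The formula can be transcribed directly into base R. The following is a minimal sketch, not the package's implementation: point_biserial_sketch is a hypothetical helper, the data are made up, and it assumes that \(S_x\) is the value returned by sd() (denominator n - 1).

```r
# Hypothetical helper: a direct transcription of the formula above
point_biserial_sketch <- function(x, y, level = 1) {
  y <- as.factor(y)
  stopifnot(is.numeric(x), nlevels(y) == 2, length(x) == length(y))
  in_level <- y == levels(y)[level]        # observations in the chosen level
  mean_diff <- mean(x[in_level]) - mean(x[!in_level])
  p <- mean(in_level)                      # sample proportion in that level
  mean_diff * sqrt(p * (1 - p)) / sd(x)    # assumption: S_x is sd(x)
}

# Made-up data
x <- c(2.1, 3.4, 1.9, 4.8, 5.2, 3.3, 4.1, 2.7)
y <- c(0, 1, 0, 1, 1, 0, 1, 0)

r1 <- point_biserial_sketch(x, y, level = 1)  # level "0" plays the role of the first level
r2 <- point_biserial_sketch(x, y, level = 2)  # level "1" plays the role of the first level
```

Switching level only flips the sign, so r1 equals -r2. Under the sd() assumption the result differs from cor(x, y) for a 0/1-coded y by the factor sqrt((n - 1) / n), where n is the number of observations.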
library(ltm)  # provides biserial.cor() and the LSAT data

# the point-biserial correlation between
# the total score and the first item, using
# '0' as the reference level
biserial.cor(rowSums(LSAT), LSAT[[1]])

# and using '1' as the reference level
biserial.cor(rowSums(LSAT), LSAT[[1]], level = 2)