Computes the point-biserial correlation between a dichotomous and a continuous variable.

`biserial.cor(x, y, use = c("all.obs", "complete.obs"), level = 1)`

x

a numeric vector representing the continuous variable.

y

a factor or a numeric vector (that will be converted to a factor) representing the dichotomous variable.

use

If `use` is `"all.obs"` (the default), the presence of missing observations produces an error. If `use` is `"complete.obs"`, missing values are handled by casewise deletion: any observation with a missing value in `x` or `y` is dropped.

level

which level of `y` to treat as the first level (\(Y = 1\)); defaults to `1`, the first level of the factor.
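The argument handling above can be sketched in base R. This is a minimal illustration on made-up vectors, not the package's own code. It assumes that `level` is a position in `levels(factor(y))`, which is consistent with the Examples, where `level = 2` selects the level `'1'`. It also assumes that `complete.cases()` is an acceptable stand-in for casewise deletion.

```r
# Made-up data: x is continuous, y is a 0/1 item; each has one missing value
x <- c(2.1, NA, 3.5, 4.0, 1.2, 5.3)
y <- c(0, 1, NA, 1, 0, 1)

# use = "complete.obs": drop every observation with a missing x or y
keep <- complete.cases(x, y)
x <- x[keep]
y <- y[keep]

# A numeric y is converted to a factor; its levels come out sorted: "0" "1"
y <- factor(y)
levs <- levels(y)

# Assumption: level = 2 picks the second level, "1", as the first group (Y = 1)
level <- 2
in_group <- y == levs[level]
mean(in_group)   # proportion pi of observations in that group: 0.5
```

Four of the six observations survive the deletion step, and two of those fall in level `"1"`.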

`biserial.cor()` returns the numeric value of the point-biserial correlation.

The point-biserial correlation computed by `biserial.cor()` is defined as
$$r = \frac{(\overline{X}_1 - \overline{X}_0)\sqrt{\pi (1 - \pi)}}{S_x},$$
where \(\overline{X}_1\) and \(\overline{X}_0\) denote the sample means of the \(X\)-values corresponding to the first and second levels of \(Y\), respectively, \(S_x\) is the sample standard deviation of \(X\), and \(\pi\) is the sample proportion of observations in the first level of \(Y\) (i.e., \(Y = 1\)). Which level of \(Y\) counts as the first is set by the `level` argument; see **Examples**.
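The formula can be checked with a short from-scratch sketch. `pb_cor_sketch()` is a hypothetical name, and this is not the package's implementation. It assumes \(S_x\) is computed with `sd()`, the sample standard deviation with the \(n - 1\) denominator, and that `level` indexes `levels(factor(y))`.

```r
# Hypothetical sketch of the point-biserial formula (not ltm's own code).
# Assumes S_x = sd(x), i.e. the n - 1 denominator.
pb_cor_sketch <- function(x, y, level = 1) {
  stopifnot(is.numeric(x), length(x) == length(y))
  y <- as.factor(y)
  levs <- levels(y)
  if (length(levs) != 2) stop("'y' must have exactly two levels")
  g1 <- y == levs[level]                  # TRUE for the first group (Y = 1)
  diff_mu <- mean(x[g1]) - mean(x[!g1])   # X-bar_1 - X-bar_0
  p <- mean(g1)                           # pi, proportion in the first group
  diff_mu * sqrt(p * (1 - p)) / sd(x)
}

# Usage on made-up data
x <- c(12, 15, 9, 20, 17, 8, 14, 19)
y <- c(0, 1, 0, 1, 1, 0, 0, 1)
r1 <- pb_cor_sketch(x, y, level = 1)   # first group is level "0"
r2 <- pb_cor_sketch(x, y, level = 2)   # first group is level "1"
```

Changing `level` only flips the sign. Swapping the groups negates \(\overline{X}_1 - \overline{X}_0\) but leaves \(\pi(1 - \pi)\) unchanged, so `r2` equals `-r1`. Because `sd()` uses the \(n - 1\) denominator, the result equals the Pearson correlation between `x` and a 0/1 indicator of the first group, scaled by \(\sqrt{(n-1)/n}\).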

```r
library(ltm)  # provides biserial.cor() and the LSAT data

# the point-biserial correlation between
# the total score and the first item, using
# '0' as the reference level
biserial.cor(rowSums(LSAT), LSAT[[1]])

# and using '1' as the reference level
biserial.cor(rowSums(LSAT), LSAT[[1]], level = 2)
```
