Let \(X \in \mathbb{R}^{n \times p}\) be a
numeric matrix with rows as observations and columns as variables, and let
\(\mathbf{1} \in \mathbb{R}^n\) denote the all-ones vector. Define the column
means \(\mu = (1/n)\,\mathbf{1}^\top X\) and the centred cross-product
matrix
$$ S \;=\; (X - \mathbf{1}\mu)^\top (X - \mathbf{1}\mu)
\;=\; X^\top \!\Big(I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top\Big) X
\;=\; X^\top X \;-\; n\,\mu\,\mu^\top. $$
The (unbiased) sample covariance is
$$ \widehat{\Sigma} \;=\; \tfrac{1}{n-1}\,S, $$
and the sample standard deviations are \(s_i = \sqrt{\widehat{\Sigma}_{ii}}\).
The Pearson correlation matrix is obtained by standardising \(\widehat{\Sigma}\), and it is given by
$$ R \;=\; D^{-1/2}\,\widehat{\Sigma}\,D^{-1/2}, \qquad
D \;=\; \mathrm{diag}(\widehat{\Sigma}_{11},\ldots,\widehat{\Sigma}_{pp}), $$
equivalently, entrywise \(R_{ij} = \widehat{\Sigma}_{ij}/(s_i s_j)\) for
\(i \neq j\) and \(R_{ii} = 1\). With \(1/(n-1)\) scaling,
\(\widehat{\Sigma}\) is unbiased for the covariance; the induced
correlations are biased in finite samples.
The implementation forms \(X^\top X\) via a BLAS
symmetric rank-\(k\) update (SYRK) on the upper triangle, then applies the
rank-1 correction \(-\,n\,\mu\,\mu^\top\) to obtain \(S\) without
explicitly materialising \(X - \mathbf{1}\mu\). After scaling by
\(1/(n-1)\), triangular normalisation by \(D^{-1/2}\) yields \(R\),
which is then symmetrised to remove round-off asymmetry. Tiny negative values
on the covariance diagonal due to floating-point rounding are truncated to
zero before taking square roots.
If a variable has zero variance (\(s_i = 0\)), the
corresponding row and column of \(R\) are set to NA. When
check_na = FALSE, each \((i,j)\) correlation is recomputed on the
pairwise complete-case overlap of columns \(i\) and \(j\).
When ci = TRUE, Fisher-\(z\) confidence intervals are computed from
the observed pairwise Pearson correlation \(r_{ij}\) and the pairwise
complete-case sample size \(n_{ij}\):
$$
z_{ij} = \operatorname{atanh}(r_{ij}), \qquad
\operatorname{SE}(z_{ij}) = \frac{1}{\sqrt{n_{ij} - 3}}.
$$
With \(z_{1-\alpha/2} = \Phi^{-1}(1 - \alpha/2)\), the confidence limits are
$$
\tanh\!\bigl(z_{ij} - z_{1-\alpha/2}\operatorname{SE}(z_{ij})\bigr)
\;\;\text{and}\;\;
\tanh\!\bigl(z_{ij} + z_{1-\alpha/2}\operatorname{SE}(z_{ij})\bigr).
$$
Confidence intervals are reported only when \(n_{ij} > 3\).
Computational complexity. The dominant cost is \(O(n p^2)\) flops
with \(O(p^2)\) memory.