This package includes score matching estimators for particular distributions and a general capacity to implement additional score matching estimators. Score matching is a popular estimation technique when the normalising constant of the proposed model is difficult to compute. Score matching was first developed by \insertCite{hyvarinen2005es;textual}{scorematchingad} and was further developed for subsets of Euclidean space \insertCite{hyvarinen2007ex,yu2019ge,yu2020ge,liu2019es}{scorematchingad}, Riemannian manifolds \insertCite{mardia2016sc,mardia2018ne}{scorematchingad}, and Riemannian manifolds with boundary \insertCite{scealy2023sc}{scorematchingad}. In this help entry we briefly describe score matching estimation.
In the most general form (Riemannian manifolds with boundary), score matching minimises the weighted Hyvärinen divergence \insertCite{@Equation 7, @scealy2023sc}{scorematchingad} $$ \phi(f, f_0) = \frac{1}{2} \int_M f_0(z)h(z)^2 \left\lVert P(z)\Big(\nabla_z \log(f) - \nabla_z \log(f_0)\Big)\right\rVert^2 dM(z), $$ where
\(M\) is the manifold, isometrically embedded in Euclidean space, and \(dM(z)\) is the unnormalised uniform measure on \(M\).
\(P(z)\) is the matrix that projects points onto the tangent space of the manifold at \(z\), and is closely related to the Riemannian metric of \(M\).
\(f_0\) is the density of the data-generating process, defined with respect to \(dM(z)\).
\(f\) is the density of a posited model, again defined with respect to \(dM(z)\).
\(h(z)\) is the boundary weight function: a function that is zero on the boundary of \(M\) and smooth \insertCite{@Section 3.2, @scealy2023sc}{scorematchingad}, or potentially piecewise smooth.
\(\nabla_z\) is the Euclidean gradient operator that differentiates with respect to \(z\).
\(\lVert \cdot \rVert\) is the Euclidean norm.
Note that, because \(P(z)\) is a projection matrix, \(\left\lVert P(z)\Big(\nabla_z \log(f) - \nabla_z \log(f_0)\Big)\right\rVert^2\) is the squared norm, in the natural inner product on the tangent space at \(z\), of the gradient of the log ratio of \(f\) to \(f_0\).
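As a concrete illustration of the projection matrix \(P(z)\), consider the unit sphere embedded in \(R^3\), for which the projection onto the tangent space at \(z\) is \(I - zz^T\). The sketch below (plain Python with numpy, independent of this package; the helper name `sphere_projection` is purely illustrative) verifies the two defining properties of a projection onto the tangent space.

```python
import numpy as np

# Projection onto the tangent space of the unit sphere at z:
# P(z) = I - z z^T, which removes the component of a vector normal
# to the sphere at z. Illustrative only; the package constructs such
# projections internally for each supported manifold.
def sphere_projection(z):
    z = np.asarray(z, dtype=float)
    return np.eye(z.size) - np.outer(z, z)

z = np.array([1.0, 2.0, 2.0]) / 3.0   # a point on the unit sphere
P = sphere_projection(z)

print(np.allclose(P @ z, 0.0))   # the normal direction is annihilated
print(np.allclose(P @ P, P))     # P is idempotent, i.e. a projection
```

Both checks print `True`: gradients passed through \(P(z)\) lose their component normal to the manifold, which is exactly why the norm in the divergence above is intrinsic to \(M\).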
When the density functions \(f\) and \(f_0\) are smooth and positive inside \(M\), and the boundary weight function is smooth or of particular forms for specific manifolds \insertCite{@Section 3.2, @scealy2023sc}{scorematchingad}, then minimising the weighted Hyvärinen divergence \(\phi(f, f_0)\) is equivalent to minimising the score matching discrepancy \insertCite{@Theorem 1, @scealy2023sc}{scorematchingad} $$ \psi(f, f_0) = \int f_0(z)\big(A(z) + B(z) + C(z)\big)dM(z), $$ where $$A(z) = \frac{1}{2} h(z)^2 \left(\nabla_z \log(f)\right)^T P(z) \left(\nabla_z \log(f)\right),$$ $$B(z) = h(z)^2\Delta_z\log(f),$$ $$C(z) = \left(\nabla_z h(z)^2\right)^T P(z) \left(\nabla_z \log(f)\right),$$ and \(\Delta_z\) is the Laplacian operator. We term $$A(z) + B(z) + C(z)$$ the score matching discrepancy function.
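As a sanity check on these formulas, consider the classical Euclidean case: \(M = R^p\), \(P(z) = I_p\) and constant boundary weight \(h(z) \equiv 1\). Then \(\nabla_z h(z)^2 = 0\), so \(C(z) = 0\), and the score matching discrepancy function reduces to Hyvärinen's original objective:

$$A(z) + B(z) + C(z) = \frac{1}{2}\left\lVert\nabla_z \log(f)\right\rVert^2 + \Delta_z \log(f).$$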
We suspect that Theorem 1 of \insertCite{scealy2023sc;textual}{scorematchingad} holds more generally for nearly all realistic continuous and piecewise-smooth boundary weight functions, although to our knowledge no proof exists.
When \(n\) independent observations from \(f_0\) are available, the integration in \(\psi(f, f_0)\) can be approximated by an average over the observations, $$\psi(f, f_0) \approx \hat\psi(f, f_0) = \frac{1}{n} \sum_{i = 1}^n \big(A(z_i) + B(z_i) + C(z_i)\big).$$
If we parameterise a family of models \(f_\theta\) according to a vector of parameters \(\theta\), then the score matching estimate is the \(\theta\) that minimises \(\hat\psi(f_\theta, f_0)\).
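The following toy sketch (plain Python with numpy, independent of this package) makes the estimator concrete in the simplest setting: \(M = R\), \(h(z) \equiv 1\), and the location family \(f_\theta = N(\theta, 1)\), for which \(A(z) = (z - \theta)^2/2\), \(B(z) = -1\) and \(C(z) = 0\). The minimiser of \(\hat\psi\) coincides with the sample mean.

```python
import numpy as np

# Empirical score matching discrepancy for f_theta = N(theta, 1)
# on M = R with h(z) = 1 and P(z) = 1:
#   grad_z log f = -(z - theta),  Laplacian log f = -1,
# so A(z) = (z - theta)^2 / 2, B(z) = -1, C(z) = 0.
def psi_hat(theta, z):
    return np.mean(0.5 * (z - theta) ** 2 - 1.0)

rng = np.random.default_rng(0)
z = rng.normal(loc=1.5, scale=1.0, size=2000)

# Minimise over a grid (a stand-in for numerical optimisation);
# for this family the minimiser is the sample mean.
grid = np.linspace(-1.0, 4.0, 2001)
theta_hat = grid[np.argmin([psi_hat(t, z) for t in grid])]
print(abs(theta_hat - z.mean()) < 0.01)   # agrees with the sample mean
```

Here a grid search suffices because \(\theta\) is one-dimensional; in general the minimisation requires proper numerical optimisation.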
In general, the score matching estimate must be found via numerical optimisation techniques, such as those implemented in the function cppad_search().
However, when the family of models is a canonical exponential family, \(\hat\psi(f_\theta, f_0)\) and the score matching discrepancy function are often quadratic functions of \(\theta\) \insertCite{mardia2018ne}{scorematchingad}, and the minimum has a closed-form solution, found by cppad_closed().
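To illustrate the quadratic structure (a hand-derived sketch in plain Python with numpy, not the package's own computation, which cppad_closed() performs by automatic differentiation), take the Gaussian written as a canonical exponential family, with density proportional to exp(theta1*z - theta2*z^2/2) on \(R\) with \(h \equiv 1\). The empirical discrepancy is then \(\hat\psi(\theta) = \frac{1}{2}\theta^T W \theta - d^T \theta + \text{const}\), so the estimate solves the linear system \(W\theta = d\).

```python
import numpy as np

# For log f = theta1*z - theta2*z^2/2 (+ const):
#   grad_z log f = theta1 - theta2*z,  Laplacian log f = -theta2,
# so averaging A + B over the data gives a quadratic in theta with
#   W = [[1, -m1], [-m1, m2]],  d = [0, 1],
# where m1 and m2 are the first two sample moments.
rng = np.random.default_rng(1)
z = rng.normal(loc=2.0, scale=0.5, size=5000)

m1, m2 = z.mean(), np.mean(z ** 2)
W = np.array([[1.0, -m1], [-m1, m2]])
d = np.array([0.0, 1.0])
theta = np.linalg.solve(W, d)   # closed-form score matching estimate

# Recover mean = theta1/theta2 and variance = 1/theta2.
print(abs(theta[0] / theta[1] - z.mean()) < 1e-9)
print(abs(1.0 / theta[1] - z.var()) < 1e-9)
```

For this family the closed-form score matching estimate reproduces the sample mean and (biased) sample variance exactly, which is a useful check when implementing new exponential-family models.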
Note that when \(M\) has more than a few dimensions, manual calculation of \(A(z)\), \(B(z)\) and \(C(z)\) can become cumbersome. This package uses CppAD to automatically compute \(A(z)\), \(B(z)\) and \(C(z)\), and the quadratic simplification if it exists.
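The sketch below (plain Python with numpy; the helper name `discrepancy_terms` is purely illustrative) mimics this automation crudely, approximating \(A(z)\), \(B(z)\) and \(C(z)\) by finite differences given only a log-density, in the Euclidean case \(P(z) = I\), \(h(z) \equiv 1\). CppAD instead differentiates the code for the log-density exactly, but the division of labour is the same: the user supplies \(\log f\) and the derivatives are computed mechanically.

```python
import numpy as np

# Approximate A, B, C of the score matching discrepancy function by
# central finite differences, given only a callable log-density.
# Euclidean case for simplicity: P(z) = I and h(z) = 1, so C(z) = 0.
def discrepancy_terms(logf, z, eps=1e-5):
    z = np.asarray(z, dtype=float)
    p = z.size
    grad = np.empty(p)
    lap = 0.0
    mid = logf(z)
    for i in range(p):
        e = np.zeros(p)
        e[i] = eps
        up, down = logf(z + e), logf(z - e)
        grad[i] = (up - down) / (2 * eps)          # d log f / dz_i
        lap += (up - 2 * mid + down) / eps ** 2    # d^2 log f / dz_i^2
    A = 0.5 * grad @ grad   # (1/2) h^2 grad' P grad with h = 1, P = I
    B = lap                 # h^2 * Laplacian of log f
    C = 0.0                 # grad of h^2 vanishes for constant h
    return A, B, C

# Standard bivariate normal: grad log f = -z, Laplacian log f = -2.
logf = lambda z: -0.5 * np.sum(z ** 2)
A, B, C = discrepancy_terms(logf, [1.0, 2.0])
print(abs(A - 2.5) < 1e-6, abs(B + 2.0) < 1e-3)
```

Finite differences degrade quickly with dimension and step size, which is one reason exact automatic differentiation via CppAD is preferable in practice.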
Hyvärinen divergence \(\phi(f, f_0)\) is sensitive to transformations of the measure \(dM(z)\), including transformations of the manifold itself: transforming \(M\) changes the divergence between distributions and hence the minimiser of \(\hat\psi(f_\theta, f_0)\). Such a transformation changes the measure \(dM(z)\), the divergence and the estimator, but does not transform the data.
For example, many different transformations of the simplex (i.e. of compositional data) are possible \insertCite{@Appendix A.3, @scealy2024ro}{scorematchingad}. Hyvärinen divergences on the sphere, obtained from the simplex by a square-root transformation, behave differently from Hyvärinen divergences on Euclidean space, obtained from the simplex using logarithms \insertCite{scealy2024ro}{scorematchingad}. The estimator in the latter case does not apply logarithms to the observations; in fact, it involves only polynomials of the observed compositions \insertCite{scealy2024ro}{scorematchingad}.
The variety of estimator behaviour available through different transformations was a major motivation for this package: each transformation has different \(A(z)\), \(B(z)\) and \(C(z)\), and without automatic differentiation, implementing the score matching estimator for each case would require a substantial programming effort.