For a parametric model family, the function buildsmdtape() generates CppAD tapes (called ADFuns) for the improper log-likelihood (without normalising constant) of the family and the score matching discrepancy function \(A(z) + B(z) + C(z)\) (defined in scorematchingtheory).
Three steps are performed by buildsmdtape(): first an object that specifies the manifold and any transformation to another manifold is created; then a tape of the log-likelihood (without normalising constant) is created; finally a tape of \(A(z) + B(z) + C(z)\) is created.
buildsmdtape(
  start,
  tran = "identity",
  end = start,
  ll,
  ytape,
  usertheta,
  bdryw = "ones",
  acut = 1,
  thetatape_creator = function(n) { seq(length.out = n) },
  verbose = FALSE
)
A list of:
lltape: an ADFun object containing a tape of an improper log-likelihood with \(z\) on the end manifold as the independent variable
smdtape: an ADFun object containing a tape of the score matching discrepancy function with the non-fixed parameters as the independent variable, and the measurements on the end manifold as the dynamic parameter
some information about the tapes
start: The starting manifold. Used for checking that tran and end match.
tran: The name of a transformation. Available transformations are:
"sqrt"
"alr"
"clr"
"none" or "identity"
end: The name of the manifold that tran maps start to. Available manifolds are:
"sph": unit sphere
"Hn111": hyperplane normal to the vector \((1, 1, 1, 1, \ldots)\)
"sim": simplex
"Euc": Euclidean space
ll: The name of an inbuilt improper log-likelihood function to tape (which also specifies the parametric model family). On Linux operating systems, a custom log-likelihood function created by customll() can also be used; the ll should operate on the untransformed (i.e. starting) manifold.
ytape: An example measurement value to use for creating the tapes, given in the natural (i.e. start) manifold of the log-likelihood function. Please ensure that ytape is in the interior of the manifold and non-zero.
usertheta: A vector of parameter elements for the likelihood function. NA elements will become dynamic parameters. Other elements will be fixed at the provided value. The length of usertheta must be the correct length for the log-likelihood; no checking is conducted.
bdryw: The name of the boundary weight function. "ones" for manifolds without boundary. For the simplex and positive orthant of the sphere, "prodsq" and "minsq" are possible; see ppi() for more information on these.
acut: A parameter passed to the boundary weight function bdryw. Ignored for bdryw = "ones".
thetatape_creator: A function that accepts an integer n and returns a vector of length n. The function is used to fill in the NA elements of usertheta when building the tapes. Please ensure that the values filled by thetatape_creator lead to plausible parameter vectors for the chosen log-likelihood.
verbose: If TRUE, more details are printed when taping. These details are for debugging and will likely be comprehensible only to users familiar with the source code of this package.
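To make the roles of bdryw and acut concrete, the following plain R sketch evaluates a squared boundary weight at a point on the positive orthant of the sphere. The formulas used here, \(\min(z_1^2, \ldots, z_p^2, a_c^2)\) for "minsq" and \(\min(\prod_j z_j^2, a_c^2)\) for "prodsq", are assumed forms, and the helper bdryw_sq() is hypothetical; ppi() remains the reference for the definitions the package uses.

```r
# Hypothetical helper (not part of the package): squared boundary weight.
# The "minsq" and "prodsq" formulas are assumed forms; see ppi() for the
# package's own definitions.
bdryw_sq <- function(z, bdryw = c("ones", "minsq", "prodsq"), acut = 1) {
  bdryw <- match.arg(bdryw)
  switch(bdryw,
    ones = 1,                        # no down-weighting; acut is ignored
    minsq = min(z^2, acut^2),        # smallest squared component, capped at acut^2
    prodsq = min(prod(z^2), acut^2)  # product of squared components, capped at acut^2
  )
}

z <- sqrt(c(0.01, 0.49, 0.50))  # first component is close to the boundary
z_mid <- sqrt(rep(1 / 3, 3))    # well inside the positive orthant

bdryw_sq(z, "ones")
bdryw_sq(z, "minsq", acut = 0.2)      # small weight near the boundary
bdryw_sq(z, "prodsq", acut = 0.2)
bdryw_sq(z_mid, "minsq", acut = 0.2)  # weight is capped at acut^2
```

Under these assumed forms, acut only matters away from the boundary, where it caps the weight.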
This package uses version 2024000.5 of the algorithmic differentiation library CppAD (Bell, 2023) to build score matching estimators.
Full help for CppAD can be found at https://cppad.readthedocs.io/.
Differentiation proceeds by taping the basic (atomic) operations performed on the independent variables and dynamic parameters. The atomic operations include multiplication, division, addition, sine, cosine, exponential and many more.
Example values for the variables and parameters are used to conduct this taping, so care must be taken with any conditional (e.g. if-then) operations: only the branch taken at the example values is recorded. CppAD has a special tool for this called CondExp (short for conditional expressions).
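The effect of taping a conditional can be seen with a toy illustration in plain R. This is not CppAD and does not use this package; toy_tape() is a hypothetical stand-in that mimics recording only the branch taken at the example value.

```r
# The original function has two branches.
f <- function(x) if (x > 0) x^2 else -x

# Toy "taping" (hypothetical, for illustration only): decide the branch once,
# at the example value, and replay that branch for every later input.
toy_tape <- function(example_x) {
  if (example_x > 0) {
    function(x) x^2  # only this branch is recorded when example_x > 0
  } else {
    function(x) -x   # only this branch is recorded otherwise
  }
}

taped <- toy_tape(example_x = 1)

f(3)
taped(3)   # agrees with f: same branch as the example value
f(-2)
taped(-2)  # disagrees with f: the other branch was never recorded
```

A conditional expression such as CondExp avoids this by recording the comparison itself as an operation, so that both outcomes remain available when the tape is evaluated.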
The result of taping is an object of class ADFun in CppAD and is often called a tape.
This ADFun object can be evaluated, differentiated, used for further taping (via CppAD's base2ad()), used for solving differential equations, and more.
The differentiation is with respect to the independent variables; however, the dynamic parameters can be altered, which allows for creating a new ADFun object where the dynamic parameters become independent variables (see tapeSwap()).
For the purposes of score matching, there are also fixed parameters, which are the elements of the model's parameter vector that are given and not estimated.
Each time a tape is evaluated the corresponding C++ object is altered. Parallel use of the same ADFun object thus requires care and is not tested. For now I recommend creating a new ADFun object for each CPU.
There is no checking of the inputs ytape and usertheta.
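Because these inputs are not checked, a few checks on the user side can catch common mistakes before taping. The helper below is a hypothetical sketch in plain R, not part of the package; its name, its tolerance, and the ntheta argument (the parameter length the chosen log-likelihood expects, supplied by the user) are all assumptions.

```r
# Hypothetical pre-flight checks (not part of the package).
check_tape_inputs <- function(ytape, usertheta, start, ntheta, tol = 1e-8) {
  stopifnot(is.numeric(ytape), all(is.finite(ytape)), all(ytape != 0))
  if (start == "sim") {
    # interior of the simplex: strictly positive components that sum to one
    stopifnot(all(ytape > 0), abs(sum(ytape) - 1) < tol)
  } else if (start == "sph") {
    # unit sphere: squared components sum to one
    stopifnot(abs(sum(ytape^2) - 1) < tol)
  }
  # usertheta must have the length that the log-likelihood expects
  stopifnot(length(usertheta) == ntheta)
  invisible(TRUE)
}

# A point on the unit sphere and an all-NA parameter vector of length 3
check_tape_inputs(rep(1 / sqrt(3), 3), rep(NA, 3), start = "sph", ntheta = 3)
```

A ytape with a zero component, such as c(0.5, 0.5, 0) on the simplex, makes the helper stop with an error.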
The improper log-likelihood (without normalising constant) must be implemented in C++ and is selected by name. Similarly the transforms of the manifold must be implemented in C++ and selected by name.
When using CppAD, one first creates tapes of functions. These tapes can then be used for evaluating the function and its derivatives, and generating further tapes through argument swapping, differentiation and composition.
The taping relies on specifying typical argument values for the functions (see Introduction to CppAD Tapes).
Tapes can have both independent variables and dynamic parameters.
The differentiation with CppAD occurs with respect to the independent variables.
Tapes of tapes are possible, including tapes that swap the independent variables and dynamic parameters; this is how this package differentiates with respect to dynamic parameters (see tapeSwap()).
To build a tape for the score matching discrepancy function, the package first tapes the map from a point \(z\) on the end manifold to the value of the improper log-likelihood, where the independent variable is the \(z\), the dynamic parameter is a vector of the parameters to estimate, and the remaining model parameters are fixed and not estimated.
This tape is then used to generate a tape for the score matching discrepancy function where the parameters to estimate are the independent variable.
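The split between parameters to estimate and fixed parameters is controlled by the NA pattern of usertheta. The plain R sketch below illustrates the described behaviour for a generic length-4 parameter vector; it is not the package's internal code, and the variable names is_dynamic and theta_for_taping are hypothetical.

```r
# NA marks elements to estimate (dynamic parameters); other values stay fixed.
usertheta <- c(NA, 0.5, NA, 2)

# The default thetatape_creator from the usage of buildsmdtape()
thetatape_creator <- function(n) { seq(length.out = n) }

is_dynamic <- is.na(usertheta)

# Placeholder values are needed only so that taping has numbers to work with.
theta_for_taping <- usertheta
theta_for_taping[is_dynamic] <- thetatape_creator(sum(is_dynamic))

which(is_dynamic)   # positions of the parameters to estimate
theta_for_taping    # NA slots filled, fixed values untouched
```

If the default placeholder values are implausible for the chosen log-likelihood, a different thetatape_creator can be supplied.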
Only some combinations of start, tran and end are available because tran must map between start and end.
These combinations of start-tran-end are currently available:
sim-sqrt-sph
sim-identity-sim
sim-alr-Euc
sim-clr-Hn111
sph-identity-sph
Euc-identity-Euc
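To see why each transformation is paired with a particular end manifold, the plain R sketch below applies textbook versions of the three simplex transformations to one point. These are assumed definitions for illustration: the package's C++ implementations may differ in details, in particular in which component alr uses as the reference (the last component is assumed here).

```r
u <- c(0.2, 0.3, 0.5)  # a point in the interior of the simplex ("sim")

z_sqrt <- sqrt(u)               # sim-sqrt-sph
z_alr <- log(u[-3] / u[3])      # sim-alr-Euc (last component as reference: an assumption)
z_clr <- log(u) - mean(log(u))  # sim-clr-Hn111

sum(z_sqrt^2)  # squared norm is one, so z_sqrt is on the unit sphere
sum(z_clr)     # zero up to rounding, so z_clr is normal to (1, 1, 1)
length(z_alr)  # one dimension fewer than u, with unconstrained values
```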
Currently available improper log-likelihood functions are:
dirichlet
ppi
vMF
Bingham
FB
Other tape builders: moretapebuilders
# Example 1: von Mises-Fisher model on the sphere, all parameters estimated
p <- 3
u <- rep(1 / sqrt(p), p) # a point on the unit sphere
ltheta <- p # length of vMF parameter vector
intheta <- rep(NA, length.out = ltheta) # NA: estimate every element
tapes <- buildsmdtape("sph", "identity", "sph", "vMF",
  ytape = u,
  usertheta = intheta,
  bdryw = "ones", verbose = FALSE
)
evaltape(tapes$lltape, u, runif(n = ltheta))
evaltape(tapes$smdtape, runif(n = ltheta), u)

# Example 2: PPI model on the simplex, transformed to the sphere
u <- rep(1 / 3, 3) # a point in the interior of the simplex
tapes <- buildsmdtape("sim", "sqrt", "sph", "ppi",
  ytape = u,
  usertheta = ppi_paramvec(p = 3),
  bdryw = "minsq", acut = 0.01,
  verbose = FALSE
)
evaltape(tapes$lltape, u, rppi_egmodel(1)$theta)
evaltape(tapes$smdtape, rppi_egmodel(1)$theta, u)