cppad_search: Iterative Score Matching Estimator Using Conjugate-Gradient Descent

Description

Uses conjugate gradient descent to search for a vector of parameters such that gradient of the score matching discrepancy is within tolerance of zero. Also estimates standard errors and covariance.

Usage

cppad_search(
  smdtape,
  theta,
  Y,
  Yapproxcentres = NA * Y,
  w = rep(1, nrow(Y)),
  approxorder = 10,
  control = list(tol = 1e-15, checkgrad = TRUE)
)

Arguments

smdtape: A tape (Rcpp_ADFun object) of a score matching discrepancy function. The smdtape's independent variables are assumed to be the model parameters to fit and the smdtape's dynamic parameter is a (multivariate) measurement.
theta: The starting parameter set
Y: A matrix of multivariate observations. Each row is an observation. The number of columns of Y must be smdtape$size_dyn_ind.
Yapproxcentres: A matrix of Taylor approximation centres for rows of Y that require approximation. NA for rows that do not require approximation.
w: Weights for each observation.
approxorder: The order of Taylor approximation to use.
control: Control parameters passed to optimx::Rcgmin()

Details

The score matching discrepancy function and gradient of the score matching function are passed to optimx::Rcgmin(). The call to optimx::Rcgmin() uses the sum of observations (as opposed to the mean) to reduce floating point inaccuracies. This has implications for the meaning of the control parameters passed to Rcgmin() (e.g. tol). The results are converted into averages so the use of sums can be ignored when not setting control parameters, or studying the behaviour of Rcgmin.

Standard errors use the Godambe information matrix (aka sandwich method) and are only computed when the weights are constant. The estimate of the sensitivity matrix $G$ is the negative of the average over the Hessian of smdtape evaluated at each observation in Y. The estimate of the variability matrix $J$ is then the sample covariance (denominator of $n-1$) of the gradient of smdtape evaluated at each of the observations in Y for the estimated $\theta$. The variance of the estimator is then estimated as $G^{-1}JG^{-1}/n,$ where n is the number of observations.

Taylor approximation is available because boundary weight functions and transformations of the measure in Hyvärinen divergence can remove singularities in the model log-likelihood, however evaluation at these singularities may still involve computing intermediate values that are unbounded. If the singularity is ultimately removed, then Taylor approximation from a nearby location will give a very accurate evaluation at the removed singularity.

Examples

Run this code

smdtape <- tape_smd("sim", "sqrt", "sph", "ppi",
              ytape = rep(1/3, 3),
              usertheta = ppi_paramvec(p = 3),
              bdryw = "minsq", acut = 0.01,
              verbose = FALSE
              )$smdtape
Y <- rppi_egmodel(100)
cppad_search(smdtape, 0.9 * Y$theta, Y$sample)
sum((smvalues_wsum(smdtape, Y$sample, Y$theta)$grad/nrow(Y$sample))^2)

Run the code above in your browser using DataLab

Description

Usage

Arguments

Details

See Also

Examples