Generate empirical Bayes regularization (priors) and choose initial values and ranges for the (isotropic) lengthscale and nugget parameters of a Gaussian correlation function for a GP regression model.

```
darg(d, X, samp.size = 1000)
garg(g, y)
```

d

can be `NULL`, a scalar indicating an initial value, or a partial `list` whose format matches the one described in the Value section below

g

can be `NULL`, a scalar indicating an initial value, or a partial `list` whose format matches the one described in the Value section below

X

a `matrix` or `data.frame` containing the full (large) design matrix of input locations

y

a vector of responses/dependent values

samp.size

a scalar integer indicating a subset size of `X` to use for calculations; this is important for very large `X` matrices since the calculations are quadratic in `nrow(X)`

Both functions return a list containing the following entries. If the input object (`d` or `g`) specifies one of the values, then that value is copied to the same list entry on output. See the Details section for how these values are calculated.

mle

by default, `TRUE` for `darg` and `FALSE` for `garg`

start

starting value chosen from the quantiles of `distance(X)` or `(y - mean(y))^2`

min

minimum value in the allowable range for the parameter, for future inference purposes

max

maximum value in the allowable range for the parameter, for future inference purposes

ab

shape and rate parameters specifying a Gamma prior for the parameter
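The copy-through behavior for `NULL`, scalar, and partial `list` inputs can be pictured with a small base R sketch. This is only an illustration, not laGP's code: the helper `fill_in`, the entry names (`mle`, `start`, `min`, `max`, `ab`), and the default numbers are all assumptions made for the example.

```r
# Hypothetical defaults in the shape of the returned list (values are made up)
defaults <- list(mle = FALSE, start = 0.01, min = 1e-6, max = 1, ab = c(1.5, 4))

# Hypothetical helper: user-supplied entries take precedence over the defaults
fill_in <- function(arg, defaults) {
  if (is.null(arg)) return(defaults)                          # NULL: all defaults
  if (is.list(arg)) return(utils::modifyList(defaults, arg))  # partial list: copy given entries
  defaults$start <- arg                                       # scalar: used as the initial value
  defaults
}

out_list   <- fill_in(list(mle = TRUE), defaults)
out_scalar <- fill_in(0.5, defaults)
```

In this sketch `out_list$mle` is the user-supplied `TRUE` while the remaining entries keep their defaults, and `out_scalar$start` is the supplied scalar.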

These functions use aspects of the data, either `X` or `y`, to form weakly informative default priors and choose initial values for a lengthscale and nugget parameter. This is useful since the likelihood can sometimes be very flat, and even with proper priors inference can be sensitive to the specification of those priors and any initial search values. The focus here is on avoiding pathologies while otherwise remaining true to the spirit of MLE calculation.

`darg` output specifies MLE inference (`out$mle = TRUE`) by default, whereas `garg` instead fixes the nugget at the starting value, which may be sensible for emulating deterministic computer simulation data. When `out$mle = FALSE`, the calculated range outputs `c(out$min, out$max)` are set to dummy values that are ignored in other parts of the laGP package.

`darg` calculates a Gaussian distance matrix between all pairs of `X` rows, or a subsample of rows of size `samp.size`. From those distances it chooses the range and start values from the range of (non-zero) distances and the `0.1` quantile, respectively. The Gamma prior values have a shape of `out$a = 3/2` and a rate `out$b` chosen by the incomplete Gamma inverse function to put `0.95` probability below `out$max`.
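The calculation described above can be approximated in base R. The following is a rough sketch, not laGP's internal code: it assumes that the "Gaussian distance" is the squared Euclidean distance, uses a made-up design matrix, and the variable names (`d_start`, `d_min`, `d_max`) are chosen for illustration. The rate follows from the scaling property of the Gamma distribution: the `0.95` quantile under rate `b` equals the rate-one quantile divided by `b`.

```r
# Illustrative base-R sketch; not laGP's implementation
set.seed(1)
X <- matrix(runif(400), ncol = 2)   # hypothetical 200 x 2 design
samp.size <- 100                    # subsample size, as in the darg argument

# Subsample rows when the design has more than samp.size rows
idx <- if (nrow(X) > samp.size) sample(nrow(X), samp.size) else seq_len(nrow(X))
Xs <- X[idx, , drop = FALSE]

# Pairwise squared Euclidean distances (assumed measure), keeping non-zero ones
D <- as.numeric(dist(Xs))^2
D <- D[D > 0]

d_start <- unname(quantile(D, 0.1))  # start value: 0.1 quantile of the distances
d_min <- min(D)                      # range taken from the non-zero distances
d_max <- max(D)

# Gamma prior: shape 3/2, rate b chosen so that P(d < d_max) = 0.95
a <- 3/2
b <- qgamma(0.95, shape = a, rate = 1) / d_max

# Check: the prior mass below d_max should be 0.95
pgamma(d_max, shape = a, rate = b)
```

Subsampling keeps the cost of `dist` manageable, since the number of pairwise distances grows quadratically with the number of rows.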

`garg` is similar except that it works with `(y - mean(y))^2` instead of the pairwise distances of `darg`. The only difference is that the starting value is chosen as the 2.5% quantile.
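A comparable base R sketch for the nugget follows. Again this is an approximation under stated assumptions rather than laGP's code: the responses are simulated, the names `g_start`, `g_min`, and `g_max` are illustrative, and the range and prior are assumed to be built the same way as for the lengthscale.

```r
# Illustrative base-R sketch; not laGP's implementation
set.seed(2)
y <- sin(seq(0, 2 * pi, length.out = 50)) + rnorm(50, sd = 0.1)  # hypothetical responses

# Squared deviations from the mean take the place of pairwise distances
r2 <- (y - mean(y))^2
r2 <- r2[r2 > 0]

g_start <- unname(quantile(r2, 0.025))  # start value: 2.5% quantile
g_min <- min(r2)                        # range, assumed analogous to the lengthscale case
g_max <- max(r2)

# Gamma prior: shape 3/2, rate chosen so that 0.95 probability lies below g_max
a <- 3/2
b <- qgamma(0.95, shape = a, rate = 1) / g_max
```

Because `garg` fixes the nugget by default, the range computed here would only matter when MLE inference for the nugget is requested.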

```
## motorcycle data
library(laGP)
library(MASS)
X <- matrix(mcycle[,1], ncol=1)
Z <- mcycle[,2]

## get darg and garg
darg(NULL, X)
garg(list(mle=TRUE), Z)
```