A convenience function to perform a k-fold cross-validation experiment and
obtain the mean squared error of prediction. Most of the arguments are similar
to those of iprior() and kernL().
# S3 method for default
iprior_cv(
y,
...,
folds = 2,
par.cv = TRUE,
kernel = "linear",
method = "direct",
control = list(),
interactions = NULL,
est.lambda = TRUE,
est.hurst = FALSE,
est.lengthscale = FALSE,
est.offset = FALSE,
est.psi = TRUE,
fixed.hyp = NULL,
lambda = 1,
psi = 1,
nystrom = FALSE,
nys.seed = NULL
)

# S3 method for formula
iprior_cv(
formula,
data,
folds = 2,
one.lam = FALSE,
par.cv = TRUE,
kernel = "linear",
method = "direct",
control = list(),
est.lambda = TRUE,
est.hurst = FALSE,
est.lengthscale = FALSE,
est.offset = FALSE,
est.psi = TRUE,
fixed.hyp = NULL,
lambda = 1,
psi = 1,
nystrom = FALSE,
nys.seed = NULL,
...
)
An iprior_xv object containing a data frame of the cross-validated values,
such as the log-likelihood, training MSE and test MSE.
y
Vector of response variables.
...
Only used with the non-formula interface. Enter the variables (vectors or matrices) separated by commas.
folds
The number of cross-validation folds. Set equal to the sample size or Inf to
perform leave-one-out cross-validation.
par.cv
Logical. Use multithreading to fit the models? Defaults to TRUE.
kernel
Character vector indicating the type of kernel for the variables. Available choices are:
"linear" - (default) for the linear kernel
"canonical" - alternative name for "linear"
"fbm", "fbm,0.5" - for the fBm kernel with Hurst coefficient 0.5 (default)
"se", "se,1" - for the SE kernel with lengthscale 1 (default)
"poly", "poly2", "poly2,0" - for the polynomial kernel of degree 2 with offset 0 (default)
"pearson" - for the Pearson kernel
The kernel argument can also be a vector of length equal to the number of
variables, so it is possible to specify a different kernel for each variable.
Note that factor type variables are assigned the Pearson kernel by default, and
that non-factor types can be forced to use the Pearson kernel (not recommended).
method
The estimation method. One of:
"direct" - for the direct minimisation of the marginal deviance using optim()'s L-BFGS method
"em" - for the EM algorithm
"mixed" - combination of the direct and EM methods
"fixed" - for just obtaining the posterior regression function with fixed hyperparameters (default method when setting fixed.hyp = TRUE)
"canonical" - an efficient estimation method which takes advantage of the structure of the linear kernel
control
(Optional) A list of control options for the estimation procedure:
maxit
The maximum number of iterations for the quasi-Newton optimisation or the EM algorithm. Defaults to 100.
em.maxit
For method = "mixed", the number of EM steps before switching to direct optimisation. Defaults to 5.
stop.crit
The stopping criterion for the EM and L-BFGS algorithms, which is the difference in successive log-likelihood values. Defaults to 1e-8.
theta0
The initial values for the hyperparameters. Defaults to random starting values.
report
The interval of reporting for the optim() function.
restarts
The number of random restarts to perform. Defaults to 0. It is also possible to set it to TRUE, in which case the number of random restarts is set to the total number of available cores.
no.cores
The number of cores on which to do random restarts. Defaults to the total number of available cores.
omega
The overrelaxation parameter for the EM algorithm, a value between 0 and 1.
interactions
Character vector to specify the interaction terms. When using formulas, this
is specified automatically, so it is not required. The syntax is "a:b" to
indicate that variable a interacts with variable b.
est.lambda
Logical. Estimate the scale parameters? Defaults to TRUE.
est.hurst
Logical. Estimate the Hurst coefficients for fBm kernels? Defaults to FALSE.
est.lengthscale
Logical. Estimate the lengthscales for SE kernels? Defaults to FALSE.
est.offset
Logical. Estimate the offsets for polynomial kernels? Defaults to FALSE.
est.psi
Logical. Estimate the error precision? Defaults to TRUE.
fixed.hyp
Logical. If TRUE, then no hyperparameters are estimated, i.e. all of the above
est.x are set to FALSE, and vice versa. If NULL (default), then all of the
est.x defaults are respected.
lambda
Initial/default scale parameters. Relevant especially if est.lambda = FALSE.
psi
Initial/default value for the error precision. Relevant especially if est.psi = FALSE.
nystrom
Either logical or an integer indicating the number of Nystrom samples to take.
Defaults to FALSE. If TRUE, then approximately 10% of the sample size is used
for the Nystrom approximation.
nys.seed
The random seed for the Nystrom sampling. Defaults to NULL, which means the
random seed is not fixed.
formula
The formula to fit when using the formula interface.
data
Data frame containing the variables when using the formula interface.
one.lam
Logical. When using formula input, this is a convenient way of letting the
function know to treat all variables as a single variable (i.e. shared scale
parameter). Defaults to FALSE.
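To make the hyperparameters named in the kernel options more concrete (Hurst coefficient, lengthscale, degree and offset), here is a minimal base R sketch of the textbook, uncentred forms of these kernels. The function names (k_linear, k_fbm, k_se, k_poly, gram) are hypothetical helpers written for illustration only; they are not part of the package, and its internal implementation (for instance any centring or scaling it applies) may differ. The Pearson kernel is omitted.

```r
# Illustrative kernel functions (hypothetical helpers, not the package API).
# x and z are numeric vectors of equal length (or scalars).

# Linear (canonical) kernel: the inner product
k_linear <- function(x, z) sum(x * z)

# Fractional Brownian motion kernel with Hurst coefficient in (0, 1)
k_fbm <- function(x, z, hurst = 0.5) {
  nrm <- function(v) sqrt(sum(v^2))
  -0.5 * (nrm(x - z)^(2 * hurst) - nrm(x)^(2 * hurst) - nrm(z)^(2 * hurst))
}

# Squared exponential kernel with a positive lengthscale
k_se <- function(x, z, lengthscale = 1) {
  exp(-sum((x - z)^2) / (2 * lengthscale^2))
}

# Polynomial kernel with a given degree and offset
k_poly <- function(x, z, degree = 2, offset = 0) {
  (sum(x * z) + offset)^degree
}

# Build the n x n kernel (Gram) matrix for the rows of X
gram <- function(X, kern, ...) {
  X <- as.matrix(X)
  n <- nrow(X)
  K <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      K[i, j] <- kern(X[i, ], X[j, ], ...)
    }
  }
  K
}

x <- c(-1, 0, 0.5, 2)
K_se  <- gram(x, k_se, lengthscale = 1)
K_fbm <- gram(x, k_fbm, hurst = 0.5)
```

Under these assumed forms, "se,1" and "fbm,0.5" would correspond to lengthscale = 1 and hurst = 0.5, and "poly2,0" to degree = 2 with offset = 0.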
By default, a multicore loop is used to fit the folds. Set par.cv = FALSE to
turn off multithreading.
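For readers who want to see what a k-fold experiment computes, the following is a minimal base R sketch of the general procedure: split the rows into folds, fit on all but one fold, and record training and test MSE. It is an illustration under stated assumptions, not the package's implementation: lm() stands in for the model being fitted, the folds are processed sequentially, and kfold_mse is a hypothetical helper name.

```r
# Generic k-fold cross-validation sketch (hypothetical helper, base R only).
# Assumption: lm() is a stand-in for the model being cross-validated.
kfold_mse <- function(formula, data, folds = 2) {
  n <- nrow(data)
  # folds = Inf (or folds >= n) gives leave-one-out cross-validation
  k <- if (is.infinite(folds) || folds >= n) n else as.integer(folds)
  # Randomly assign each row to one of the k folds
  fold_id <- sample(rep(seq_len(k), length.out = n))
  response <- all.vars(formula)[1]
  res <- data.frame(fold = seq_len(k), train_mse = NA_real_, test_mse = NA_real_)
  for (i in seq_len(k)) {
    test_rows <- fold_id == i
    train <- data[!test_rows, , drop = FALSE]
    test  <- data[test_rows, , drop = FALSE]
    fit <- lm(formula, data = train)
    # Squared errors on the data used for fitting and on the held-out fold
    res$train_mse[i] <- mean(residuals(fit)^2)
    res$test_mse[i]  <- mean((test[[response]] - predict(fit, newdata = test))^2)
  }
  res
}

set.seed(1)
toy <- data.frame(X = runif(60))
toy$y <- 2 * toy$X + rnorm(60, sd = 0.1)

cv5 <- kfold_mse(y ~ X, toy, folds = 5)    # 5-fold CV
loo <- kfold_mse(y ~ X, toy, folds = Inf)  # leave-one-out CV

mean(cv5$test_mse)        # test MSE averaged over folds
sqrt(mean(cv5$test_mse))  # the corresponding root MSE
```

Because each fold is fitted independently of the others, the loop body is a natural candidate for parallel execution, which is the kind of work the par.cv option distributes across cores.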
if (FALSE) {
# 5-fold CV experiment
(mod.cv <- iprior_cv(y ~ X, gen_smooth(100), kernel = "se", folds = 5))
# LOOCV experiment
(mod.cv <- iprior_cv(y ~ X, gen_smooth(100), kernel = "se", folds = Inf))
# Can also get root MSE
print(mod.cv, "RMSE")
}
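The default (non-formula) method can be called in a similar way. The sketch below is hedged: it assumes the iprior package is installed and that gen_smooth() returns a data frame with columns y and X, as the formula y ~ X in the example above suggests. All arguments used appear in the usage section; the particular values are arbitrary choices for illustration.

```r
# Sketch of the non-formula interface; runs only if iprior is available.
if (requireNamespace("iprior", quietly = TRUE)) {
  library(iprior)
  dat <- gen_smooth(100)  # assumed to contain columns y and X

  # Response first, then the covariates passed through `...`
  mod.cv <- iprior_cv(
    dat$y, dat$X,
    folds   = 5,
    kernel  = "fbm",
    par.cv  = FALSE,              # fit the folds without multithreading
    control = list(maxit = 50)    # cap the number of optimiser iterations
  )
  print(mod.cv)
}
```

Setting par.cv = FALSE keeps the run single-threaded, which can be convenient on machines where spawning workers is restricted.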