
NeuralEstimators (version 0.1.2)

train: Train a neural estimator

Description

The function caters for different variants of "on-the-fly" simulation. Specifically, a sampler can be provided to continuously sample new parameter vectors from the prior, and a simulator can be provided to continuously simulate new data conditional on the parameters. Alternatively, if specific sets of parameters (theta_train and theta_val) and/or data (Z_train and Z_val) are provided, these are held fixed during training.

Note that using R functions to perform "on-the-fly" simulation requires the user to have installed the Julia package RCall.
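For concreteness, the two variants correspond to the following call patterns (a schematic sketch; the objects estimator, sampler, simulator, theta_train, theta_val, Z_train, and Z_val are placeholders, constructed as in the Examples below):

# Variant 1: fixed training and validation sets
estimator <- train(estimator,
                   theta_train = theta_train, theta_val = theta_val,
                   Z_train = Z_train, Z_val = Z_val)

# Variant 2: continuous ("on-the-fly") simulation from the prior
estimator <- train(estimator, sampler = sampler, simulator = simulator, m = 30)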

Usage

train(
  estimator,
  sampler = NULL,
  simulator = NULL,
  theta_train = NULL,
  theta_val = NULL,
  Z_train = NULL,
  Z_val = NULL,
  m = NULL,
  M = NULL,
  K = 10000,
  xi = NULL,
  loss = "absolute-error",
  learning_rate = 1e-04,
  epochs = 100,
  batchsize = 32,
  savepath = "",
  stopping_epochs = 5,
  epochs_per_Z_refresh = 1,
  epochs_per_theta_refresh = 1,
  simulate_just_in_time = FALSE,
  use_gpu = TRUE,
  verbose = TRUE
)

Value

A trained neural estimator or, if m is a vector, a list of trained neural estimators, one for each sample size

Arguments

estimator

a neural estimator

sampler

a function that takes an integer K, samples K parameter vectors from the prior, and returns them as a p × K matrix

simulator

a function that takes a p × K matrix of parameters and an integer m, and returns K simulated data sets, each containing m independent replicates
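To illustrate the required signatures, here is a minimal sketch for a model with p = 2 parameters and Gaussian data, mirroring the Examples below (the priors and the data model are illustrative):

sampler <- function(K) {
  rbind(mu = rnorm(K), sigma = rgamma(K, 1))  # p x K matrix, one column per draw
}

simulator <- function(theta, m) {
  # returns a list of K data sets, each a 1 x m matrix of m independent replicates
  apply(theta, 2, function(th) t(rnorm(m, th[1], th[2])), simplify = FALSE)
}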

theta_train

a set of parameters used for updating the estimator using stochastic gradient descent

theta_val

a set of parameters used for monitoring the performance of the estimator during training

Z_train

a simulated data set used for updating the estimator using stochastic gradient descent

Z_val

a simulated data set used for monitoring the performance of the estimator during training

m

vector of sample sizes. If NULL (default), a single neural estimator is trained, with the sample size inferred from Z_val. If m is a vector of integers, a sequence of neural estimators is constructed, one for each sample size; see the Julia documentation for trainx() for further details
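For example, to train a sequence of estimators over several sample sizes (a sketch assuming the sampler and simulator from the Examples below):

estimators <- train(estimator, sampler = sampler, simulator = simulator,
                    m = c(1, 10, 30))  # a list of estimators, one per sample size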

M

deprecated; use m

K

the number of parameter vectors sampled in the training set at each epoch; the size of the validation set is set to K/5.

xi

a list of objects used for data simulation (e.g., distance matrices); if it is provided, the parameter sampler is called as sampler(K, xi).
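For instance, if data simulation depends on a fixed distance matrix, it could be passed via xi; this is a hypothetical sketch (the element name D and the body of the sampler are illustrative):

xi <- list(D = as.matrix(dist(expand.grid(1:16, 1:16))))  # e.g., a distance matrix
sampler <- function(K, xi) {
  # xi is available here, e.g., to constrain the prior draws using xi$D
  rbind(mu = rnorm(K), sigma = rgamma(K, 1))
}
estimator <- train(estimator, sampler = sampler, simulator = simulator,
                   m = 30, xi = xi)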

loss

the loss function: a string ('absolute-error' for mean-absolute-error loss or 'squared-error' for mean-squared-error loss), or a string of Julia code defining the loss function. For some classes of estimators (e.g., QuantileEstimator and RatioEstimator), the loss function does not need to be specified.
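For example, to train under the mean-squared-error loss (the call below assumes the on-the-fly setup from the Examples):

estimator <- train(estimator, sampler = sampler, simulator = simulator,
                   m = 30, loss = "squared-error")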

learning_rate

the learning rate for the optimiser Adam (default 1e-4)

epochs

the number of epochs to train the neural network. An epoch is one complete pass through the entire training data set when doing stochastic gradient descent.

batchsize

the batchsize to use when performing stochastic gradient descent, that is, the number of training samples processed between each update of the neural-network parameters.

savepath

path to save the trained estimator and other information; if an empty string (default), nothing is saved. Otherwise, the neural-network parameters (i.e., the weights and biases) will be saved during training as .bson files; the risk function evaluated over the training and validation sets will also be saved, in the first and second columns of loss_per_epoch.csv, respectively; and the best parameters (as measured by validation risk) will be saved as best_network.bson.
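For example (the directory name is illustrative):

estimator <- train(estimator, sampler = sampler, simulator = simulator,
                   m = 30, savepath = "runs/Gaussian")
# writes .bson weight files, loss_per_epoch.csv, and best_network.bson
# under the directory runs/Gaussian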

stopping_epochs

cease training if the validation risk does not improve within this number of epochs (default 5).

epochs_per_Z_refresh

integer indicating how often to refresh the training data

epochs_per_theta_refresh

integer indicating how often to refresh the training parameters; must be a multiple of epochs_per_Z_refresh
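For example, to refresh the training data every 5 epochs and the training parameters every 10 (a sketch assuming the on-the-fly setup from the Examples):

estimator <- train(estimator, sampler = sampler, simulator = simulator, m = 30,
                   epochs_per_Z_refresh = 5, epochs_per_theta_refresh = 10)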

simulate_just_in_time

flag indicating whether we should simulate "just-in-time", in the sense that only a batchsize number of parameter vectors and corresponding data are in memory at a given time

use_gpu

a boolean indicating whether to use the GPU if one is available

verbose

a boolean indicating whether information, including empirical risk values and timings, should be printed to the console during training.

See Also

assess() for assessing an estimator post training, and estimate() for applying an estimator to observed data

Examples

if (FALSE) {
# Construct a neural Bayes estimator for replicated univariate Gaussian 
# data with unknown mean and standard deviation. 

# Load R and Julia packages
library("NeuralEstimators")
library("JuliaConnectoR")
juliaEval("using NeuralEstimators, Flux, Distributions")

# Define the neural-network architecture
estimator <- juliaEval('
 d = 1    # dimension of each replicate
 p = 2    # number of parameters in the model
 w = 32   # width of each layer
 psi = Chain(Dense(d, w, relu), Dense(w, w, relu))
 phi = Chain(Dense(w, w, relu), Dense(w, p))
 deepset = DeepSet(psi, phi)
 estimator = PointEstimator(deepset)
')

# Sampler from the prior
sampler <- function(K) {
  mu    <- rnorm(K)      # Gaussian prior for the mean
  sigma <- rgamma(K, 1)  # Gamma prior for the standard deviation
  theta <- matrix(c(mu, sigma), byrow = TRUE, ncol = K)  # p x K matrix (row 1 = mu, row 2 = sigma)
  return(theta)
}

# Data simulator
simulator <- function(theta_set, m) {
  # returns a list of data sets, one per parameter vector (column of theta_set);
  # each data set is a 1 x m matrix of m independent replicates
  apply(theta_set, 2, function(theta) {
    t(rnorm(m, theta[1], theta[2]))
  }, simplify = FALSE)
}

# Train using fixed parameter and data sets 
theta_train <- sampler(10000)
theta_val   <- sampler(2000)
m <- 30 # number of iid replicates
Z_train <- simulator(theta_train, m)
Z_val   <- simulator(theta_val, m)
estimator <- train(estimator, 
                   theta_train = theta_train, 
                   theta_val = theta_val, 
                   Z_train = Z_train, 
                   Z_val = Z_val)
                   
# Train using simulation on-the-fly (requires Julia package RCall)
estimator <- train(estimator, sampler = sampler, simulator = simulator, m = m)

#### Simulation on-the-fly using Julia functions ####

# Defining the sampler and simulator in Julia can improve computational 
# efficiency by avoiding the overhead of communicating between R and Julia. 
# Julia is also fast (comparable to C) and so it can be useful to define 
# these functions in Julia when they involve for-loops. 

# Parameter sampler
sampler <- juliaEval("
  function sampler(K)
    mu    = rand(Normal(0, 1), K)
    sigma = rand(Gamma(1), K)
    theta = hcat(mu, sigma)'
    return theta
  end")

# Data simulator
simulator <- juliaEval("
  function simulator(theta_matrix, m)
    Z = [rand(Normal(theta[1], theta[2]), 1, m) for theta in eachcol(theta_matrix)]
    return Z
  end")

# Train the estimator
estimator <- train(estimator, sampler = sampler, simulator = simulator, m = m)
}
