sits_lighttae: Train a model using Lightweight Temporal Self-Attention Encoder

Description

Implementation of the Lightweight Temporal Attention Encoder (L-TAE) for satellite image time series. This is a lightweight version of the Temporal Attention Encoder (TAE) proposed by Garnot et al. For the TAE itself, see sits_tae.

TAE is a simplified version of the well-known self-attention architecture used in large language models. It uses a modified self-attention scheme in which the input embeddings serve directly as values. TAE defines a single master query for each sequence and attention head, computed from the temporal average of the queries. This master query is compared with the sequence of keys to produce a single attention mask, which weights the values over time and combines them into a single feature vector (a weighted temporal average).
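To make the master-query mechanism concrete, here is a minimal single-head sketch in base R; it is not the sits/torch implementation. The helper tae_pool() is hypothetical, the projection matrices w_q and w_k are random stand-ins for learned weights, and the scaling by sqrt(d_k) follows the usual scaled dot-product attention convention. The real TAE uses several heads and additional layers, so treat this only as an illustration of how one master query turns a time series into one vector.

```r
# Softmax over a numeric vector, shifted by its max for numerical stability
softmax <- function(x) {
  e <- exp(x - max(x))
  e / sum(e)
}

# Single-head master-query attention pooling (TAE-style sketch, hypothetical helper).
# x:   T x d matrix of input embeddings (one row per date); also used as the values
# w_q: d x d_k query projection; w_k: d x d_k key projection (stand-ins for learned weights)
tae_pool <- function(x, w_q, w_k) {
  q <- x %*% w_q                                   # T x d_k queries
  k <- x %*% w_k                                   # T x d_k keys
  master_q <- colMeans(q)                          # temporal average of the queries
  scores <- as.vector(k %*% master_q) / sqrt(ncol(k))
  attn <- softmax(scores)                          # one weight per date, summing to 1
  list(
    attn = attn,
    feature = as.vector(crossprod(attn, x))        # weighted temporal average of the inputs
  )
}

set.seed(1)
x   <- matrix(rnorm(12 * 4), nrow = 12)   # 12 dates, 4-dimensional embeddings
w_q <- matrix(rnorm(4 * 3), nrow = 4)
w_k <- matrix(rnorm(4 * 3), nrow = 4)
res <- tae_pool(x, w_q, w_k)
round(res$attn, 3)                        # attention weights over the 12 dates
```

Whatever the length of the series, the output is a single vector with as many entries as the embedding has channels.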

L-TAE simplifies the TAE model further. It defines the master query of each head as a learned model parameter, instead of computing it with a linear layer as TAE does. The authors argue that this simplification reduces the number of parameters, and that the resulting loss of flexibility is compensated by using a larger number of heads.
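The L-TAE change can be sketched the same way. In this hypothetical base R helper, lighttae_pool(), each row of `queries` is one head's master query, held as a free parameter rather than produced by a query layer. Keys come from a single projection w_k whose columns are split among the heads, and, following the channel-grouping idea of the L-TAE paper, each head averages only its own slice of input channels. All weights are random stand-ins for trained parameters; this is an illustration under those assumptions, not the sits implementation.

```r
# Softmax over a numeric vector, shifted by its max for numerical stability
softmax <- function(x) {
  e <- exp(x - max(x))
  e / sum(e)
}

# Multi-head L-TAE-style pooling sketch (hypothetical helper).
# x:       T x d input embeddings; d must be divisible by the number of heads
# w_k:     d x (n_heads * d_k) key projection; columns are split among heads
# queries: n_heads x d_k matrix; row h is head h's master query (a free parameter)
lighttae_pool <- function(x, w_k, queries) {
  n_heads <- nrow(queries)
  d_k <- ncol(queries)
  stopifnot(ncol(x) %% n_heads == 0, ncol(w_k) == n_heads * d_k)
  keys <- x %*% w_k                                        # T x (n_heads * d_k)
  group <- rep(seq_len(n_heads), each = ncol(x) / n_heads) # channel group of each input column
  attn <- matrix(0, nrow = nrow(x), ncol = n_heads)
  heads <- vector("list", n_heads)
  for (h in seq_len(n_heads)) {
    k_h <- keys[, ((h - 1) * d_k + 1):(h * d_k), drop = FALSE]
    attn[, h] <- softmax(as.vector(k_h %*% queries[h, ]) / sqrt(d_k))
    # each head takes a weighted temporal average of its own channel group
    heads[[h]] <- as.vector(crossprod(attn[, h], x[, group == h, drop = FALSE]))
  }
  list(attn = attn, feature = unlist(heads))               # concatenate the heads
}

set.seed(42)
x       <- matrix(rnorm(10 * 8), nrow = 10)    # 10 dates, 8 channels
w_k     <- matrix(rnorm(8 * 4 * 2), nrow = 8)  # 4 heads, d_k = 2
queries <- matrix(rnorm(4 * 2), nrow = 4)      # one master query per head
res <- lighttae_pool(x, w_k, queries)
dim(res$attn)                                  # 10 4
```

Compared with the TAE sketch, no query projection is needed: each head contributes only d_k query values, which is where the parameter savings described above come from.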

Usage

sits_lighttae(
  samples = NULL,
  samples_validation = NULL,
  epochs = 150L,
  batch_size = 128L,
  validation_split = 0.2,
  optimizer = torch::optim_adamw,
  opt_hparams = list(lr = 5e-04, eps = 1e-08, weight_decay = 7e-04),
  lr_decay_epochs = 50L,
  lr_decay_rate = 1,
  patience = 20L,
  min_delta = 0.01,
  seed = NULL,
  verbose = FALSE
)

Value

A fitted model to be used for classification of data cubes.

Arguments

samples: Time series with the training samples (tibble of class "sits").
samples_validation: Time series with the validation samples (tibble of class "sits"). If samples_validation is provided, validation_split is ignored.
epochs: Number of epochs (complete passes over the training data) to train the model (integer, min = 1, max = 20000).
batch_size: Number of samples per gradient update (integer, min = 16L, max = 2048L).
validation_split: Fraction of training data to be used as validation data.
optimizer: Optimizer function to be used.
opt_hparams: Hyperparameters for the optimizer: lr: learning rate of the optimizer; eps: term added to the denominator to improve numerical stability; weight_decay: weight decay rate (a form of L2 regularization).
lr_decay_epochs: Number of epochs between successive reductions of the learning rate.
lr_decay_rate: Factor by which the learning rate is multiplied at each reduction.
patience: Number of epochs without improvement after which training stops early.
min_delta: Minimum decrease in the loss function that counts as an improvement and resets the patience counter.
seed: Seed for the random number generator, for reproducible results.
verbose: Verbosity mode (TRUE/FALSE). Default is FALSE.
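The interaction of lr_decay_epochs, lr_decay_rate, patience and min_delta can be sketched in base R. The sketch assumes the usual conventions of a multiplicative step-decay schedule (as in torch's lr_step scheduler) and an early-stopping rule that counts epochs without sufficient improvement (as in luz's early-stopping callback); whether sits applies exactly these rules internally is an assumption. The helpers step_lr() and early_stop_epoch() are hypothetical.

```r
# Step decay of the learning rate: multiply by `decay_rate` every `decay_epochs`
# epochs (epochs are counted from 1 here).
step_lr <- function(epoch, lr0, decay_epochs, decay_rate) {
  lr0 * decay_rate^((epoch - 1L) %/% decay_epochs)
}

# Early stopping on a vector of per-epoch losses: returns the epoch at which
# training would stop. A loss counts as an improvement only if it is lower than
# the best loss so far by more than `min_delta`.
early_stop_epoch <- function(losses, patience, min_delta) {
  best <- Inf
  wait <- 0L
  for (epoch in seq_along(losses)) {
    if (losses[epoch] < best - min_delta) {
      best <- losses[epoch]   # improvement: remember it and reset the counter
      wait <- 0L
    } else {
      wait <- wait + 1L       # no sufficient improvement this epoch
      if (wait >= patience) return(epoch)
    }
  }
  length(losses)              # patience never exhausted: all epochs are run
}

# Defaults of sits_lighttae(): lr = 5e-4, lr_decay_epochs = 50, lr_decay_rate = 1
lr_default <- sapply(c(1, 51, 150), step_lr,
                     lr0 = 5e-4, decay_epochs = 50L, decay_rate = 1)
lr_default
```

Under this assumption, the default lr_decay_rate = 1 keeps the learning rate constant for all epochs; a value such as 0.5 would halve it every lr_decay_epochs epochs.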

Author

Gilberto Camara, gilberto.camara@inpe.br

Rolf Simoes, rolfsimoes@gmail.com

Charlotte Pelletier, charlotte.pelletier@univ-ubs.fr

References

Vivien Garnot, Loic Landrieu, Sebastien Giordano, and Nesrine Chehata, "Satellite Image Time Series Classification with Pixel-Set Encoders and Temporal Self-Attention", 2020 Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12322-12331. doi:10.1109/CVPR42600.2020.01234

Vivien Garnot, Loic Landrieu, "Lightweight Temporal Self-Attention for Classifying Satellite Images Time Series", arXiv preprint arXiv:2007.00586, 2020.

Maja Schneider, Marco Körner, "[Re] Satellite Image Time Series Classification with Pixel-Set Encoders and Temporal Self-Attention", ReScience C 7(2), 2021. doi:10.5281/zenodo.4835356

Examples


if (sits_run_examples()) {
    # create a lightTAE model
    torch_model <- sits_train(samples_modis_ndvi, sits_lighttae())
    # plot the model
    plot(torch_model)
    # create a data cube from local files
    data_dir <- system.file("extdata/raster/mod13q1", package = "sits")
    cube <- sits_cube(
        source = "BDC",
        collection = "MOD13Q1-6.1",
        data_dir = data_dir
    )
    # classify a data cube
    probs_cube <- sits_classify(
        data = cube, ml_model = torch_model, output_dir = tempdir()
    )
    # plot the probability cube
    plot(probs_cube)
    # smooth the probability cube using Bayesian statistics
    bayes_cube <- sits_smooth(probs_cube, output_dir = tempdir())
    # plot the smoothed cube
    plot(bayes_cube)
    # label the probability cube
    label_cube <- sits_label_classification(
        bayes_cube,
        output_dir = tempdir()
    )
    # plot the labelled cube
    plot(label_cube)
}
