Defines a torch module for spatial encoding.
This function is based on the paper by Vivien Garnot referenced below and on the code available on GitHub at https://github.com/VSainteuf/pytorch-psetae.
We also used the code made available by Maja Schneider in her work with Marco Körner referenced below and available at https://github.com/maja601/RC2020-psetae.
There is an important difference: the model proposed by Garnot assumes that the samples are available by parcel. In his model, the samples from the same parcel are averaged using an MLP. The current function implements an alternative to Garnot's pixel set encoder for the case when only individual pixels are available.
The spatial encoder is run for each temporal instance of the observations. Thus, it transforms each pixel with n_bands values into a vector whose length is the output dimension of the encoder.
The input of the pixel spatial encoder is a 3D tensor with shape [batch_size x n_times x n_bands].
Since the input tensor has a temporal dimension, this dimension is combined with the batch dimension so that all time steps of all sequences are processed at once, as a 2D tensor of shape [(batch_size * n_times) x n_bands]. The temporal dimension is then separated back to produce a tensor of shape [batch_size x n_times x embedding_dim].
The embedding dimension is the number of nodes in the last layer of the MLP used to process the input sequence.
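The shape handling described above can be illustrated with a small, dependency-free sketch. It is written in plain Python with nested lists rather than torch tensors, and it is not the module's implementation: the helper names (fold_time, unfold_time, linear_relu, make_layers, spatial_encode) are hypothetical, the weights are random and untrained, and each layer is assumed to be a plain dense layer followed by ReLU.

```python
import random


def fold_time(x):
    """Merge batch and time: [batch][n_times][n_bands] -> [batch * n_times][n_bands]."""
    return [pixel for series in x for pixel in series]


def unfold_time(rows, n_times):
    """Split rows back into series: [batch * n_times][dim] -> [batch][n_times][dim]."""
    return [rows[i:i + n_times] for i in range(0, len(rows), n_times)]


def linear_relu(rows, weights, bias):
    """Apply one dense layer followed by ReLU to every row independently."""
    return [
        [max(0.0, sum(w * v for w, v in zip(w_row, row)) + b)
         for w_row, b in zip(weights, bias)]
        for row in rows
    ]


def make_layers(n_bands, layer_sizes, seed=42):
    """Build random (untrained) weights for an MLP with the given layer sizes."""
    rng = random.Random(seed)
    layers, in_dim = [], n_bands
    for out_dim in layer_sizes:
        weights = [[rng.uniform(-1, 1) for _ in range(in_dim)]
                   for _ in range(out_dim)]
        layers.append((weights, [0.0] * out_dim))
        in_dim = out_dim
    return layers


def spatial_encode(x, layers):
    """Encode every pixel of every time step with the same MLP."""
    n_times = len(x[0])
    rows = fold_time(x)                 # [(batch * n_times)][n_bands]
    for weights, bias in layers:
        rows = linear_relu(rows, weights, bias)
    return unfold_time(rows, n_times)   # [batch][n_times][embedding_dim]


# Toy input: batch_size = 2, n_times = 3, n_bands = 4
x = [[[float(b + t + s) for b in range(4)] for t in range(3)] for s in range(2)]
layers = make_layers(n_bands=4, layer_sizes=(32, 64, 128))
encoded = spatial_encode(x, layers)
shape = (len(encoded), len(encoded[0]), len(encoded[0][0]))
print(shape)  # (2, 3, 128)
```

Because the time steps are folded into the batch dimension, every pixel observation passes through the same layers independently of its neighbours in time, and the last layer size (128 here) becomes the embedding dimension.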
If you use this method, please cite Garnot's and Schneider's work.
.torch_pixel_spatial_encoder(n_bands, layers_spatial_encoder = c(32, 64, 128))
A 3D tensor block of shape [batch_size x n_times x embedding_dim].
Number of bands per pixel.
Number of nodes in each layer of the MLP spatial encoder.
Charlotte Pelletier, charlotte.pelletier@univ-ubs.fr
Gilberto Camara, gilberto.camara@inpe.br
Rolf Simoes, rolf.simoes@inpe.br
Felipe Souza, lipecaso@gmail.com
Vivien Garnot, Loic Landrieu, Sebastien Giordano, and Nesrine Chehata, "Satellite Image Time Series Classification with Pixel-Set Encoders and Temporal Self-Attention", 2020 Conference on Computer Vision and Pattern Recognition. pages 12322-12331. DOI: 10.1109/CVPR42600.2020.01234
Schneider, Maja; Körner, Marco, "[Re] Satellite Image Time Series Classification with Pixel-Set Encoders and Temporal Self-Attention." ReScience C 7 (2), 2021. DOI: 10.5281/zenodo.4835356