The Sylvester flow uses two triangular matrices (R1 and R2) and Householder reflections to construct invertible transformations.
The transformation is parameterized as follows:
$$z = Q R_1 h(R_2 Q^T z_k + b) + z_k,$$
where:
Q is an orthogonal matrix obtained via Householder reflections.
R1 and R2 are upper triangular matrices with learned diagonal elements.
h is a non-linear activation function (default: torch_tanh).
b is a learned bias vector.
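The forward pass described above can be sketched in a few lines. This is a minimal NumPy illustration, not the package's implementation. It assumes the argument of h is R2 Q^T zk + b (the ordering used in the log-determinant formula), builds Q as a product of Householder reflections of the form I - 2 v v^T / (v^T v), and uses tanh as the activation. The helper names `householder_orthogonal` and `sylvester_forward` are hypothetical.

```python
import numpy as np

def householder_orthogonal(vs):
    """Multiply Householder reflections H = I - 2 v v^T / (v^T v), one per row of vs."""
    d = vs.shape[1]
    Q = np.eye(d)
    for v in vs:
        H = np.eye(d) - 2.0 * np.outer(v, v) / (v @ v)
        Q = Q @ H
    return Q

def sylvester_forward(z_k, Q, R1, R2, b, h=np.tanh):
    """Apply z = Q R1 h(R2 Q^T z_k + b) + z_k to a single vector z_k."""
    pre_activation = R2 @ (Q.T @ z_k) + b
    return Q @ (R1 @ h(pre_activation)) + z_k

# Illustrative random parameters (in a real flow these would be learned).
rng = np.random.default_rng(0)
d = 4
Q = householder_orthogonal(rng.normal(size=(3, d)))  # three reflections
R1 = np.triu(rng.normal(size=(d, d)))                # upper triangular
R2 = np.triu(rng.normal(size=(d, d)))                # upper triangular
b = rng.normal(size=d)
z_k = rng.normal(size=d)

z = sylvester_forward(z_k, Q, R1, R2, b)
```

Because each Householder reflection is orthogonal, their product Q is orthogonal as well, so no explicit orthogonalization step is needed in this sketch.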
The log determinant of the Jacobian does not itself ensure invertibility; it is the correction term that the change-of-variables formula requires when computing densities under the transformation. It is given by:
$$\log |\det J| = \sum_{i=1}^d \log \left| \mathrm{diag}_1[i] \cdot \mathrm{diag}_2[i] \cdot h'(R_2 Q^T z_k + b)_i + 1 \right|,$$
where diag_1 and diag_2 are the learned diagonal elements of R1 and R2, respectively, and h' is the derivative of the activation function, evaluated elementwise.
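The diagonal formula can be checked against a brute-force determinant. The sketch below is an illustration under stated assumptions, not the package's code: it assumes a square orthogonal Q built from a single Householder reflection, tanh as the activation, and diagonal entries squashed into (-1, 1) so that every term inside the logarithm stays positive. That squashing is a choice made for this example only. The helper names `tanh_prime`, `sylvester_log_det`, and `full_jacobian` are hypothetical.

```python
import numpy as np

def tanh_prime(x):
    """Derivative of tanh, evaluated elementwise."""
    return 1.0 - np.tanh(x) ** 2

def sylvester_log_det(z_k, Q, R1, R2, b):
    """Sum of log|diag_1[i] * diag_2[i] * h'(R2 Q^T z_k + b)_i + 1| over i."""
    pre_activation = R2 @ (Q.T @ z_k) + b
    diag_product = np.diag(R1) * np.diag(R2)
    return np.sum(np.log(np.abs(diag_product * tanh_prime(pre_activation) + 1.0)))

def full_jacobian(z_k, Q, R1, R2, b):
    """Dense Jacobian I + Q R1 diag(h'(.)) R2 Q^T, used only as a reference."""
    pre_activation = R2 @ (Q.T @ z_k) + b
    return np.eye(len(z_k)) + Q @ R1 @ np.diag(tanh_prime(pre_activation)) @ R2 @ Q.T

rng = np.random.default_rng(1)
d = 5

# One Householder reflection gives a square orthogonal Q.
v = rng.normal(size=d)
Q = np.eye(d) - 2.0 * np.outer(v, v) / (v @ v)

# Upper triangular matrices; diagonals squashed into (-1, 1) (example-only assumption).
R1 = np.triu(rng.normal(size=(d, d)))
R2 = np.triu(rng.normal(size=(d, d)))
np.fill_diagonal(R1, np.tanh(rng.normal(size=d)))
np.fill_diagonal(R2, np.tanh(rng.normal(size=d)))

b = rng.normal(size=d)
z_k = rng.normal(size=d)

log_det_fast = sylvester_log_det(z_k, Q, R1, R2, b)
_, log_det_dense = np.linalg.slogdet(full_jacobian(z_k, Q, R1, R2, b))
```

The two values agree because Q cancels inside the determinant, leaving an upper triangular matrix whose determinant is the product of its diagonal entries. The diagonal formula costs O(d) once the pre-activation is known, whereas the dense determinant costs O(d^3).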