Dot-product attention layer, a.k.a. Luong-style attention

```
layer_attention(
  inputs,
  use_scale = FALSE,
  score_mode = "dot",
  ...,
  dropout = NULL
)
```

- inputs
  List of the following tensors:
  - `query`: Query tensor of shape `[batch_size, Tq, dim]`.
  - `value`: Value tensor of shape `[batch_size, Tv, dim]`.
  - `key`: Optional key tensor of shape `[batch_size, Tv, dim]`. If not given, `value` will be used for both key and value, which is the most common case.

- use_scale
  If `TRUE`, will create a scalar variable to scale the attention scores.

- score_mode
  Function to use to compute attention scores, one of `{"dot", "concat"}`. `"dot"` refers to the dot product between the query and key vectors. `"concat"` refers to the hyperbolic tangent of the concatenation of the query and key vectors.

- ...
  Standard layer arguments (e.g., `batch_size`, `dtype`, `name`, `trainable`, `weights`).

- dropout
  Float between 0 and 1. Fraction of the units to drop for the attention scores. Defaults to 0.0.
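As a minimal usage sketch (the shapes and layer names below are illustrative assumptions, not taken from the original text), a functional-API model in which a query sequence attends over a value sequence, with no separate key supplied:

```
library(keras)

# Query sequence: Tq = 4 time steps; value sequence: Tv = 6 time steps; dim = 16
query <- layer_input(shape = c(4, 16))
value <- layer_input(shape = c(6, 16))

# No key is passed, so `value` is used as both key and value (the common case)
attended <- layer_attention(list(query, value), use_scale = TRUE)

# `attended` has shape [batch_size, Tq, dim] = [batch_size, 4, 16]
model <- keras_model(inputs = list(query, value), outputs = attended)
```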

The inputs are a `query` tensor of shape `[batch_size, Tq, dim]`, a `value` tensor of shape `[batch_size, Tv, dim]`, and a `key` tensor of shape `[batch_size, Tv, dim]`. The calculation follows the steps:

1. Calculate scores with shape `[batch_size, Tq, Tv]` as a `query`-`key` dot product: `scores = tf$matmul(query, key, transpose_b = TRUE)`.
2. Use scores to calculate a distribution with shape `[batch_size, Tq, Tv]`: `distribution = tf$nn$softmax(scores)`.
3. Use `distribution` to create a linear combination of `value` with shape `[batch_size, Tq, dim]`: return `tf$matmul(distribution, value)`.
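The same three steps can be sketched directly with TensorFlow ops from R; the toy shapes below are assumptions chosen only for illustration:

```
library(tensorflow)

# Toy shapes: batch_size = 2, Tq = 3, Tv = 4, dim = 5
query <- tf$random$normal(shape(2, 3, 5))
value <- tf$random$normal(shape(2, 4, 5))
key   <- value  # key defaults to value when not given

# Step 1: query-key dot product, shape [batch_size, Tq, Tv]
scores <- tf$matmul(query, key, transpose_b = TRUE)

# Step 2: softmax over the last axis gives the attention distribution
distribution <- tf$nn$softmax(scores)

# Step 3: linear combination of value, shape [batch_size, Tq, dim]
output <- tf$matmul(distribution, value)
```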

Other core layers: `layer_activation()`, `layer_activity_regularization()`, `layer_dense_features()`, `layer_dense()`, `layer_dropout()`, `layer_flatten()`, `layer_input()`, `layer_lambda()`, `layer_masking()`, `layer_permute()`, `layer_repeat_vector()`, `layer_reshape()`