Allows the model to jointly attend to information from different representation subspaces. See reference: Attention Is All You Need
nn_multihead_attention(
embed_dim,
num_heads,
dropout = 0,
bias = TRUE,
add_bias_kv = FALSE,
add_zero_attn = FALSE,
kdim = NULL,
vdim = NULL,
batch_first = FALSE
)
embed_dim: total dimension of the model.
num_heads: number of parallel attention heads. Note that embed_dim will be split across num_heads (i.e. each head will have dimension embed_dim %/% num_heads).
dropout: a Dropout layer on attn_output_weights. Default: 0.0.
bias: add bias as module parameter. Default: TRUE.
add_bias_kv: add bias to the key and value sequences at dim=0.
add_zero_attn: add a new batch of zeros to the key and value sequences at dim=1.
kdim: total number of features in key. Default: NULL.
vdim: total number of features in value. Default: NULL. Note: if kdim and vdim are NULL, they will be set to embed_dim so that query, key, and value have the same number of features.
batch_first: if TRUE, the input and output tensors are provided as (batch, seq, feature). Default: FALSE (seq, batch, feature).
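A minimal construction sketch (the library call and the specific dimensions are illustrative choices, not part of this help page): the first layer uses the defaults, the second assumes key and value carry feature sizes different from embed_dim and a batch-first layout.

library(torch)
# embed_dim must be divisible by num_heads; each head gets embed_dim %/% num_heads = 8
mha <- nn_multihead_attention(embed_dim = 32, num_heads = 4)
# kdim/vdim let key and value have feature sizes other than embed_dim
# (they are projected to embed_dim internally); batch_first = TRUE
# switches the expected layout to (batch, seq, feature)
mha_kv <- nn_multihead_attention(
  embed_dim = 32, num_heads = 4,
  kdim = 24, vdim = 48,
  batch_first = TRUE
)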
Inputs:
query: (L, N, E) where L is the target sequence length, N is the batch size, and E is the embedding dimension ((N, L, E) if batch_first = TRUE; see the batch_first argument).
key: (S, N, E) where S is the source sequence length, N is the batch size, and E is the embedding dimension ((N, S, E) if batch_first = TRUE).
value: (S, N, E) where S is the source sequence length, N is the batch size, and E is the embedding dimension ((N, S, E) if batch_first = TRUE).
key_padding_mask: (N, S) where N is the batch size and S is the source sequence length. If a BoolTensor is provided, positions with the value of TRUE will be ignored while positions with the value of FALSE will be unchanged.
attn_mask: 2D mask (L, S) or 3D mask (N * num_heads, L, S) where L is the target sequence length and S is the source sequence length. If a BoolTensor is provided, positions with TRUE are not allowed to attend while FALSE values will be unchanged. If a FloatTensor is provided, it will be added to the attention weight.
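A short sketch of the input shapes and mask conventions described above (dimensions are chosen for illustration; the boolean masks follow the TRUE-means-masked convention):

library(torch)
embed_dim <- 8; num_heads <- 2
mha <- nn_multihead_attention(embed_dim, num_heads)
L <- 5; S <- 5; N <- 3                  # target length, source length, batch size
query <- torch_randn(L, N, embed_dim)   # (L, N, E)
key   <- torch_randn(S, N, embed_dim)   # (S, N, E)
value <- torch_randn(S, N, embed_dim)   # (S, N, E)
# key_padding_mask: (N, S); TRUE marks padded source positions to be ignored
key_padding_mask <- torch_zeros(N, S, dtype = torch_bool())
# attn_mask: (L, S); TRUE blocks attention. Here a causal mask: position i
# may only attend to source positions <= i.
attn_mask <- torch_triu(torch_ones(L, S, dtype = torch_bool()), diagonal = 1)
out <- mha(query, key, value,
           key_padding_mask = key_padding_mask,
           attn_mask = attn_mask)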
Outputs:
attn_output: (L, N, E) where L is the target sequence length, N is the batch size, and E is the embedding dimension ((N, L, E) if batch_first = TRUE; see the batch_first argument).
attn_output_weights:
if avg_weights is TRUE (the default), the output attention weights are averaged over the attention heads, giving a tensor of shape (N, L, S) where N is the batch size, L is the target sequence length, and S is the source sequence length.
if avg_weights is FALSE, the attention weight tensor is output as-is, with shape (N, num_heads, L, S).
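To see the two attn_output_weights shapes, a short sketch (this assumes avg_weights is accepted as an argument to the module call, as the description above implies; the concrete dimensions are illustrative):

library(torch)
mha <- nn_multihead_attention(embed_dim = 8, num_heads = 2)
q <- torch_randn(5, 3, 8)  # (L = 5, N = 3, E = 8)
k <- torch_randn(7, 3, 8)  # (S = 7, N = 3, E = 8)
v <- torch_randn(7, 3, 8)
out_avg <- mha(q, k, v)        # default: avg_weights = TRUE
out_avg[[2]]$shape             # (N, L, S) = 3, 5, 7
# avg_weights assumed to be a forward argument (see Outputs above)
out_raw <- mha(q, k, v, avg_weights = FALSE)
out_raw[[2]]$shape             # (N, num_heads, L, S) = 3, 2, 5, 7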
if (torch_is_installed()) {
  embed_dim <- 16  # total model dimension; must be divisible by num_heads
  num_heads <- 4
  multihead_attn <- nn_multihead_attention(embed_dim, num_heads)
  # inputs are (seq_len, batch, embed_dim) since batch_first = FALSE by default
  query <- torch_randn(10, 2, embed_dim)
  key <- torch_randn(10, 2, embed_dim)
  value <- torch_randn(10, 2, embed_dim)
  out <- multihead_attn(query, key, value)
  attn_output <- out[[1]]          # (10, 2, embed_dim)
  attn_output_weights <- out[[2]]  # (2, 10, 10), averaged over heads
}