Allows the model to jointly attend to information from different representation subspaces. See reference: Attention Is All You Need
nnf_multi_head_attention_forward(
query,
key,
value,
embed_dim_to_check,
num_heads,
in_proj_weight,
in_proj_bias,
bias_k,
bias_v,
add_zero_attn,
dropout_p,
out_proj_weight,
out_proj_bias,
training = TRUE,
key_padding_mask = NULL,
need_weights = TRUE,
attn_mask = NULL,
use_separate_proj_weight = FALSE,
q_proj_weight = NULL,
k_proj_weight = NULL,
v_proj_weight = NULL,
static_k = NULL,
static_v = NULL
)
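
A minimal sketch of a single forward pass, based only on the signature above. It assumes the (seq_len, batch, embed_dim) input layout, a combined input projection of shape (3 * embed_dim, embed_dim), that NULL is accepted for bias_k and bias_v, and that the result is a list of (attention output, attention weights); these details are assumptions, not taken from official examples.

library(torch)

embed_dim <- 8
num_heads <- 2
tgt_len   <- 4   # query (target) sequence length
src_len   <- 5   # key/value (source) sequence length
batch     <- 3

# Random inputs in an assumed (seq_len, batch, embed_dim) layout
query <- torch_randn(tgt_len, batch, embed_dim)
key   <- torch_randn(src_len, batch, embed_dim)
value <- torch_randn(src_len, batch, embed_dim)

# Combined projection for q, k and v stacked row-wise: (3 * embed_dim, embed_dim)
in_proj_weight  <- torch_randn(3 * embed_dim, embed_dim)
in_proj_bias    <- torch_zeros(3 * embed_dim)
out_proj_weight <- torch_randn(embed_dim, embed_dim)
out_proj_bias   <- torch_zeros(embed_dim)

res <- nnf_multi_head_attention_forward(
  query, key, value,
  embed_dim_to_check = embed_dim,
  num_heads          = num_heads,
  in_proj_weight     = in_proj_weight,
  in_proj_bias       = in_proj_bias,
  bias_k             = NULL,   # assumed to be optional
  bias_v             = NULL,   # assumed to be optional
  add_zero_attn      = FALSE,
  dropout_p          = 0,
  out_proj_weight    = out_proj_weight,
  out_proj_bias      = out_proj_bias
)

# res[[1]]: attention output, expected shape (tgt_len, batch, embed_dim)
# res[[2]]: attention weights averaged over heads (need_weights = TRUE by default)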
Arguments

embed_dim_to_check: total dimension of the model.
num_heads: parallel attention heads.
in_proj_weight: input projection weight and bias.
in_proj_bias: currently undocumented.
bias_k: bias of the key and value sequences to be added at dim=0.
bias_v: currently undocumented.
add_zero_attn: add a new batch of zeros to the key and value sequences at dim=1.
dropout_p: probability of an element to be zeroed.
out_proj_weight: the output projection weight and bias.
out_proj_bias: currently undocumented.
training: apply dropout if TRUE.
key_padding_mask: positions with the value of TRUE will be ignored, while positions with the value of FALSE will be unchanged.
need_weights: output attn_output_weights.
attn_mask: 2D mask; positions with the value of TRUE are not allowed to attend, while FALSE values will be unchanged. If a FloatTensor is provided, it will be added to the attention weight.
use_separate_proj_weight: the function accepts the projection weights for query, key, and value in different forms. If FALSE, in_proj_weight will be used, which is a combination of q_proj_weight, k_proj_weight, and v_proj_weight.
q_proj_weight: input projection weight.
k_proj_weight: currently undocumented.
v_proj_weight: currently undocumented.
static_k: static key and value used for attention operators.
static_v: currently undocumented.
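
The sketch below illustrates use_separate_proj_weight together with a boolean key_padding_mask. It assumes that NULL is accepted for the unused combined projection arguments and that the mask has shape (batch, source_len) with TRUE marking key positions to ignore; these are assumptions rather than documented behavior.

library(torch)

embed_dim <- 8
num_heads <- 2
tgt_len   <- 4
src_len   <- 5
batch     <- 3

query <- torch_randn(tgt_len, batch, embed_dim)
key   <- torch_randn(src_len, batch, embed_dim)
value <- torch_randn(src_len, batch, embed_dim)

# Separate projection weights for query, key and value
q_proj_weight   <- torch_randn(embed_dim, embed_dim)
k_proj_weight   <- torch_randn(embed_dim, embed_dim)
v_proj_weight   <- torch_randn(embed_dim, embed_dim)
out_proj_weight <- torch_randn(embed_dim, embed_dim)
out_proj_bias   <- torch_zeros(embed_dim)

# Boolean padding mask: TRUE marks key positions to ignore.
# Here the last source position of every batch element is masked out.
mask <- matrix(FALSE, nrow = batch, ncol = src_len)
mask[, src_len] <- TRUE
key_padding_mask <- torch_tensor(mask)

res <- nnf_multi_head_attention_forward(
  query, key, value,
  embed_dim_to_check = embed_dim,
  num_heads          = num_heads,
  in_proj_weight     = NULL,   # assumed unused when use_separate_proj_weight = TRUE
  in_proj_bias       = NULL,
  bias_k             = NULL,
  bias_v             = NULL,
  add_zero_attn      = FALSE,
  dropout_p          = 0,
  out_proj_weight    = out_proj_weight,
  out_proj_bias      = out_proj_bias,
  key_padding_mask   = key_padding_mask,
  use_separate_proj_weight = TRUE,
  q_proj_weight      = q_proj_weight,
  k_proj_weight      = k_proj_weight,
  v_proj_weight      = v_proj_weight
)

# res[[2]] (averaged attention weights) should be zero at the masked source position.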