In the Python code, this logic is internal to attention_layer; it is pulled out into a separate function here.
transpose_for_scores(
    input_tensor,
    batch_size,
    num_attention_heads,
    seq_length,
    width
)

input_tensor: Tensor to reshape and transpose.
batch_size: Size of the first dimension of input_tensor.
num_attention_heads: Size of the third dimension of input_tensor. (Will be transposed to the second dimension.)
seq_length: Size of the second dimension of input_tensor. (Will be transposed to the third dimension.)
width: Size of the fourth dimension of input_tensor.

Returns: Tensor of shape [batch_size, num_attention_heads, seq_length, width].
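Below is a minimal sketch of the reshape-and-transpose this function performs, assuming TensorFlow-style tensors (the framework used by the surrounding attention_layer code); it is an illustration of the described behavior, not the verbatim library implementation.

    import tensorflow as tf

    def transpose_for_scores(input_tensor, batch_size, num_attention_heads,
                             seq_length, width):
        # Shape the input as [batch_size, seq_length, num_attention_heads, width];
        # this also covers the case where input_tensor arrives flattened.
        output_tensor = tf.reshape(
            input_tensor, [batch_size, seq_length, num_attention_heads, width])
        # Swap axes 1 and 2 so each attention head gets its own slice:
        # [batch_size, num_attention_heads, seq_length, width].
        return tf.transpose(output_tensor, [0, 2, 1, 3])

Moving the head axis ahead of the sequence axis lets the attention scores for all heads be computed with a single batched matrix multiplication over the last two dimensions.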