Implements the Multi-Head Attention mechanism, a key component of transformer models. This class performs linear projections of the input tensors, splits them into multiple attention heads, applies scaled dot-product attention to each head, and combines the results into a single output tensor.
Value parameters
dModel
the dimensionality of the model (input and output feature size)
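The head-splitting step described above can be sketched in plain Scala. This is a minimal illustration, not this class's implementation: the `Matrix` alias, the `HeadSplitSketch` object, the helper names, and the `numHeads` parameter are all assumptions made for the example. It assumes dModel is divisible by the number of heads, so each head receives dModel / numHeads features.

```scala
// Hypothetical helpers for illustration; none of these names come from the documented class.
object HeadSplitSketch {
  type Matrix = Vector[Vector[Double]] // shape: (seqLen, features)

  // Split a (seqLen, dModel) matrix into numHeads matrices of shape (seqLen, dModel / numHeads).
  def splitHeads(x: Matrix, numHeads: Int): Vector[Matrix] = {
    val dModel = x.head.length
    require(dModel % numHeads == 0, "dModel must be divisible by numHeads")
    val dHead = dModel / numHeads
    Vector.tabulate(numHeads)(h => x.map(row => row.slice(h * dHead, (h + 1) * dHead)))
  }

  // Concatenate per-head matrices along the feature axis, back to (seqLen, dModel).
  def combineHeads(heads: Vector[Matrix]): Matrix =
    heads.transpose.map(_.flatten)
}
```

Splitting and then combining is a round trip: with dModel = 4 and two heads, each head sees two features per position, and concatenating the heads restores the original feature size.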
Forward pass for the Multi-Head Attention module. This method takes three input tensors (query, key, value), performs linear projections, splits them into multiple heads, applies scaled dot-product attention to each head, and combines the results into a single output tensor.
Value parameters
inputs
an IndexedSeq containing the query (q), key (k), and value (v) tensors
Attributes
Returns
an IndexedSeq containing a single element, the resulting output tensor
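The calling convention (three tensors in, one tensor out) and the per-head attention step can be sketched as follows. This is a simplified sketch under stated assumptions rather than the actual method: tensors are modelled as plain `Vector[Vector[Double]]` matrices, the learned linear projections are omitted (treated as identity), and `ForwardSketch`, `attention`, and the `numHeads` argument are hypothetical names introduced for the example.

```scala
// Illustrative only: learned projections are left out and all names are assumptions.
object ForwardSketch {
  type Matrix = Vector[Vector[Double]] // shape: (seqLen, features)

  // Plain matrix product of a (n x m) and b (m x p).
  def matmul(a: Matrix, b: Matrix): Matrix = {
    val bt = b.transpose
    a.map(row => bt.map(col => row.zip(col).map { case (x, y) => x * y }.sum))
  }

  // Numerically stable softmax over one row of scores.
  def softmax(row: Vector[Double]): Vector[Double] = {
    val m = row.max
    val exps = row.map(s => math.exp(s - m))
    val total = exps.sum
    exps.map(_ / total)
  }

  // Scaled dot-product attention for one head: softmax(Q K^T / sqrt(dHead)) V.
  def attention(q: Matrix, k: Matrix, v: Matrix): Matrix = {
    val scale = math.sqrt(q.head.length.toDouble)
    val scores = matmul(q, k.transpose).map(_.map(_ / scale))
    matmul(scores.map(softmax), v)
  }

  // Takes IndexedSeq(q, k, v) and returns a single-element IndexedSeq with the output.
  def forward(inputs: IndexedSeq[Matrix], numHeads: Int): IndexedSeq[Matrix] = {
    require(inputs.length == 3, "expected IndexedSeq(q, k, v)")
    val (q, k, v) = (inputs(0), inputs(1), inputs(2))
    require(q.head.length % numHeads == 0, "feature size must be divisible by numHeads")
    val dHead = q.head.length / numHeads
    // Slice out the feature columns belonging to head h.
    def head(x: Matrix, h: Int): Matrix = x.map(_.slice(h * dHead, (h + 1) * dHead))
    val perHead = Vector.tabulate(numHeads)(h => attention(head(q, h), head(k, h), head(v, h)))
    // Concatenate the heads along the feature axis.
    IndexedSeq(perHead.transpose.map(_.flatten))
  }
}
```

Because each row of attention weights sums to 1, every output row is a weighted average of the value rows for that head. A real implementation would additionally apply learned projection matrices before splitting and after combining the heads.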