MultiHeadAttention

scalation.modeling.autograd.MultiHeadAttention

class MultiHeadAttention(numHeads: Int, dModel: Int) extends SeqModule

Implements the Multi-Head Attention mechanism, a key component of transformer models. This class performs linear projections of the input tensors, splits them into multiple attention heads, applies scaled dot-product attention to each head, and combines the results into a single output tensor.

Value parameters

dModel: the dimensionality of the model (input and output feature size)
numHeads: the number of attention heads
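
How the two parameters relate can be illustrated with a small, self-contained sketch. It is not ScalaTion code: `HeadSplitSketch`, `splitHeads`, and `combineHeads` are hypothetical names, and the sketch assumes that `dModel` is divisible by `numHeads` and that each head receives a contiguous slice of `dModel / numHeads` columns, as in the usual transformer formulation. Whether this class enforces the same constraint is not stated on this page.

```scala
// Hypothetical helper, not part of ScalaTion.
object HeadSplitSketch {

  type Matrix = Vector[Vector[Double]]          // rows = sequence positions, columns = features

  // Split a (seqLen x dModel) matrix into numHeads matrices of shape (seqLen x dModel / numHeads).
  def splitHeads(x: Matrix, numHeads: Int): Vector[Matrix] = {
    require(x.nonEmpty, "input matrix must have at least one row")
    val dModel = x.head.length
    // Assumption: the feature dimension divides evenly among the heads.
    require(dModel % numHeads == 0, s"dModel ($dModel) must be divisible by numHeads ($numHeads)")
    val dHead = dModel / numHeads
    // Head h takes columns [h * dHead, (h + 1) * dHead) of every row.
    Vector.tabulate(numHeads)(h => x.map(row => row.slice(h * dHead, (h + 1) * dHead)))
  }

  // Concatenate the per-head slices of each row back into one row of width dModel.
  def combineHeads(heads: Vector[Matrix]): Matrix =
    heads.head.indices.toVector.map(i => heads.flatMap(head => head(i)))
}
```

With `dModel = 4` and `numHeads = 2`, each head sees two of the four features per position, and `combineHeads(splitHeads(x, 2))` reproduces `x`.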

Attributes

See also:

"Attention Is All You Need" by Vaswani et al., 2017: https://arxiv.org/abs/1706.03762

"Understanding Multi-Head Attention for ML Framework Developers": https://dev-discuss.pytorch.org/t/understanding-multi-head-attention-for-ml-framework-developers/1792

PyTorch MultiheadAttention Documentation: https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html
Supertypes: class SeqModule, class BaseModule, class Object, trait Matchable, class Any

Members list

Value members

Concrete methods

Forward pass for the Multi-Head Attention module. This method takes three input tensors (query, key, value), performs linear projections, splits them into multiple heads, applies scaled dot-product attention to each head, and combines the results into a single output tensor.

Value parameters

inputs: an IndexedSeq containing the query (q), key (k), and value (v) tensors

Attributes

Returns: an IndexedSeq containing the resulting output tensor
Throws: IllegalArgumentException if the number of inputs is not 3
Definition Classes: SeqModule
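
The core of the forward pass can be sketched for a single head as follows. This is a simplified illustration under stated assumptions, not the ScalaTion implementation: it uses plain `Vector[Vector[Double]]` matrices instead of the library's tensor type, omits the linear projections and the multi-head split, and the names `AttentionSketch`, `softmax`, and `scaledDotProductAttention` are hypothetical. It does mirror the documented contract: three inputs in, one output out, and an `IllegalArgumentException` (raised here by `require`) for any other input count.

```scala
// Hypothetical single-head sketch, not part of ScalaTion.
object AttentionSketch {

  type Matrix = Vector[Vector[Double]]

  // Softmax over one row, shifted by the row maximum for numerical stability.
  def softmax(row: Vector[Double]): Vector[Double] = {
    val m = row.max
    val e = row.map(v => math.exp(v - m))
    val s = e.sum
    e.map(_ / s)
  }

  // Computes softmax(Q K^T / sqrt(dK)) V for a single head.
  def scaledDotProductAttention(q: Matrix, k: Matrix, v: Matrix): Matrix = {
    val scale = 1.0 / math.sqrt(k.head.length.toDouble)
    q.map { qi =>
      // Scaled similarity of this query row with every key row.
      val scores = k.map(kj => qi.zip(kj).map { case (a, b) => a * b }.sum * scale)
      val w = softmax(scores)
      // Weighted sum of the value rows, computed column by column.
      v.transpose.map(col => col.zip(w).map { case (a, b) => a * b }.sum)
    }
  }

  // Exactly three inputs (q, k, v) are accepted; the result is wrapped in an IndexedSeq.
  def forward(inputs: IndexedSeq[Matrix]): IndexedSeq[Matrix] = {
    require(inputs.length == 3, s"expected 3 inputs (q, k, v), got ${inputs.length}")
    val q = inputs(0)
    val k = inputs(1)
    val v = inputs(2)
    IndexedSeq(scaledDotProductAttention(q, k, v))
  }
}
```

When all key rows are identical, the attention weights are uniform and each output row is the mean of the value rows, which gives a quick sanity check for the sketch.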

Inherited methods

Alias for forward, allows calling the module as a function: module(xs).

Attributes

Inherited from: SeqModule
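
The aliasing pattern described here can be sketched in a few lines. The trait `SeqLike` and the object `Doubler` are hypothetical stand-ins, not ScalaTion types; the sketch only shows how an `apply` method that delegates to `forward` makes `module(xs)` and `module.forward(xs)` equivalent.

```scala
// Hypothetical stand-ins, not ScalaTion's SeqModule.
object ApplyAliasSketch {

  trait SeqLike[T] {
    def forward(inputs: IndexedSeq[T]): IndexedSeq[T]
    // apply delegates to forward, so module(xs) is the same call as module.forward(xs).
    def apply(inputs: IndexedSeq[T]): IndexedSeq[T] = forward(inputs)
  }

  // A toy module that doubles every input value.
  object Doubler extends SeqLike[Double] {
    def forward(inputs: IndexedSeq[Double]): IndexedSeq[Double] = inputs.map(_ * 2.0)
  }
}
```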

Set the module to evaluation mode (and all submodules recursively).

Attributes

Inherited from: BaseModule

Return the gradients of all parameters.

Attributes

Inherited from: BaseModule

Return all trainable parameters, including those from submodules.

Attributes

Inherited from: BaseModule

Replace the current parameters with new ones. Useful for weight updates, loading saved models, etc.

Value parameters

newParams: The new parameter list to assign

Attributes

Inherited from: BaseModule

Set the module to training mode (and all submodules recursively).

Attributes

Inherited from: BaseModule
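
Recursive switching between training and evaluation mode can be sketched with a small module tree. The class `Node` and its members are illustrative assumptions, not the `BaseModule` API, and the default of starting in training mode is likewise assumed rather than taken from this page.

```scala
// Hypothetical module tree, not ScalaTion's BaseModule.
object ModeSketch {

  class Node(val children: Seq[Node] = Seq.empty) {
    // Assumption: modules start in training mode.
    var training: Boolean = true

    def train(): Unit = setMode(true)
    def eval(): Unit = setMode(false)

    // Set the flag on this node, then propagate the same mode to every submodule.
    private def setMode(flag: Boolean): Unit = {
      training = flag
      children.foreach(c => if (flag) c.train() else c.eval())
    }
  }
}
```

Calling `eval()` on the root clears the flag on every descendant, and `train()` sets it again, so a nested module never ends up in a different mode from its parent.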

Zero out all gradients (in-place).

Attributes

Inherited from: BaseModule

Inherited fields

Flag to control training or evaluation behavior.

Attributes

Inherited from: BaseModule

Automatically detect submodules (other BaseModules) within this module.

Attributes

Inherited from: BaseModule
