MultiHeadAttention

scalation.modeling.autograd.MultiHeadAttention
class MultiHeadAttention(numHeads: Int, dModel: Int) extends SeqModule

Implements the Multi-Head Attention mechanism, a key component of transformer models. This class performs linear projections of the input tensors, splits them into multiple attention heads, applies scaled dot-product attention to each head, and combines the results into a single output tensor.

Value parameters

dModel

the dimensionality of the model (input and output feature size)

numHeads

the number of attention heads

Attributes

See also

https://arxiv.org/abs/1706.03762 "Attention Is All You Need" by Vaswani et al., 2017.
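
The mechanism described for this class can be sketched in plain Scala. This is only an illustration under stated assumptions: it does not use ScalaTion's Variabl or TensorD types, the learned linear projections are left out (treated as identity), matrices are assumed non-empty, and the names MultiHeadSketch, Matrix, attention, splitHeads, and multiHead are made up for this example.

```scala
// Illustrative only: plain-Scala multi-head attention on row-major matrices.
// Not ScalaTion's implementation; the linear projections are omitted.
object MultiHeadSketch:

  type Matrix = Vector[Vector[Double]]

  // Matrix product of a (n x k) and b (k x m).
  def matmul(a: Matrix, b: Matrix): Matrix =
    val bt = b.transpose
    a.map(row => bt.map(col => row.zip(col).map { case (x, y) => x * y }.sum))

  // Softmax of one row, shifted by the row maximum for numerical stability.
  def softmax(row: Vector[Double]): Vector[Double] =
    val m = row.max
    val e = row.map(x => math.exp(x - m))
    val s = e.sum
    e.map(_ / s)

  // Scaled dot-product attention: softmax(Q K^T / sqrt(dK)) V.
  def attention(q: Matrix, k: Matrix, v: Matrix): Matrix =
    val dK = q.head.size
    val scores = matmul(q, k.transpose).map(_.map(_ / math.sqrt(dK.toDouble)))
    matmul(scores.map(softmax), v)

  // Split the feature dimension into numHeads column blocks of width dModel / numHeads.
  def splitHeads(x: Matrix, numHeads: Int): Vector[Matrix] =
    val dModel = x.head.size
    require(dModel % numHeads == 0, "dModel must be divisible by numHeads")
    val dHead = dModel / numHeads
    Vector.tabulate(numHeads)(h => x.map(_.slice(h * dHead, (h + 1) * dHead)))

  // Attend within each head, then concatenate the head outputs column-wise.
  def multiHead(q: Matrix, k: Matrix, v: Matrix, numHeads: Int): Matrix =
    val (qs, ks, vs) = (splitHeads(q, numHeads), splitHeads(k, numHeads), splitHeads(v, numHeads))
    val heads = Vector.tabulate(numHeads)(h => attention(qs(h), ks(h), vs(h)))
    heads.transpose.map(_.flatten)
```

The output has the same number of rows as the query and dModel columns, since each of the numHeads heads contributes dModel / numHeads columns.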

Supertypes
class SeqModule
class BaseModule
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

override def forward(inputs: IndexedSeq[Variabl]): IndexedSeq[Variabl]

Forward pass for the Multi-Head Attention module. This method takes three input tensors (query, key, value), performs linear projections, splits them into multiple heads, applies scaled dot-product attention to each head, and combines the results into a single output tensor.

Value parameters

inputs

an IndexedSeq containing the query (q), key (k), and value (v) tensors

Attributes

Returns

an IndexedSeq containing the resulting output tensor

Throws
IllegalArgumentException

if the number of inputs is not 3
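
The three-input contract above can be illustrated with a small stand-alone sketch. Tensor and ForwardContractSketch are hypothetical placeholders, not ScalaTion types, and the use of require is an assumption about how such a check might be written; what is certain is that Scala's require throws IllegalArgumentException when its condition is false.

```scala
// Hypothetical stand-in for a module whose forward expects exactly (q, k, v).
// Tensor is a placeholder for this sketch, not ScalaTion's Variabl.
object ForwardContractSketch:

  final case class Tensor(name: String)

  def forward(inputs: IndexedSeq[Tensor]): IndexedSeq[Tensor] =
    // require throws IllegalArgumentException when the condition is false
    require(inputs.size == 3, s"expected 3 inputs (q, k, v), got ${inputs.size}")
    val (q, k, v) = (inputs(0), inputs(1), inputs(2))
    // A real module would compute attention here; the sketch only records the input order.
    IndexedSeq(Tensor(s"attn(${q.name}, ${k.name}, ${v.name})"))
```

Callers pass the tensors in the order query, key, value, and read the single result from the head of the returned sequence.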

Definition Classes

Inherited methods

def apply(inputs: IndexedSeq[Variabl]): IndexedSeq[Variabl]

Alias for forward, allows calling the module as a function: module(xs).

Attributes

Inherited from:
SeqModule
def eval(): Unit

Set the module to evaluation mode (and all submodules recursively).

Attributes

Inherited from:
BaseModule
def gradients: IndexedSeq[TensorD]

Return the gradients of all parameters.

Attributes

Inherited from:
BaseModule
def parameters: IndexedSeq[Variabl]

Return all trainable parameters, including those from submodules.

Attributes

Inherited from:
BaseModule
def setParameters(newParams: IndexedSeq[Variabl]): Unit

Replace the current parameters with new ones. Useful for weight updates, loading saved models, etc.

Value parameters

newParams

the new parameter list to assign

Attributes

Inherited from:
BaseModule
def train(mode: Boolean = ...): Unit

Set the module to training mode (and all submodules recursively).

Attributes

Inherited from:
BaseModule
def zeroGrad()(using ops: AutogradOps): Unit

Zero out all gradients (in-place).

Attributes

Inherited from:
BaseModule

Inherited fields

var inTrainingMode: Boolean

Flag to control training or evaluation behavior.

Attributes

Inherited from:
BaseModule
lazy val subModules: IndexedSeq[BaseModule]

Automatically detect submodules (other BaseModules) within this module.

Attributes

Inherited from:
BaseModule
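
The inherited members above (apply as an alias for forward, recursive train/eval, and parameters that include submodules) follow a common module pattern, sketched below. ToyModule and ModuleSketch are hypothetical and are not ScalaTion's BaseModule: parameters are plain Doubles, submodules are passed in explicitly rather than detected automatically, and the default of train(mode = true) is an assumption, since the documented signature elides it.

```scala
// Hypothetical mini module hierarchy; member names mirror the documented ones,
// but this is not ScalaTion's BaseModule implementation.
object ModuleSketch:

  class ToyModule(
      val ownParams: IndexedSeq[Double],
      val subModules: IndexedSeq[ToyModule] = IndexedSeq.empty):

    var inTrainingMode: Boolean = true

    // Own parameters followed by those of all submodules, recursively.
    def parameters: IndexedSeq[Double] = ownParams ++ subModules.flatMap(_.parameters)

    // Set the mode here and propagate it to every submodule.
    // The default value true is an assumption made for this sketch.
    def train(mode: Boolean = true): Unit =
      inTrainingMode = mode
      subModules.foreach(_.train(mode))

    // Evaluation mode is training mode switched off.
    def eval(): Unit = train(false)

    // Toy computation: scale each input by the sum of all parameters.
    def forward(xs: IndexedSeq[Double]): IndexedSeq[Double] = xs.map(_ * parameters.sum)

    // Alias for forward, so a module can be called as module(xs).
    def apply(xs: IndexedSeq[Double]): IndexedSeq[Double] = forward(xs)
```

Calling eval() on a parent module in this sketch flips inTrainingMode on every nested submodule, which is the recursive behaviour the train and eval entries describe.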