scalation.modeling.autograd
Members list
Type members
Classlikes
Computes the element-wise absolute value of a variable.
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
The Adam class implements the Adam optimization algorithm for updating model parameters. The Adam optimizer (Kingma & Ba, 2015) with optional L2 weight decay maintains first (m) and second (v) moment estimates and applies bias correction. Classical (non-decoupled) weight decay is applied by adding weightDecay * param to the raw gradient.
Value parameters
- beta1
-
exponential decay rate for the first moment estimates.
- beta2
-
exponential decay rate for the second moment estimates.
- eps
-
small constant added for numerical stability.
- lr
-
base learning rate for updating the parameters.
- parameters
-
indexed sequence of Variables representing model parameters.
- weightDecay
-
L2 regularization coefficient (0.0 to disable)
Attributes
- See also
- Note
-
Call zeroGrad() before backward + step.
- Supertypes
-
trait Serializable, trait Product, trait Equals, class Optimizer, class Object, trait Matchable, class Any
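The update described above can be sketched on plain arrays. This is a minimal illustration, not ScalaTion's implementation: the `AdamSketch` object, its `State` class, and the default hyperparameter values are assumptions made for the example, and parameters are plain `Array[Double]` rather than Variables.

```scala
object AdamSketch {
  // Hypothetical per-parameter state (not part of the documented API).
  final class State(n: Int) {
    val m = Array.fill(n)(0.0)   // first-moment estimate
    val v = Array.fill(n)(0.0)   // second-moment estimate
    var t = 0                    // number of steps taken so far
  }

  def step(param: Array[Double], grad: Array[Double], s: State,
           lr: Double = 0.001, beta1: Double = 0.9, beta2: Double = 0.999,
           eps: Double = 1e-8, weightDecay: Double = 0.0): Unit = {
    s.t += 1
    val c1 = 1.0 - math.pow(beta1, s.t)   // bias-correction term for m
    val c2 = 1.0 - math.pow(beta2, s.t)   // bias-correction term for v
    for (i <- param.indices) {
      // classical (non-decoupled) decay: added to the raw gradient
      val g = grad(i) + weightDecay * param(i)
      s.m(i) = beta1 * s.m(i) + (1.0 - beta1) * g
      s.v(i) = beta2 * s.v(i) + (1.0 - beta2) * g * g
      val mHat = s.m(i) / c1
      val vHat = s.v(i) / c2
      param(i) -= lr * mHat / (math.sqrt(vHat) + eps)
    }
  }
}
```

On the first step the bias-corrected ratio is roughly `g / |g|`, so the parameter moves by about `lr` regardless of the gradient's scale.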
Computes element-wise addition of two variables.
Value parameters
- v1
-
the first variable.
- v2
-
the second variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Adds a constant value to a variable.
Value parameters
- d
-
the constant to add.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
The AutogradOps trait defines the core operations needed for automatic differentiation. It separates the mathematical operations on tensors (TensorD) from the autograd system (Variable, Function), allowing flexible extension. This trait is backed by a default implementation (see AutogradOps.default) using TensorD methods.
Attributes
Companion object for AutogradOps that provides a default implementation.
Attributes
- Companion
- trait
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
AutogradOps.type
The AutogradTest object contains various @main tests for autograd functionality. The tests validate basic arithmetic, complex expressions, activation functions, loss functions, and neural network layers with backpropagation.
Attributes
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
AutogradTest.type
The BaseModule is a base class for all neural network modules (layers, blocks, models). Provides support for:
- Parameter registration
- Automatic submodule detection
- Gradient management (zeroing)
- Training/evaluation mode switching
Modules are structured hierarchically: a module can contain submodules.
Value parameters
- localParameters
-
the parameters (Variables) directly belonging to this module
Attributes
Computes the batched matrix multiplication of two variables.
Value parameters
- v1
-
the first variable.
- v2
-
the second variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Computes the ceil of a variable (element-wise).
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Clips the elements of a variable to the range [min, max] (element-wise). Gradient is 1 for elements strictly inside (min, max), 0 for clipped ones (ties get 0.25 via mask product heuristic).
Value parameters
- max
-
upper bound.
- min
-
lower bound.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Represents a concatenation operation on a sequence of variables along a specified axis. This class performs a differentiable concatenation operation during the forward pass and splits the gradient during the backward pass to propagate it to the input variables.
Value parameters
- axis
-
the axis along which to concatenate the variables
- vs
-
the sequence of input variables to concatenate
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
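The forward/backward pairing described for concatenation can be illustrated in one dimension. This is only a sketch under simplifying assumptions: inputs are flat `Array[Double]` values rather than tensors with an axis argument, and `ConcatSketch` is a hypothetical name.

```scala
object ConcatSketch {
  // Forward: join the inputs end to end.
  def forward(xs: Seq[Array[Double]]): Array[Double] =
    xs.flatMap(_.toSeq).toArray

  // Backward: cut the upstream gradient into one slice per input,
  // using the input lengths recorded during the forward pass.
  def backward(grad: Array[Double], sizes: Seq[Int]): Seq[Array[Double]] = {
    val offsets = sizes.scanLeft(0)(_ + _)   // start index of each slice
    sizes.indices.map(i => grad.slice(offsets(i), offsets(i + 1)))
  }
}
```

Each input receives exactly the part of the gradient that corresponds to the positions it occupied in the output.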
ANSI color codes for colored console output.
Attributes
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
ConsoleColor.type
Computes element-wise division of two variables.
Value parameters
- v1
-
the dividend.
- v2
-
the divisor.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Divides a variable by a constant.
Value parameters
- d
-
the constant divisor.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Computes the dot product of two variables.
Value parameters
- v1
-
the first variable.
- v2
-
the second variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Applies the ELU activation function.
Value parameters
- alpha
-
the ELU scaling parameter.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
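For reference, the standard ELU definition and its derivative can be written for a single scalar as follows. The page does not spell out the formula, so this sketch assumes the usual definition; `ELUSketch` and the default `alpha = 1.0` are illustrative choices.

```scala
object ELUSketch {
  // ELU(x) = x for x > 0, alpha * (exp(x) - 1) otherwise.
  def forward(x: Double, alpha: Double = 1.0): Double =
    if (x > 0.0) x else alpha * (math.exp(x) - 1.0)

  // Derivative used in the backward pass: 1 for x > 0, alpha * exp(x) otherwise.
  def derivative(x: Double, alpha: Double = 1.0): Double =
    if (x > 0.0) 1.0 else alpha * math.exp(x)
}
```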
Computes the exponential of a variable.
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Computes the floor of a variable (element-wise).
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
The Function base trait for all differentiable operations in the autograd system. A Function encapsulates both the forward computation (producing outputs) and the backward computation (propagating gradients). It also provides utility methods for handling unbroadcasting of shapes during the backward pass, ensuring correct gradient flow. Every custom operation should extend this trait and implement forward and backward.
Attributes
- Supertypes
-
class Object, trait Matchable, class Any
- Known subtypes
-
class Abs, class Add, class AddConstant, class BatchMatMul, class Ceil, class Clip, class Concat, class Div, class DivConstant, class Dot, class ELU, class Exp, class Floor, class GRUCellFused, class GeLU, class Identity, class LeakyReLU, class Log, class LogBase, class MAELoss, class MSELoss, class MatMul, class Max, class MaxScalar, class MaxValue, class Mean, class MeanAlongAxis, class Min, class MinScalar, class MinValue, class Mul, class MulConstant, class Neg, class Permute, class Pow, class RNNCellFused, class RNNFused, class ReLU, class Reciprocal, class Reshape, class Round, class SSELoss, class Sigmoid, class Sign, class Slice, class Softmax, class Sqrt, class Std, class StdAlongAxis, class Sub, class SubConstant, class Sum, class Tanh, class Transpose, class Variance, class VarianceAlongAxis
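The unbroadcasting step mentioned in the Function description can be illustrated on 2-D arrays. When an input of shape (1, cols) or (rows, 1) was broadcast to a larger output, its gradient has to be summed back over the broadcast axis. The sketch below is an assumption-laden illustration: `UnbroadcastSketch`, the `Mat` alias, and the signature are hypothetical and do not reflect the trait's actual helper methods.

```scala
object UnbroadcastSketch {
  type Mat = Array[Array[Double]]

  // Reduce an upstream gradient to the (rows x cols) shape of the original
  // input by summing over every axis whose original size was 1.
  def unbroadcast(grad: Mat, rows: Int, cols: Int): Mat = {
    val out = Array.fill(rows, cols)(0.0)
    for (i <- grad.indices; j <- grad(i).indices) {
      val r = if (rows == 1) 0 else i   // collapse row axis if it was broadcast
      val c = if (cols == 1) 0 else j   // collapse column axis if it was broadcast
      out(r)(c) += grad(i)(j)
    }
    out
  }
}
```

For example, a bias of shape (1, 2) added to a (2, 2) input receives the column sums of the upstream gradient.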
The GRU class implements a multi-layer gated recurrent unit (GRU) network. It supports stacked GRU layers, where each layer processes the input sequence and passes its output to the next layer. The class also provides methods for parameter retrieval and forward computation.
Value parameters
- hiddenSize
-
number of features in the hidden state
- inputSize
-
number of features in the input at each time step
- numLayers
-
number of stacked GRU layers (default: 1)
Attributes
- See also
- Companion
- object
- Supertypes
The GRUCell class supports a gated recurrent unit cell:
r_t = sigmoid(W_ir * x + b_ir + W_hr * h_{t-1} + b_hr)
z_t = sigmoid(W_iz * x + b_iz + W_hz * h_{t-1} + b_hz)
n_t = tanh(W_in * x + b_in + r_t ⊙ (W_hn * h_{t-1} + b_hn))
h_t = (1 - z_t) ⊙ n_t + z_t ⊙ h_{t-1}
This class defines the parameters and forward computation for a GRU cell.
Value parameters
- hiddenSize
-
number of hidden units
- inputSize
-
number of input features
Attributes
- See also
- Companion
- object
- Supertypes
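The four GRU equations can be traced on plain vectors for a single sample. This sketch is not the class's implementation: `GRUCellSketch`, the `Gate` bundle, and the array-based types are assumptions for illustration, and no batching or autograd is involved.

```scala
object GRUCellSketch {
  type Vec = Array[Double]
  type Mat = Array[Array[Double]]

  private def sigmoid(a: Double): Double = 1.0 / (1.0 + math.exp(-a))
  private def matVec(w: Mat, x: Vec): Vec =
    w.map(row => row.zip(x).map { case (a, b) => a * b }.sum)
  private def add(a: Vec, b: Vec): Vec = a.zip(b).map { case (p, q) => p + q }

  // Hypothetical bundle of one gate's parameters: W_i*, b_i*, W_h*, b_h*.
  final case class Gate(wI: Mat, bI: Vec, wH: Mat, bH: Vec)

  def step(x: Vec, hPrev: Vec, r: Gate, z: Gate, n: Gate): Vec = {
    // reset and update gates
    val rt = add(add(matVec(r.wI, x), r.bI), add(matVec(r.wH, hPrev), r.bH)).map(a => sigmoid(a))
    val zt = add(add(matVec(z.wI, x), z.bI), add(matVec(z.wH, hPrev), z.bH)).map(a => sigmoid(a))
    // candidate state: the reset gate scales only the hidden-side term
    val hn = add(matVec(n.wH, hPrev), n.bH)
    val gated = rt.zip(hn).map { case (a, b) => a * b }
    val nt = add(add(matVec(n.wI, x), n.bI), gated).map(a => math.tanh(a))
    // blend candidate and previous hidden state
    hPrev.indices.map(i => (1.0 - zt(i)) * nt(i) + zt(i) * hPrev(i)).toArray
  }
}
```

With all weights and biases set to zero, both gates evaluate to 0.5 and the candidate to 0, so the new hidden state is half the previous one.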
The GRUCellFused Function implements a single GRU cell as one fused autograd op. It fuses all gate computations for better performance and fewer autograd nodes.
Equations:
r_t = sigmoid(W_ir * x + b_ir + W_hr * hPrev + b_hr)
z_t = sigmoid(W_iz * x + b_iz + W_hz * hPrev + b_hz)
n_t = tanh(W_in * x + b_in + r_t ⊙ (W_hn * hPrev + b_hn))
h_t = (1 - z_t) ⊙ n_t + z_t ⊙ hPrev
Shapes:
input : (B, I, 1)
hidden : (B, H, 1)
W_i* : (1, H, I)
W_h* : (1, H, H)
b_i*, b_h* : (1, H, 1)
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Applies the GeLU activation function.
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
The GradCheck object provides methods to check the agreement between numerically computed gradients and those computed using Automatic Differentiation (AD).
Attributes
- See also
-
calculus.Differential
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
GradCheck.type
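A gradient check of this kind typically compares an analytic gradient against central finite differences. The following sketch shows the idea on plain arrays; `GradCheckSketch`, its method names, the step size `h`, and the tolerance rule are assumptions for the example and are not the signatures of the GradCheck object.

```scala
object GradCheckSketch {
  // Central-difference estimate of the gradient of f at x.
  def numericGrad(f: Array[Double] => Double, x: Array[Double],
                  h: Double = 1e-5): Array[Double] =
    x.indices.map { i =>
      val xp = x.clone()
      xp(i) += h
      val xm = x.clone()
      xm(i) -= h
      (f(xp) - f(xm)) / (2.0 * h)
    }.toArray

  // True when every component agrees within a mixed absolute/relative tolerance.
  def check(f: Array[Double] => Double, analytic: Array[Double],
            x: Array[Double], tol: Double = 1e-6): Boolean =
    numericGrad(f, x).zip(analytic).forall { case (n, a) =>
      math.abs(n - a) <= tol * (1.0 + math.abs(a))
    }
}
```

For instance, with `f(x) = x(0) * x(0) + 3 * x(1)` at the point `(2, 1)`, the analytic gradient `(4, 3)` passes the check while `(0, 3)` does not.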
GraphExporter generates a computation graph visualization from a root Variable. The graph includes variables, functions, dependency edges, tensor shapes, and optional gradient annotations. The resulting graph can be serialized to DOT, Mermaid, or JSON formats for visualization.
Attributes
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
GraphExporter.type
Applies the identity activation function.
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Learning Rate Scheduler (LR Scheduler) trait. Defines a generic interface for schedulers that adjust the learning rate during optimization. Concrete implementations may update the learning rate based on iteration count, loss values, or other criteria.
Notes:
- The parameterless step() is intended for schedulers that adjust the learning rate solely based on iteration count.
- The step(currentLoss) method is intended for schedulers that adapt the learning rate based on the current loss value.
- By default, both methods throw UnsupportedOperationException; subclasses must override the method(s) they support.
Attributes
- Supertypes
-
class Object, trait Matchable, class Any
- Known subtypes
-
class ReduceLROnPlateau, class StepLR
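The "throw by default, override what you support" contract can be sketched as follows. The names `SchedulerSketch`, `Scheduler`, and `DecayEvery` (with its `stepSize` and `gamma` parameters) are hypothetical and chosen for this example; they are not the documented trait or its StepLR subtype.

```scala
object SchedulerSketch {
  trait Scheduler {
    // Both variants are unsupported unless a subclass overrides them.
    def step(): Unit =
      throw new UnsupportedOperationException("step() not supported")
    def step(currentLoss: Double): Unit =
      throw new UnsupportedOperationException("step(currentLoss) not supported")
    def lr: Double
  }

  // Hypothetical count-based scheduler: multiply lr by gamma every stepSize calls.
  final class DecayEvery(initLR: Double, stepSize: Int, gamma: Double) extends Scheduler {
    private var count = 0
    private var current = initLR
    def lr: Double = current
    override def step(): Unit = {
      count += 1
      if (count % stepSize == 0) current *= gamma
    }
  }
}
```

A count-based scheduler like this overrides only `step()`, so calling `step(currentLoss)` on it still raises `UnsupportedOperationException`.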
The LayerNorm class implements Layer Normalization as described in: "Layer Normalization" by Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
Value parameters
- dModel
-
the number of features in the input
- eps
-
a small value to avoid division by zero
- ops
-
the autograd operations
Attributes
- See also
- Supertypes
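Layer normalization over one feature vector can be sketched as below. The learnable scale `gamma` and shift `beta` follow the cited paper and are assumptions here, since the page lists only `dModel`, `eps`, and `ops`; `LayerNormSketch` and the default `eps` are likewise illustrative.

```scala
object LayerNormSketch {
  // Normalize x to zero mean and unit variance across its features,
  // then apply an element-wise scale (gamma) and shift (beta).
  def normalize(x: Array[Double], gamma: Array[Double], beta: Array[Double],
                eps: Double = 1e-5): Array[Double] = {
    val mean = x.sum / x.length
    // population variance (divides by the number of features)
    val variance = x.map(a => (a - mean) * (a - mean)).sum / x.length
    val inv = 1.0 / math.sqrt(variance + eps)   // eps guards against division by zero
    x.indices.map(i => (x(i) - mean) * inv * gamma(i) + beta(i)).toArray
  }
}
```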
Applies the LeakyReLU activation function.
Value parameters
- alpha
-
the negative slope coefficient.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
A fully connected linear (affine) layer: output = weight.bmm(input) + bias. Computes a linear transformation of the input tensor:
- Weight shape: (1, outFeatures, inFeatures)
- Bias shape: (1, outFeatures, 1)
- Input shape: (batch, inFeatures, 1)
- Output shape: (batch, outFeatures, 1)
The weight and bias are learnable parameters wrapped in Variable. Internally uses batched matrix multiplication and broadcasting for bias addition.
Value parameters
- inFeatures
-
the number of input features
- outFeatures
-
the number of output features
Attributes
- Companion
- object
- Supertypes
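The shape contract above amounts to applying the same weight matrix and bias to every sample in the batch. The sketch below drops the singleton dimensions and uses nested arrays: weight is (outFeatures x inFeatures), bias has outFeatures entries, and input is (batch x inFeatures). `LinearSketch` is a hypothetical name, not the layer's API.

```scala
object LinearSketch {
  // For each sample x in the batch, compute weight * x + bias.
  def forward(weight: Array[Array[Double]], bias: Array[Double],
              input: Array[Array[Double]]): Array[Array[Double]] =
    input.map { x =>
      weight.indices.map { o =>
        // dot product of output row o with the sample, plus the broadcast bias
        weight(o).zip(x).map { case (w, xi) => w * xi }.sum + bias(o)
      }.toArray
    }
}
```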
Computes the natural logarithm of a variable.
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Computes the logarithm of a variable with a specified base.
Value parameters
- base
-
the base for the logarithm.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Computes the Mean Absolute Error (MAE) loss.
Value parameters
- pred
-
the prediction variable.
- target
-
the target variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Computes the Mean Squared Error (MSE) loss.
Value parameters
- pred
-
the prediction variable.
- target
-
the target variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
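As a reference point, the usual MSE definition and its gradient with respect to the prediction look like this on flat arrays. The sketch assumes the standard mean over all elements; `MSESketch` and its method names are illustrative only.

```scala
object MSESketch {
  // loss = mean((pred - target)^2)
  def loss(pred: Array[Double], target: Array[Double]): Double =
    pred.zip(target).map { case (p, t) => (p - t) * (p - t) }.sum / pred.length

  // d loss / d pred_i = 2 * (pred_i - target_i) / n
  def gradPred(pred: Array[Double], target: Array[Double]): Array[Double] =
    pred.zip(target).map { case (p, t) => 2.0 * (p - t) / pred.length }
}
```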
Computes the matrix multiplication of two variables.
Value parameters
- v1
-
the first variable.
- v2
-
the second variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Element-wise maximum of two variables. Gradient flows to the larger input; ties split as 0.5 / 0.5.
Value parameters
- v1
-
first input.
- v2
-
second input.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
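The tie-splitting rule stated above can be written out directly. This sketch works on flat arrays and returns both input gradients for a given upstream gradient; `MaxSketch` and its signatures are hypothetical.

```scala
object MaxSketch {
  def forward(a: Array[Double], b: Array[Double]): Array[Double] =
    a.zip(b).map { case (x, y) => math.max(x, y) }

  // Returns (gradA, gradB): the larger input gets the full upstream
  // gradient g, and an exact tie is split 0.5 / 0.5.
  def backward(a: Array[Double], b: Array[Double],
               g: Array[Double]): (Array[Double], Array[Double]) = {
    val wa = a.zip(b).map { case (x, y) => if (x > y) 1.0 else if (x < y) 0.0 else 0.5 }
    val ga = wa.zip(g).map { case (w, gi) => w * gi }
    val gb = wa.zip(g).map { case (w, gi) => (1.0 - w) * gi }
    (ga, gb)
  }
}
```

The two gradients always sum to the upstream gradient, so no gradient mass is lost or duplicated at ties.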
Element-wise maximum between a variable and a scalar. Gradient is 1 where v > s; 0 where v < s; 0.5 where equal.
Value parameters
- s
-
the scalar.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Computes the maximum value in a variable (reduces to a scalar). Gradient is distributed equally among all elements achieving the max (handles ties).
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Computes the mean of all elements in a variable.
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Computes the mean of a variable along a specified axis (dimension reduced to size 1).
Value parameters
- axis
-
the axis along which to compute the mean.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Element-wise minimum of two variables. Gradient flows to the smaller input; ties split as 0.5 / 0.5.
Value parameters
- v1
-
first input.
- v2
-
second input.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Element-wise minimum between a variable and a scalar. Gradient is 1 where v < s; 0 where v > s; 0.5 where equal.
Value parameters
- s
-
the scalar.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Computes the minimum value in a variable (reduces to a scalar). Gradient is distributed equally among all elements achieving the min (handles ties).
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Standard module for layers that take a single input (e.g., Linear, Conv1D). Defines the abstract forward function for single input.
Value parameters
- localParameters
-
the parameters (Variables) directly belonging to this module
Attributes
- Supertypes
- Known subtypes
Computes element-wise multiplication of two variables.
Value parameters
- v1
-
the first variable.
- v2
-
the second variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Multiplies a variable by a constant.
Value parameters
- d
-
the constant multiplier.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Implements the Multi-Head Attention mechanism, a key component of transformer models. This class performs linear projections of the input tensors, splits them into multiple attention heads, applies scaled dot-product attention to each head, and combines the results into a single output tensor.
Value parameters
- dModel
-
the dimensionality of the model (input and output feature size)
- numHeads
-
the number of attention heads
Attributes
- See also
-
https://arxiv.org/abs/1706.03762 "Attention Is All You Need" by Vaswani et al., 2017.
https://dev-discuss.pytorch.org/t/understanding-multi-head-attention-for-ml-framework-developers/1792 "Understanding Multi-Head Attention for ML Framework Developers"
https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html PyTorch MultiheadAttention Documentation
- Supertypes
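Two of the steps named in the description, splitting into heads and scaled dot-product attention, can be sketched on nested arrays. The linear projections, masking, and the final output projection are omitted, and `AttentionSketch` with its method names is an assumption for illustration rather than the class's API.

```scala
object AttentionSketch {
  type Mat = Array[Array[Double]]

  private def softmax(row: Array[Double]): Array[Double] = {
    val mx = row.max
    val e = row.map(a => math.exp(a - mx))   // subtract max for numerical stability
    val s = e.sum
    e.map(_ / s)
  }

  // Single head: q is (tq x d), k is (tk x d), v is (tk x dv).
  // Returns softmax(q * k^T / sqrt(d)) * v, of shape (tq x dv).
  def attention(q: Mat, k: Mat, v: Mat): Mat = {
    val scale = 1.0 / math.sqrt(q.head.length.toDouble)
    q.map { qi =>
      val w = softmax(k.map(kj => qi.zip(kj).map { case (a, b) => a * b }.sum * scale))
      v.head.indices.map(c => w.indices.map(j => w(j) * v(j)(c)).sum).toArray
    }
  }

  // Split each row of width dModel into numHeads contiguous chunks
  // of width dModel / numHeads (dModel is assumed divisible by numHeads).
  def splitHeads(x: Mat, numHeads: Int): Seq[Mat] = {
    val dHead = x.head.length / numHeads
    (0 until numHeads).map(h => x.map(_.slice(h * dHead, (h + 1) * dHead)))
  }
}
```

Combining the heads is then a row-wise concatenation of the per-head outputs, the inverse of `splitHeads`.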
Computes the negation of a variable.
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
The Optimizer abstract class optimizes model parameters.
Notes:
- Subclasses implement the specific update rule in step().
- The optimizer assumes that gradients (p.grad) have been computed and accumulated by the autograd engine before each call to step().
- Parameters with null gradients are safely ignored.
Value parameters
- learningRate
-
the step size (η) used for gradient-based updates
- parameters
-
the trainable parameters, each wrapped in a Variable
Attributes
- Supertypes
-
class Object, trait Matchable, class Any
- Known subtypes
Permutes axes of a tensor variable according to a specified ordering.
Value parameters
- axes
-
the permutation of axes.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Raises a variable to an integer power.
Value parameters
- s
-
the exponent.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
The RNN class implements a multi-layer recurrent neural network (RNN). It supports stacked RNN layers, where each layer processes the input sequence and passes its output to the next layer. The class also provides methods for parameter retrieval and forward computation.
Value parameters
- activation
-
activation function to use: "tanh" (default) or "relu"
- hiddenSize
-
number of features in the hidden state
- inputSize
-
number of features in the input at each time step
- numLayers
-
number of stacked RNN layers (default: 1)
- ops
-
implicit autograd operations
Attributes
- See also
- Companion
- object
- Supertypes
The RNNCell class supports a simple RNN cell that updates the hidden state: h' = activation(W_ih * x + b_ih + W_hh * h + b_hh) using two biases instead of one.
Value parameters
- activation
-
activation function to use: "tanh" (default) or "relu"
- hiddenSize
-
number of hidden units
- inputSize
-
number of input features
Attributes
- See also
- Companion
- object
- Supertypes
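The update rule with its two separate biases can be traced for a single sample on plain vectors. `RNNCellSketch` and its signature are hypothetical; only the equation and the two activation names come from the description above.

```scala
object RNNCellSketch {
  type Vec = Array[Double]
  type Mat = Array[Array[Double]]

  private def matVec(w: Mat, x: Vec): Vec =
    w.map(r => r.zip(x).map { case (a, b) => a * b }.sum)

  // h' = activation(W_ih * x + b_ih + W_hh * h + b_hh)
  def step(x: Vec, h: Vec, wIh: Mat, bIh: Vec, wHh: Mat, bHh: Vec,
           activation: String = "tanh"): Vec = {
    val act: Double => Double = activation match {
      case "tanh" => (a: Double) => math.tanh(a)
      case "relu" => (a: Double) => math.max(0.0, a)
      case other  => throw new IllegalArgumentException(s"unsupported activation: $other")
    }
    val inputPart = matVec(wIh, x)    // W_ih * x
    val hiddenPart = matVec(wHh, h)   // W_hh * h
    h.indices.map(i => act(inputPart(i) + bIh(i) + hiddenPart(i) + bHh(i))).toArray
  }
}
```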
The RNNCellFused Function implements a single RNN cell as one fused autograd op. It fuses the input/hidden projections and activation into a single node for improved performance and reduced autograd graph size.
Equation:
h_t = φ(W_ih * x + b_ih + W_hh * hPrev + b_hh), where φ ∈ {tanh, relu}
Shapes:
input : (B, I, 1)
hidden : (B, H, 1)
W_ih : (1, H, I)
W_hh : (1, H, H)
b_ih : (1, H, 1)
b_hh : (1, H, 1)
The function caches only what is needed for the backward pass:
- input and hidden states
- pre-activation value
- output after activation
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Fused RNN over a whole input sequence (vanilla RNN).
- Unrolls the sequence in a single Function.
- Returns the last hidden state as the output Variable.
- On backward(), performs full BPTT and accumulates parameter grads.
Shapes:
input(t): (B, I, 1)
hidden: (B, H, 1) // initial hidden (h0)
W_ih: (1, H, I)
W_hh: (1, H, H)
b_ih: (1, H, 1)
b_hh: (1, H, 1)
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
The RNNTestCore object defines a suite of @main entrypoints that exercise the autograd system using recurrent neural network components. These tests verify:
- forward computation consistency for RNNCell and GRUCell
- correct propagation of hidden states through RNNBase
- correctness of gradient backpropagation through time
- multilayer RNN/GRU behavior and parameter interaction
- construction and export of autograd computation graphs for debugging
All tests use synthetic inputs and manually assigned weights/biases so that behavior is deterministic and can be validated against PyTorch, enabling reliable gradient checking via finite differences using GradCheck.gradCheck.
Attributes
- Note
-
This file focuses exclusively on core autograd correctness and does not contain any real-data forecasting experiments.
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
RNNTestCore.type
The RNNTestForecasting object provides a suite of time-series utilities and forecasting experiments using Autograd-based recurrent neural networks. It includes:
- lagged-window matrix builders (buildMatrix4TS, buildMatrix4TSX)
- batch construction utilities for sequence models (makeBatches)
- demonstration tests for RNN and GRU models on synthetic sequences, COVID-19 new-deaths data, and ILI (Influenza-Like Illness) data
- chronological train/test splits
- rolling / walk-forward validation
These tests verify correctness of data pipelines, shape handling, training loops, scaling transformations, and forecasting performance.
Attributes
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
RNNTestForecasting.type
Applies the ReLU activation function.
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Computes the reciprocal of a variable.
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
PyTorch-style ReduceLROnPlateau scheduler. Monitors a metric each epoch and reduces the learning rate when progress plateaus. Supports both "min" (e.g., loss) and "max" (e.g., accuracy) modes, with relative or absolute thresholds for determining improvement. The LR is reduced when the number of non-improving epochs exceeds patience, after which a cooldown period prevents further reductions. Each reduction follows: newLR = max(oldLR * factor, minLR), skipped when the change is too small (≤ eps). Non-finite metric values are ignored. Call step(metric) after each optimizer update. getLastLR returns the most recent learning rate.
Value parameters
- cooldown
-
epochs to wait after a reduction during which bad-epoch counter stays at 0
- eps
-
minimal effective LR change required to apply a reduction
- factor
-
multiplicative decay factor in (0,1), i.e., newLR = oldLR * factor
- minLR
-
lower bound on the learning rate
- mode
-
"min" or "max" (target direction for improvement)
- optim
-
the optimizer whose learning rate will be scheduled
- patience
-
number of non-improving epochs tolerated before reduction (strictly > patience)
- threshold
-
significance threshold (relative or absolute depending on thresholdMode)
- thresholdMode
-
"rel" for relative margin, "abs" for absolute margin
- verbose
-
if true, prints LR reduction messages
Attributes
- Supertypes
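The plateau rule described above can be sketched in a few lines of plain Scala. This is only an illustration under stated assumptions: `PlateauSketch` is a hypothetical name, it holds the learning rate itself instead of wrapping an optimizer, and its default values are chosen for the example rather than taken from the ScalaTion source.

```scala
// Hypothetical, self-contained sketch; not ScalaTion's ReduceLROnPlateau class.
final class PlateauSketch(
    var lr: Double,
    mode: String = "min",
    factor: Double = 0.1,
    patience: Int = 10,
    threshold: Double = 1e-4,
    thresholdMode: String = "rel",
    cooldown: Int = 0,
    minLR: Double = 0.0,
    eps: Double = 1e-8
):
  private var best =
    if mode == "min" then Double.PositiveInfinity else Double.NegativeInfinity
  private var badEpochs = 0
  private var cooldownLeft = 0

  // Improvement test for the four (mode, thresholdMode) combinations.
  private def isBetter(m: Double): Boolean = (mode, thresholdMode) match
    case ("min", "rel") => m < best * (1.0 - threshold)
    case ("min", _)     => m < best - threshold
    case (_, "rel")     => m > best * (1.0 + threshold)
    case _              => m > best + threshold

  def step(metric: Double): Unit =
    // Non-finite metrics are ignored entirely.
    if !metric.isNaN && !metric.isInfinite then
      if isBetter(metric) then
        best = metric
        badEpochs = 0
      else
        badEpochs += 1
      // During cooldown the bad-epoch counter stays at 0.
      if cooldownLeft > 0 then
        cooldownLeft -= 1
        badEpochs = 0
      // Reduce only when the count is strictly greater than patience.
      if badEpochs > patience then
        val newLR = math.max(lr * factor, minLR)
        if lr - newLR > eps then lr = newLR
        cooldownLeft = cooldown
        badEpochs = 0

@main def plateauDemo(): Unit =
  val sched = PlateauSketch(lr = 0.1, factor = 0.5, patience = 2)
  Seq(1.0, 1.0, 1.0, 1.0).foreach(sched.step)
  println(s"learning rate after four flat epochs: ${sched.lr}")
```

With `patience = 2`, the first value sets the best metric and the next three count as non-improving, so the reduction fires on the fourth call.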
Reshape operation for a variable. This class represents a differentiable operation that reshapes a tensor variable to a new shape during the forward pass and reshapes the gradient back to the original shape during the backward pass.
Value parameters
- newShape
-
the target shape for the variable
- v
-
the input variable to be reshaped
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Rounds a variable element-wise.
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Implements the Stochastic Gradient Descent (SGD) optimization algorithm.
Value parameters
- lr
-
the learning rate used for updating the parameters.
- momentum
-
momentum factor to accelerate convergence (default is 0.0).
- parameters
-
an indexed sequence of model parameters to be optimized.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, class Optimizer, class Object, trait Matchable, class Any
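As a rough illustration of how lr and momentum interact, the sketch below applies SGD with momentum to plain arrays. The update form (velocity = momentum * velocity + gradient, then parameter -= lr * velocity) is the common PyTorch-style convention and is an assumption here; `SgdSketch` is a hypothetical name, and the real class operates on Variabl parameters.

```scala
// Hypothetical sketch on plain arrays; not ScalaTion's SGD class.
final class SgdSketch(params: Array[Double], lr: Double, momentum: Double = 0.0):
  private val velocity = Array.fill(params.length)(0.0)

  /** One update: v = momentum * v + g, then p = p - lr * v (assumed form). */
  def step(grads: Array[Double]): Unit =
    require(grads.length == params.length, "one gradient per parameter")
    for i <- params.indices do
      velocity(i) = momentum * velocity(i) + grads(i)
      params(i) -= lr * velocity(i)

@main def sgdDemo(): Unit =
  // Minimize f(p) = p^2, whose gradient is 2p.
  val p = Array(1.0)
  val opt = SgdSketch(p, lr = 0.1, momentum = 0.9)
  for _ <- 1 to 200 do opt.step(Array(2.0 * p(0)))
  println(f"p after 200 steps: ${p(0)}%.6f")
```

With `momentum = 0.0` the velocity equals the current gradient, so the update reduces to plain gradient descent.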
Computes the Sum of Squared Errors (SSE) loss.
Value parameters
- pred
-
the prediction variable.
- target
-
the target variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Implements the Scaled Dot-Product Attention mechanism. This class is a sequence module that computes the attention scores and applies them to the value tensor (v) based on the query (q) and key (k) tensors. It is a fundamental building block for transformer models.
Attributes
- See also
-
https://arxiv.org/abs/1706.03762 "Attention Is All You Need" by Vaswani et al., 2017.
- Supertypes
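For orientation, scaled dot-product attention computes softmax(Q Kᵀ / sqrt(d_k)) V. The sketch below shows that formula on plain 2-D arrays; `Mat`, `matmul`, `transposeM`, `softmaxRow`, and `attention` are hypothetical helpers written for this example and say nothing about the signatures of the ScalaTion class.

```scala
// Hypothetical helpers on Array[Array[Double]]; not the ScalaTion API.
type Mat = Array[Array[Double]]

def matmul(a: Mat, b: Mat): Mat =
  Array.tabulate(a.length, b(0).length) { (i, j) =>
    a(i).indices.map(k => a(i)(k) * b(k)(j)).sum
  }

def transposeM(a: Mat): Mat =
  Array.tabulate(a(0).length, a.length)((i, j) => a(j)(i))

// Numerically stable softmax: subtract the row maximum before exponentiating.
def softmaxRow(x: Array[Double]): Array[Double] =
  val m = x.max
  val e = x.map(v => math.exp(v - m))
  val s = e.sum
  e.map(_ / s)

/** attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, with softmax applied row-wise. */
def attention(q: Mat, k: Mat, v: Mat): Mat =
  val scale = math.sqrt(k(0).length.toDouble)
  val scores = matmul(q, transposeM(k)).map(_.map(_ / scale))
  matmul(scores.map(softmaxRow), v)

@main def attentionDemo(): Unit =
  val q: Mat = Array(Array(1.0, 0.0))
  val k: Mat = Array(Array(1.0, 0.0), Array(1.0, 0.0))
  val v: Mat = Array(Array(2.0, 0.0), Array(4.0, 0.0))
  println(attention(q, k, v).map(_.mkString(", ")).mkString("; "))
```

Because both keys are identical in the demo, the query attends to them equally and the output row is the average of the two value rows.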
Module for layers that take multiple inputs (e.g., RNN cells, attention blocks). Defines the abstract forward function for sequence or multiple inputs.
Value parameters
- localParameters
-
the parameters (Variables) directly belonging to this module
Attributes
- Supertypes
- Known subtypes
-
class MultiHeadAttention
Applies the sigmoid activation function.
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Applies the sign function element-wise. Derivative is zero almost everywhere (undefined at zero).
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Represents a slicing operation on a tensor variable. This class performs a differentiable slicing operation during the forward pass and propagates the gradient to the sliced region during the backward pass.
Value parameters
- r0
-
the range for the first dimension
- r1
-
the range for the second dimension
- r2
-
the range for the third dimension
- v
-
the input variable to be sliced
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Applies the softmax activation function.
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Computes the square root of a variable.
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Enumeration representing the status of a test: Passed or Failed.
Attributes
- Supertypes
-
trait Enum, trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Computes the standard deviation of all elements in a variable. Std(x) = sqrt(Var(x)); derivative ds/dx = (x - mean)/(N * std).
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
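The derivative stated for Std follows from the chain rule: with s = sqrt(Var(x)) and dVar/dx_i = (2/N)(x_i - mean), we get ds/dx_i = (x_i - mean)/(N * s). The sketch below checks this numerically against central finite differences on a plain array; all function names are hypothetical and chosen for the example.

```scala
// Hypothetical helpers on Array[Double]; a numerical check, not the ScalaTion code.
def mean(x: Array[Double]): Double = x.sum / x.length

// Population standard deviation: sqrt(mean((x - mean(x))^2)).
def std(x: Array[Double]): Double =
  val m = mean(x)
  math.sqrt(mean(x.map(v => (v - m) * (v - m))))

// Analytic gradient from the documentation: (x_i - mean) / (N * std).
def stdGrad(x: Array[Double]): Array[Double] =
  val m = mean(x)
  val s = std(x)
  x.map(v => (v - m) / (x.length * s))

// Central finite differences, used only to validate the analytic formula.
def numericGrad(x: Array[Double], h: Double = 1e-6): Array[Double] =
  x.indices.toArray.map { i =>
    val up = x.updated(i, x(i) + h)
    val dn = x.updated(i, x(i) - h)
    (std(up) - std(dn)) / (2 * h)
  }

@main def stdGradDemo(): Unit =
  val x = Array(1.0, 2.0, 4.0)
  println(stdGrad(x).mkString(", "))
  println(numericGrad(x).mkString(", "))
```

The formula divides by std, so it is undefined when all elements are equal; how the library handles that case is not stated in this documentation.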
Computes the standard deviation of a variable along a specified axis.
Value parameters
- axis
-
the axis along which to compute std.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Step-based learning rate scheduler. Reduces the optimizer's learning rate by multiplying with gamma every stepSize epochs. Matches the behavior of PyTorch's StepLR for the single-LR (non-param-group) setting.
Value parameters
- gamma
-
the multiplicative decay factor applied every step
- optim
-
the optimizer whose learning rate will be scheduled
- stepSize
-
the interval (in epochs) between LR reductions
Attributes
- Supertypes
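The schedule above has a simple closed form: after a given number of epochs the rate is baseLR * gamma^floor(epoch / stepSize). The function below is a minimal sketch of that formula; `stepLR` and `baseLR` are hypothetical names, and the real scheduler updates an optimizer in place rather than returning a value.

```scala
// Hypothetical closed-form helper; not ScalaTion's StepLR class.
def stepLR(baseLR: Double, gamma: Double, stepSize: Int, epoch: Int): Double =
  require(stepSize > 0 && epoch >= 0, "stepSize must be positive, epoch non-negative")
  // Integer division counts how many full step intervals have elapsed.
  baseLR * math.pow(gamma, (epoch / stepSize).toDouble)

@main def stepLRDemo(): Unit =
  for epoch <- Seq(0, 9, 10, 25) do
    println(s"epoch $epoch -> lr ${stepLR(0.1, 0.5, 10, epoch)}")
```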
Computes element-wise subtraction of two variables.
Value parameters
- v1
-
the minuend.
- v2
-
the subtrahend.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Subtracts a constant value from a variable.
Value parameters
- d
-
the constant to subtract.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Computes the sum of all elements in a variable.
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Applies the tanh activation function.
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
The TensorInitializers utility object for tensor initializations commonly used in neural networks. Provides methods to create tensors filled with zeros, ones, random values, and standardized initialization schemes like He and Xavier initialization. All returned tensors have batch-first shape: (batch, rows, cols).
Attributes
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
TensorInitializers.type
Companion object for creating TestReport instances.
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
TestReport.type
A test report utility for recording and summarizing test results. Stores a collection of TestResult objects and provides support for timing tests, capturing failures, and printing formatted summary reports.
Attributes
- Companion
- object
- Supertypes
-
class Object, trait Matchable, class Any
Result container for a single test execution.
Value parameters
- ms
-
the execution time in milliseconds
- name
-
the name of the test
- note
-
optional note or error message (default empty)
- status
-
the status of the test (Passed or Failed)
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
The TransformerEnc object implements the attention method based on the scaled dot product.
Attributes
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
TransformerEnc.type
The TransformerTestCore object tests the Transformer class.
Attributes
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
TransformerTestCore.type
Transposes (swaps) two axes of a tensor variable.
Value parameters
- i
-
first axis index.
- j
-
second axis index.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
The Variabl case class represents a tensor with automatic differentiation capability. It tracks operations applied to it for backward gradient propagation. Variabls can be combined using arithmetic operations, activation functions, and loss functions. Backpropagation is triggered via the backward method.
Value parameters
- data
-
the tensor data for this variable.
- gradFn
-
an optional function for backpropagation.
- name
-
an optional name for this variable.
- ops
-
the implicit autograd operations for tensor computations.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
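To make the idea of tracking operations for backward propagation concrete, here is a minimal scalar reverse-mode sketch. It is an assumption-laden toy: `Node` is a hypothetical stand-in that stores a Double instead of a tensor and records local derivatives directly, which may differ from how Variabl and its gradFn are organized.

```scala
import scala.collection.mutable

// Hypothetical scalar stand-in for a tensor-valued Variabl.
// Each node remembers its parents together with the local derivative d(this)/d(parent).
final class Node(val data: Double, val parents: Seq[(Node, Double)] = Nil):
  var grad: Double = 0.0

  def +(that: Node): Node = Node(data + that.data, Seq(this -> 1.0, that -> 1.0))
  def *(that: Node): Node = Node(data * that.data, Seq(this -> that.data, that -> data))
  def tanh: Node =
    val t = math.tanh(data)
    Node(t, Seq(this -> (1.0 - t * t)))

  /** Reverse-mode pass: visit nodes in reverse topological order and apply the chain rule. */
  def backward(): Unit =
    val order = mutable.ArrayBuffer.empty[Node]
    val seen = mutable.Set.empty[Node]
    def visit(n: Node): Unit =
      if seen.add(n) then
        n.parents.foreach(pl => visit(pl._1))
        order += n            // post-order: parents are appended before the node
    visit(this)
    grad = 1.0                // seed: d(output)/d(output) = 1
    for
      n <- order.reverseIterator
      (p, local) <- n.parents
    do p.grad += local * n.grad

@main def autogradDemo(): Unit =
  val x = Node(2.0)
  val y = Node(3.0)
  val z = x * y + x           // dz/dx = y + 1, dz/dy = x
  z.backward()
  println(s"z = ${z.data}, dz/dx = ${x.grad}, dz/dy = ${y.grad}")
```

Gradients accumulate with `+=` because a node such as `x` can feed several operations; this is also why optimizers typically require gradients to be reset between backward passes.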
Computes the variance of all elements in a variable (population variance). Uses definition Var(x) = mean((x - mean(x))^2).
Value parameters
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Computes the variance of a variable along a specified axis (population variance).
Value parameters
- axis
-
the axis along which to compute variance.
- v
-
the input variable.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, trait Function, class Object, trait Matchable, class Any
Provides an implicit conversion from a Module to a function that maps a Variabl to a Variabl. This allows using a Module directly as a function.
Attributes
- Supertypes
- Self type
Value members
Concrete methods
Concatenates a sequence of variables along the specified axis.
Value parameters
- axis
-
the axis along which to concatenate.
- vars
-
the sequence of variables to concatenate.
Attributes
- Returns
-
a new variable representing the concatenated result.
Computes the Exponential Linear Unit (ELU) activation for the input variable.
Value parameters
- alpha
-
the ELU scaling parameter, default is 1.0.
- v
-
the input variable.
Attributes
- Returns
-
a new variable after applying ELU.
Computes the exponential (exp) of the input variable.
Value parameters
- v
-
the input variable.
Attributes
- Returns
-
a new variable after applying the exponential function.
Computes the Gaussian Error Linear Unit (GeLU) activation for the input variable.
Value parameters
- v
-
the input variable.
Attributes
- Returns
-
a new variable after applying GeLU.
Computes the Leaky ReLU activation for the input variable.
Value parameters
- alpha
-
the slope for negative inputs, default is 0.01.
- v
-
the input variable.
Attributes
- Returns
-
a new variable after applying Leaky ReLU.
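For reference, the ELU, GeLU, and Leaky ReLU activations documented above can be written as scalar functions as in the sketch below. The function names are hypothetical, and the GeLU shown is the common tanh approximation; this documentation does not say whether the library uses that form or the exact erf-based definition.

```scala
// Hypothetical scalar versions; the library functions operate on Variabl tensors.

/** ELU: x for x > 0, alpha * (exp(x) - 1) otherwise. */
def eluScalar(x: Double, alpha: Double = 1.0): Double =
  if x > 0 then x else alpha * math.expm1(x)

/** Leaky ReLU: x for x > 0, alpha * x otherwise. */
def leakyReLUScalar(x: Double, alpha: Double = 0.01): Double =
  if x > 0 then x else alpha * x

/** GeLU, tanh approximation (an assumption): 0.5 x (1 + tanh(sqrt(2/pi) (x + 0.044715 x^3))). */
def geluTanhScalar(x: Double): Double =
  0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.Pi) * (x + 0.044715 * x * x * x)))

@main def activationDemo(): Unit =
  for x <- Seq(-2.0, -0.5, 0.0, 0.5, 2.0) do
    println(f"x=$x%5.2f elu=${eluScalar(x)}%8.5f leaky=${leakyReLUScalar(x)}%8.5f gelu=${geluTanhScalar(x)}%8.5f")
```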
Computes the Mean Absolute Error (MAE) loss between two variables.
Value parameters
- x
-
the predictions variable.
- y
-
the target variable.
Attributes
- Returns
-
a variable representing the computed MAE loss.
Computes the Mean Squared Error (MSE) loss between two variables.
Value parameters
- x
-
the predictions variable.
- y
-
the target variable.
Attributes
- Returns
-
a variable representing the computed MSE loss.
Computes the Rectified Linear Unit (ReLU) activation for the input variable.
Value parameters
- v
-
the input variable.
Attributes
- Returns
-
a new variable after applying ReLU.
Computes the Sigmoid activation for the input variable.
Value parameters
- v
-
the input variable.
Attributes
- Returns
-
a new variable after applying sigmoid.
Slices a variable along its three dimensions using the specified ranges.
Value parameters
- a
-
the range for the first dimension.
- b
-
the range for the second dimension.
- c
-
the range for the third dimension.
- v
-
the variable to slice.
Attributes
- Returns
-
a new variable representing the sliced result.
Computes the softmax activation for the input variable.
Value parameters
- v
-
the input variable.
Attributes
- Returns
-
a new variable after applying softmax.
Computes the Sum of Squared Errors (SSE) loss between two variables.
Value parameters
- x
-
the predictions variable.
- y
-
the target variable.
Attributes
- Returns
-
a variable representing the computed SSE loss.
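The three losses in this section differ only in how the element-wise errors are aggregated. A minimal sketch on plain arrays, assuming MSE and MAE average over all elements; the names `sseOf`, `mseOf`, and `maeOf` are hypothetical and the library versions take and return Variabl values.

```scala
// Hypothetical array-based versions of the SSE, MSE, and MAE losses.
def sseOf(pred: Array[Double], target: Array[Double]): Double =
  require(pred.length == target.length, "pred and target must have the same length")
  pred.zip(target).map((p, t) => (p - t) * (p - t)).sum

// MSE is SSE divided by the number of elements.
def mseOf(pred: Array[Double], target: Array[Double]): Double =
  sseOf(pred, target) / pred.length

// MAE averages the absolute errors instead of the squared ones.
def maeOf(pred: Array[Double], target: Array[Double]): Double =
  require(pred.length == target.length, "pred and target must have the same length")
  pred.zip(target).map((p, t) => math.abs(p - t)).sum / pred.length

@main def lossDemo(): Unit =
  val pred = Array(1.0, 2.0, 3.0)
  val target = Array(1.0, 4.0, 0.0)
  println(s"SSE=${sseOf(pred, target)} MSE=${mseOf(pred, target)} MAE=${maeOf(pred, target)}")
```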
Computes the hyperbolic tangent (tanh) activation for the input variable.
Value parameters
- v
-
the input variable.
Attributes
- Returns
-
a new variable after applying tanh.
The transformerEnc1 main function illustrates the calculation of attention (Q, K, V) for a single head as used in a Transformer. See the link below for more details.
Attributes
- See also
-
pub.aimind.so/transformer-model-and-variants-of-transformer-chatgpt-3d423676e29c (URL)
runMain scalation.modeling.forecasting.neuralforecasting.transformerEnc1
The transformerEnc2 main function illustrates the steps in an "Encoder-Only Transformer" consisting of a single encoder block with a "Prediction Head" added for making forecasts.
runMain scalation.modeling.forecasting.neuralforecasting.transformerEnc2
Attributes
Givens
Provides an implicit conversion from a Module to a function that maps a Variabl to a Variabl. This allows using a Module directly as a function.
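In Scala 3, a conversion of this kind is typically expressed as a given instance of `scala.Conversion`. The sketch below shows the mechanism with hypothetical stand-in types (`Var`, `Mod`, `Scale`, `modAsFunction`); the actual given in ScalaTion may be declared differently.

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins for Variabl and Module.
final case class Var(data: Double)

trait Mod:
  def forward(x: Var): Var

final class Scale(k: Double) extends Mod:
  def forward(x: Var): Var = Var(k * x.data)

// The given lets any Mod be used where a Var => Var function is expected.
given modAsFunction: Conversion[Mod, Var => Var] =
  (m: Mod) => (x: Var) => m.forward(x)

@main def conversionDemo(): Unit =
  val f: Var => Var = Scale(3.0)     // conversion applied here
  println(List(Var(1.0), Var(2.0)).map(f))
```

Because `Conversion` is contravariant in its source type, a given for `Mod` also applies to subclasses such as `Scale`.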