Optimizer

scalation.modeling.autograd.Optimizer
abstract class Optimizer(parameters: IndexedSeq[Variabl], var learningRate: Double)

The Optimizer abstract class is the base class for optimizers that update model parameters from their gradients. Notes:
- Subclasses implement the specific update rule in step().
- The optimizer assumes that gradients (p.grad) have been computed and accumulated by the autograd engine before each call to step().
- Parameters with null gradients are safely ignored.
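The contract above can be illustrated with a minimal sketch. It assumes a hypothetical `Param` class (a flat `Array[Double]` of values plus a nullable `grad` buffer) in place of ScalaTion's `Variabl`, and uses plain SGD (p ← p − η · grad) as the subclass. It only shows the call order and the null-gradient skip; it is not ScalaTion's implementation.

```scala
// Hypothetical stand-in for a Variabl: parameter values plus a gradient
// buffer that stays null until the autograd engine fills it in.
final class Param(val data: Array[Double]):
  var grad: Array[Double] = null

// Sketch of the abstract optimizer: subclasses supply the update rule.
abstract class SketchOptimizer(val parameters: IndexedSeq[Param], var learningRate: Double):
  def step(): Unit

// Plain SGD subclass: p ← p − η · grad, skipping parameters without gradients.
final class SketchSGD(ps: IndexedSeq[Param], lr: Double) extends SketchOptimizer(ps, lr):
  def step(): Unit =
    for p <- parameters if p.grad != null do
      for i <- p.data.indices do
        p.data(i) -= learningRate * p.grad(i)

// Usage: gradients must already be populated before step() is called.
val w = Param(Array(1.0, 2.0))
val frozen = Param(Array(5.0))   // grad stays null, so step() skips it
w.grad = Array(0.5, -1.0)        // stand-in for gradients from backprop
val opt = SketchSGD(IndexedSeq(w, frozen), 0.1)
opt.step()
// w.data ≈ (0.95, 2.1); frozen.data is still (5.0)
```

Because step() only reads whatever is in each gradient buffer, calling it before the backward pass has run would apply stale or missing gradients, which is why the ordering in the notes matters.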

Value parameters

learningRate

the step size (η) used for gradient-based updates

parameters

the trainable parameters, each wrapped in a Variabl

Attributes

Supertypes
class Object
trait Matchable
class Any
Known subtypes
class Adam
class SGD

Members list

Value members

Abstract methods

def step(): Unit

Executes a single optimization step by updating each parameter based on its gradient.
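SGD and Adam are listed as the known subclasses. As an illustration of a stateful step(), here is a sketch of the widely used Adam update for a single parameter vector. The class `AdamState`, its defaults (β₁ = 0.9, β₂ = 0.999, ε = 1e-8), and the per-vector bookkeeping are assumptions for this example; ScalaTion's `Adam` may differ in its details.

```scala
// Hypothetical per-vector Adam state; a sketch of the standard Adam rule,
// not ScalaTion's Adam class.
final class AdamState(n: Int, beta1: Double = 0.9, beta2: Double = 0.999, eps: Double = 1e-8):
  private val m = Array.fill(n)(0.0)   // running first-moment (mean) estimate
  private val v = Array.fill(n)(0.0)   // running second-moment estimate
  var t = 0                            // number of steps taken so far

  def step(data: Array[Double], grad: Array[Double], lr: Double): Unit =
    if grad != null then               // null gradient: leave parameter untouched
      t += 1
      val bc1 = 1.0 - math.pow(beta1, t)   // bias correction for m
      val bc2 = 1.0 - math.pow(beta2, t)   // bias correction for v
      for i <- data.indices do
        m(i) = beta1 * m(i) + (1.0 - beta1) * grad(i)
        v(i) = beta2 * v(i) + (1.0 - beta2) * grad(i) * grad(i)
        val mHat = m(i) / bc1
        val vHat = v(i) / bc2
        data(i) -= lr * mHat / (math.sqrt(vHat) + eps)

val state = AdamState(2)
val x = Array(1.0, 2.0)
state.step(x, Array(0.5, -3.0), lr = 0.01)
// On the first step the bias-corrected update is ≈ lr · sign(grad), so x ≈ (0.99, 2.01)
```

Unlike SGD, the step size per coordinate depends on the history of that coordinate's gradients, so the optimizer has to keep state between calls to step().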

Concrete methods

def clipGradNorm(maxNorm: Double): Unit

Clip the gradients of all parameters by global norm, scaling them so that the total norm is at most maxNorm. Math: let g = √(∑_p ‖grad_p‖²); if g > maxNorm, scale all gradients by (maxNorm / g); otherwise leave them unchanged.
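A minimal sketch of global-norm clipping, assuming the gradients are available as plain `Array[Double]` buffers (null for parameters without a gradient). The function name `clipByGlobalNorm` is hypothetical.

```scala
// Sketch of global-norm clipping over a collection of gradient buffers.
// Null buffers (parameters without gradients) are skipped.
def clipByGlobalNorm(grads: Seq[Array[Double]], maxNorm: Double): Unit =
  val live = grads.filter(_ != null)
  val g = math.sqrt(live.map(_.map(x => x * x).sum).sum)   // g = √(∑_p ‖grad_p‖²)
  if g > maxNorm then
    val scale = maxNorm / g                                 // same factor for every buffer
    for
      buf <- live
      i <- buf.indices
    do buf(i) *= scale

val g1 = Array(3.0, 0.0)
val g2 = Array(4.0)
clipByGlobalNorm(Seq(g1, g2, null), maxNorm = 1.0)
// global norm was 5.0, so every entry is scaled by 0.2: g1 ≈ (0.6, 0.0), g2 ≈ (0.8)
```

Because a single factor is applied to every buffer, the combined gradient keeps its direction and only its length shrinks to maxNorm.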

def clipGradValue(minVal: Double, maxVal: Double): Unit

Clip the gradients of all parameters by value (element-wise). Each gradient entry smaller than minVal is set to minVal, and each entry larger than maxVal is set to maxVal.
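Element-wise clipping can be sketched the same way, again assuming plain `Array[Double]` gradient buffers and a hypothetical helper name `clipByValue`.

```scala
// Sketch of element-wise clipping: each entry is clamped into [minVal, maxVal].
// Null buffers (parameters without gradients) are skipped.
def clipByValue(grads: Seq[Array[Double]], minVal: Double, maxVal: Double): Unit =
  for
    buf <- grads if buf != null
    i <- buf.indices
  do buf(i) = math.min(math.max(buf(i), minVal), maxVal)

val gs = Array(-2.0, 0.5, 3.0)
clipByValue(Seq(gs, null), minVal = -1.0, maxVal = 1.0)
// gs is now (-1.0, 0.5, 1.0)
```

Unlike norm clipping, clamping entries independently can change the direction of the gradient, since large entries are cut back while small ones are untouched.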

def gradNorm: Double

Compute the global L2 norm of all parameter gradients. Math: g = √(∑_p ‖grad_p‖²)
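The global norm treats all gradient entries as one concatenated vector. A small worked sketch, assuming plain `Array[Double]` buffers and a hypothetical helper `globalGradNorm`: with gradients (3, 0) and (4), g = √(3² + 0² + 4²) = 5.

```scala
// Sketch of the global L2 norm over all gradient buffers; null buffers are skipped.
def globalGradNorm(grads: Seq[Array[Double]]): Double =
  var sumSq = 0.0
  for
    buf <- grads if buf != null
    x <- buf
  do sumSq += x * x          // accumulate ‖grad_p‖² across every parameter
  math.sqrt(sumSq)

val n = globalGradNorm(Seq(Array(3.0, 0.0), Array(4.0), null))
// n == 5.0
```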

def zeroGrad(): Unit

Reset the gradients of all parameters, typically before the next forward/backward pass. Only parameters with non-null gradient buffers are reset; the others are skipped.
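Since the autograd engine accumulates into the gradient buffers, they must be cleared between iterations or the next backward pass adds onto stale values. A minimal sketch, assuming a hypothetical `GradParam` class with a nullable `Array[Double]` gradient that is zeroed in place:

```scala
// Hypothetical parameter holder: values plus a gradient buffer that may be null.
final class GradParam(val data: Array[Double]):
  var grad: Array[Double] = null

// Zero every existing gradient buffer in place; null buffers are left as they are.
def zeroGrads(params: Seq[GradParam]): Unit =
  for p <- params if p.grad != null do
    java.util.Arrays.fill(p.grad, 0.0)

val a = GradParam(Array(1.0))
a.grad = Array(0.3)
val b = GradParam(Array(2.0))   // never received a gradient
zeroGrads(Seq(a, b))
// a.grad is now (0.0); b.grad is still null; parameter values are unchanged
```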

Concrete fields

var learningRate: Double