abstract class Optimizer(parameters: IndexedSeq[Variabl], var learningRate: Double)
The Optimizer abstract class optimizes model parameters.
Notes:
- Subclasses implement the specific update rule in step().
- The optimizer assumes that gradients (p.grad) have been computed and accumulated by the autograd engine before each call to step().
- Parameters with null gradients are safely ignored.
Value parameters
learningRate
the step size (η) used for gradient-based updates
parameters
the trainable parameters, each wrapped in a Variabl
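The contract described in the notes (a subclass supplies step(), gradients are already populated, null gradients are skipped) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: Param is a hypothetical stand-in for Variabl with flat Array[Double] buffers, and SketchOptimizer and SketchSGD are not part of the documented API.

```scala
// Hypothetical stand-in for Variabl: a value buffer plus a gradient buffer
// that may be null. The real class is not shown on this page.
final class Param(val data: Array[Double], var grad: Array[Double])

// Simplified stand-in for Optimizer, reduced to the step() contract.
abstract class SketchOptimizer(val parameters: IndexedSeq[Param], var learningRate: Double) {
  def step(): Unit
}

// Plain gradient descent as one possible update rule:
// data <- data - learningRate * grad, skipping parameters whose grad is null.
final class SketchSGD(ps: IndexedSeq[Param], lr: Double) extends SketchOptimizer(ps, lr) {
  def step(): Unit =
    for (p <- parameters if p.grad != null; i <- p.data.indices)
      p.data(i) -= learningRate * p.grad(i)
}
```

Calling step() on a SketchSGD moves every parameter that has a gradient one step against it and leaves parameters with a null gradient untouched.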
Clip the gradients of all parameters by global norm. Scales gradients so that the total norm is at most maxNorm. Math: let g = √(∑_p ‖grad_p‖²). If g > maxNorm, every gradient is scaled by (maxNorm / g); otherwise the gradients are left unchanged.
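The global-norm rule can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: Param is a hypothetical stand-in for Variabl that holds its gradient as a flat Array[Double], and clipGradNormSketch, including its choice to return the pre-clipping norm, is invented for this example.

```scala
// Hypothetical stand-in for Variabl: only the gradient buffer matters here.
final class Param(var grad: Array[Double])

// Computes g = sqrt(sum over parameters of ||grad_p||^2) and, when g > maxNorm,
// rescales every gradient in place by maxNorm / g. Returns the norm measured
// before clipping (an assumption of this sketch).
def clipGradNormSketch(params: IndexedSeq[Param], maxNorm: Double): Double = {
  val withGrad = params.filter(_.grad != null) // null gradients are skipped
  val g = math.sqrt(withGrad.map(_.grad.map(x => x * x).sum).sum)
  if (g > maxNorm) {
    val scale = maxNorm / g
    for (p <- withGrad; i <- p.grad.indices) p.grad(i) *= scale
  }
  g
}
```

For gradients (3, 0) and (0, 4) the global norm is 5, so with maxNorm = 1 every entry is multiplied by 0.2 and the clipped gradients have a global norm of 1. Because all gradients share one scale factor, the direction of the overall update is preserved.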
def clipGradValue(minVal: Double, maxVal: Double): Unit
Clip the gradients of all parameters by value (element-wise). Each gradient entry smaller than minVal is set to minVal, and each entry larger than maxVal is set to maxVal.
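Element-wise clipping amounts to clamping each gradient entry to the interval [minVal, maxVal]. The sketch below assumes a hypothetical Param stand-in for Variabl with a flat Array[Double] gradient; clipGradValueSketch is an illustrative name, not the documented method.

```scala
// Hypothetical stand-in for Variabl: only the gradient buffer matters here.
final class Param(var grad: Array[Double])

// Clamps every gradient entry to [minVal, maxVal] in place.
// Parameters whose gradient is null are skipped.
def clipGradValueSketch(params: IndexedSeq[Param], minVal: Double, maxVal: Double): Unit =
  for (p <- params if p.grad != null; i <- p.grad.indices)
    p.grad(i) = math.max(minVal, math.min(maxVal, p.grad(i)))
```

Unlike clipping by global norm, this treats each entry independently, so it can change the direction of the gradient as well as its magnitude.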
Reset gradients for all parameters. Typically called before the next forward/backward pass. Only parameters with non-null gradient buffers are updated.
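A gradient reset of this kind might look like the sketch below. It assumes that resetting means zeroing each existing buffer in place, which the description suggests but does not state; Param is a hypothetical stand-in for Variabl and resetGradsSketch is an illustrative name.

```scala
// Hypothetical stand-in for Variabl: only the gradient buffer matters here.
final class Param(var grad: Array[Double])

// Overwrites every existing gradient buffer with zeros.
// Parameters whose gradient is null are left as they are.
def resetGradsSketch(params: IndexedSeq[Param]): Unit =
  for (p <- params if p.grad != null)
    java.util.Arrays.fill(p.grad, 0.0)
```

Since the autograd engine accumulates gradients, skipping the reset would add the next backward pass on top of the previous one.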