Adam

scalation.modeling.autograd.Adam
case class Adam(parameters: IndexedSeq[Variabl], lr: Double = ..., beta1: Double = ..., beta2: Double = ..., weightDecay: Double = ..., eps: Double = ...) extends Optimizer

Implements the Adam optimizer (Kingma & Ba, 2015) with optional L2 weight decay. For each parameter, the optimizer maintains first (m) and second (v) moment estimates and applies bias correction to both. Weight decay is classical (non-decoupled): weightDecay * param is added to the raw gradient, so the decay term also passes through the moment estimates.

Value parameters

beta1

exponential decay rate for the first moment estimates.

beta2

exponential decay rate for the second moment estimates.

eps

small constant added for numerical stability.

lr

base learning rate for updating the parameters.

parameters

indexed sequence of Variables representing model parameters.

weightDecay

L2 regularization coefficient (0.0 to disable).

Attributes

Note

Call zeroGrad() before each backward pass, then call step() to apply the update.
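The call order in the note can be sketched with toy stand-ins. Everything here (ToyParam, ToySGD, backward, train) is hypothetical and is not ScalaTion API; a plain gradient step replaces Adam to keep the sketch short, and the sketch assumes that backward accumulates into the gradient buffer, which is why the buffer is cleared first.

```scala
// Toy stand-ins (hypothetical, not ScalaTion types) showing the call order.
final class ToyParam(var data: Double):
  var grad: Double = 0.0

final class ToySGD(p: ToyParam, lr: Double):
  def zeroGrad(): Unit =
    p.grad = 0.0
  def step(): Unit =
    p.data -= lr * p.grad

// Assumption: backward accumulates into grad rather than overwriting it.
// The loss is (p - 3)^2, so its derivative is 2 * (p - 3).
def backward(p: ToyParam): Unit =
  p.grad += 2.0 * (p.data - 3.0)

def train(epochs: Int): Double =
  val p = ToyParam(0.0)
  val opt = ToySGD(p, lr = 0.1)
  for _ <- 1 to epochs do
    opt.zeroGrad()   // clear gradients left over from the previous iteration
    backward(p)      // accumulate fresh gradients
    opt.step()       // apply the update
  p.data
```

Under these assumptions, skipping zeroGrad() would make each step use the sum of all earlier gradients instead of the current one.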

Supertypes
trait Serializable
trait Product
trait Equals
class Optimizer
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

override def step(): Unit

Performs a single optimization step using the Adam algorithm. The step method increments the time step counter, then for each parameter:

  • Updates the biased first moment estimate.
  • Updates the biased second moment estimate.
  • Computes bias-corrected moment estimates.
  • Updates the parameter data using the computed moments.
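The steps above can be sketched on flat arrays. This is a minimal illustration, not the ScalaTion implementation: AdamState and adamStep are hypothetical names, the default values are the ones suggested by Kingma & Ba (the signature above elides the actual defaults), and placing eps outside the square root is an assumption.

```scala
// Hypothetical stand-alone sketch of one Adam step; not ScalaTion API.
final class AdamState(n: Int):
  val m = Array.fill(n)(0.0)   // biased first moment estimate
  val v = Array.fill(n)(0.0)   // biased second moment estimate
  var t = 0                    // time step counter

def adamStep(param: Array[Double], grad: Array[Double], st: AdamState,
             lr: Double = 1e-3, beta1: Double = 0.9, beta2: Double = 0.999,
             weightDecay: Double = 0.0, eps: Double = 1e-8): Unit =
  st.t += 1                                  // increment the time step
  val c1 = 1.0 - math.pow(beta1, st.t)       // bias-correction factor for m
  val c2 = 1.0 - math.pow(beta2, st.t)       // bias-correction factor for v
  for i <- param.indices do
    // classical (non-decoupled) weight decay: fold it into the raw gradient
    val g = grad(i) + weightDecay * param(i)
    st.m(i) = beta1 * st.m(i) + (1.0 - beta1) * g
    st.v(i) = beta2 * st.v(i) + (1.0 - beta2) * g * g
    val mHat = st.m(i) / c1                  // bias-corrected first moment
    val vHat = st.v(i) / c2                  // bias-corrected second moment
    param(i) -= lr * mHat / (math.sqrt(vHat) + eps)
```

In this sketch, with weightDecay = 0.0 and a nonzero gradient, the first step moves each parameter by roughly lr against the sign of its gradient, because the bias-corrected moments reduce to g and g².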

Attributes

Definition Classes

Inherited methods

def clipGradNorm(maxNorm: Double): Unit

Clip the gradients of all parameters by global norm. Scales gradients so that the total norm ≤ maxNorm. Math: Let g = √(∑_p ‖grad_p‖² ). If g > maxNorm, scale all gradients by (maxNorm / g).
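The rule can be sketched on plain arrays, with one array per parameter. The helper names globalNorm and clipByGlobalNorm are hypothetical and only illustrate the math stated above.

```scala
// Hypothetical helpers on plain arrays; not the ScalaTion implementation.
def globalNorm(grads: Seq[Array[Double]]): Double =
  // square every entry of every gradient, sum, then take the square root
  math.sqrt(grads.iterator.flatMap(_.iterator).map(x => x * x).sum)

def clipByGlobalNorm(grads: Seq[Array[Double]], maxNorm: Double): Unit =
  val g = globalNorm(grads)
  if g > maxNorm then
    val scale = maxNorm / g
    // the same factor is applied to every gradient, preserving direction
    for arr <- grads; i <- arr.indices do arr(i) *= scale
```

Because all gradients share one scale factor, their relative magnitudes are unchanged; gradients whose global norm is already within maxNorm are left untouched.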

Attributes

Inherited from:
Optimizer
def clipGradValue(minVal: Double, maxVal: Double): Unit

Clip the gradients of all parameters by value (element-wise). Each gradient entry smaller than minVal is set to minVal, and each entry larger than maxVal is set to maxVal.
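A minimal sketch of element-wise clipping on plain arrays follows. The helper name clipByValue is hypothetical, and the sketch assumes minVal ≤ maxVal.

```scala
// Hypothetical helper on plain arrays; not the ScalaTion implementation.
def clipByValue(grads: Seq[Array[Double]], minVal: Double, maxVal: Double): Unit =
  for arr <- grads; i <- arr.indices do
    // clamp each entry into the closed interval [minVal, maxVal]
    arr(i) = math.max(minVal, math.min(maxVal, arr(i)))
```

Unlike clipping by global norm, this treats each entry independently, so it can change the direction of the overall gradient.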

Attributes

Inherited from:
Optimizer
def gradNorm: Double

Compute the global L2 norm of all parameter gradients. Math: g = √(∑_p‖grad_p‖² )
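As a worked instance of the formula, assume two hypothetical parameters whose gradients are (3, 4) and (12):

```latex
g = \sqrt{\lVert (3, 4) \rVert^2 + \lVert (12) \rVert^2}
  = \sqrt{(9 + 16) + 144}
  = \sqrt{169}
  = 13
```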

Attributes

Inherited from:
Optimizer
def productElementNames: Iterator[String]

An iterator over the names of all the elements of this product.

Attributes

Inherited from:
Product
def productIterator: Iterator[Any]

An iterator over all the elements of this product.
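Since every case class is a Product, productElementNames and productIterator can be zipped to list field names with their values. The sketch below uses a hypothetical case class Cfg rather than Adam itself, so it does not depend on ScalaTion.

```scala
// Cfg is a hypothetical case class used only for illustration.
case class Cfg(lr: Double, beta1: Double)

def describe(p: Product): String =
  // pair each field name with the value at the same position
  p.productElementNames.zip(p.productIterator)
    .map((name, value) => s"$name=$value")
    .mkString(", ")
```

The same helper should work on any case class instance, which can be handy for logging the hyperparameters an optimizer was built with.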

Attributes

Returns

in the default implementation, an Iterator[Any]

Inherited from:
Product
def zeroGrad(): Unit

Reset gradients for all parameters. Typically called before the next forward/backward pass. Only parameters with non-null gradient buffers are updated.

Attributes

Inherited from:
Optimizer

Inherited fields

var learningRate: Double

Attributes

Inherited from:
Optimizer