
Adam

scalation.modeling.autograd.Adam

case class Adam(parameters: IndexedSeq[Variabl], lr: Double = ..., beta1: Double = ..., beta2: Double = ..., weightDecay: Double = ..., eps: Double = ...) extends Optimizer

The Adam class implements the Adam optimizer (Kingma & Ba, 2015) for updating model parameters, with optional L2 weight decay. It maintains exponentially decaying estimates of the first (m) and second (v) moments of the gradient and applies bias correction to both. Weight decay is classical rather than decoupled: weightDecay * param is added to the raw gradient.
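The update can be written out as in the cited paper, with the weight-decay term folded into the gradient. Here α is lr, λ is weightDecay, t is the step counter, m_0 = v_0 = 0, and squares and square roots act element-wise. Placing ε outside the square root follows the paper; whether this class does exactly the same is an assumption.

```latex
\begin{aligned}
g_t       &= \nabla_\theta L(\theta_{t-1}) + \lambda\,\theta_{t-1} \\
m_t       &= \beta_1\, m_{t-1} + (1 - \beta_1)\, g_t \\
v_t       &= \beta_2\, v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat m_t  &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat v_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t  &= \theta_{t-1} - \alpha\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
\end{aligned}
```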

Value parameters

beta1: exponential decay rate for the first moment estimates.
beta2: exponential decay rate for the second moment estimates.
eps: small constant added for numerical stability.
lr: base learning rate for updating the parameters.
parameters: indexed sequence of Variables representing model parameters.
weightDecay: L2 regularization coefficient (0.0 to disable).
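To make the admissible ranges of these parameters concrete, here is a small hypothetical holder. AdamConfig is not part of ScalaTion. Its defaults are the values suggested by Kingma & Ba (2015); the defaults of this class are elided in the signature above and may differ.

```scala
// Hypothetical holder for Adam's hyperparameters (not a ScalaTion class).
// Defaults follow Kingma & Ba (2015); ScalaTion's own defaults may differ.
final case class AdamConfig(
    lr: Double = 1e-3,         // base learning rate (alpha)
    beta1: Double = 0.9,       // decay rate of the first-moment estimates
    beta2: Double = 0.999,     // decay rate of the second-moment estimates
    weightDecay: Double = 0.0, // L2 coefficient; 0.0 disables weight decay
    eps: Double = 1e-8         // added to the denominator for numerical stability
) {
  require(lr > 0.0, "lr must be positive")
  require(beta1 >= 0.0 && beta1 < 1.0, "beta1 must lie in [0, 1)")
  require(beta2 >= 0.0 && beta2 < 1.0, "beta2 must lie in [0, 1)")
  require(weightDecay >= 0.0, "weightDecay must be non-negative")
  require(eps > 0.0, "eps must be positive")
}
```

Constructing AdamConfig(beta1 = 1.0) fails fast. With β1 = 1 the bias-correction denominator 1 − β1^t would be zero.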

Attributes

See also: https://arxiv.org/abs/1412.6980
Note: call zeroGrad() before each backward pass and step() so that gradients from the previous iteration are cleared.
Supertypes: trait Serializable, trait Product, trait Equals, class Optimizer, class Object, trait Matchable, class Any
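The zeroGrad() note matters if each backward pass adds into the existing gradient buffer instead of overwriting it, as most autograd engines do. That accumulation behavior is an assumption here. The sketch below uses a hypothetical scalar ScalarParam and the loss L(w) = (w − 3)², whose gradient is 2(w − 3). It compares two backward passes with and without a reset between them.

```scala
// Hypothetical scalar parameter whose gradient buffer ACCUMULATES across
// backward passes (assumed behavior, mirroring common autograd engines).
final class ScalarParam(var data: Double) {
  var grad: Double = 0.0
}

object LoopDemo {
  // "Backward" for L(w) = (w - 3)^2: adds dL/dw = 2 (w - 3) into grad.
  def backward(p: ScalarParam): Unit = p.grad += 2.0 * (p.data - 3.0)

  // Runs two backward passes on the same parameter value and returns the
  // resulting gradient, optionally resetting it in between like zeroGrad().
  def gradAfterTwoBackwards(zeroBetween: Boolean): Double = {
    val p = new ScalarParam(1.0)
    backward(p)
    if (zeroBetween) p.grad = 0.0 // the effect of zeroGrad() on this parameter
    backward(p)
    p.grad
  }
}
```

Starting from w = 1, gradAfterTwoBackwards(zeroBetween = false) returns −8, two contributions of −4 summed. gradAfterTwoBackwards(zeroBetween = true) returns the correct −4.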

Members list

Value members

Concrete methods

Performs a single optimization step using the Adam algorithm. The step method increments the time step counter, then for each parameter:

Updates the biased first moment estimate.
Updates the biased second moment estimate.
Computes bias-corrected moment estimates.
Updates the parameter data using the bias-corrected moment estimates.

Attributes

Definition Classes: Optimizer
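The step sequence above can be sketched over plain arrays. This is an illustrative reimplementation, not ScalaTion's code. AdamSketch, the separate grads arrays, and the Kingma & Ba default values are assumptions. Weight decay is applied to the raw gradient, as the class description states.

```scala
// Illustrative Adam over plain arrays (hypothetical, not ScalaTion's Adam).
// params(k) and grads(k) hold the data and gradient of the k-th parameter.
final class AdamSketch(
    params: IndexedSeq[Array[Double]],
    grads: IndexedSeq[Array[Double]],
    lr: Double = 1e-3, beta1: Double = 0.9, beta2: Double = 0.999,
    weightDecay: Double = 0.0, eps: Double = 1e-8
) {
  require(params.length == grads.length, "one gradient array per parameter")

  private val m = params.map(p => new Array[Double](p.length)) // first moments, start at 0
  private val v = params.map(p => new Array[Double](p.length)) // second moments, start at 0
  private var t = 0                                            // time-step counter

  def step(): Unit = {
    t += 1                               // increment the time step first
    val bc1 = 1.0 - math.pow(beta1, t)   // bias-correction denominators
    val bc2 = 1.0 - math.pow(beta2, t)
    for (k <- params.indices) {
      val p = params(k); val g = grads(k); val mk = m(k); val vk = v(k)
      for (i <- p.indices) {
        val gi = g(i) + weightDecay * p(i)             // classical (coupled) L2 decay
        mk(i) = beta1 * mk(i) + (1.0 - beta1) * gi      // biased first moment
        vk(i) = beta2 * vk(i) + (1.0 - beta2) * gi * gi // biased second moment
        val mHat = mk(i) / bc1                          // bias-corrected estimates
        val vHat = vk(i) / bc2
        p(i) -= lr * mHat / (math.sqrt(vHat) + eps)     // parameter update
      }
    }
  }
}
```

With weightDecay = 0, bias correction on the first step gives mHat = g and vHat = g². Each entry therefore moves by roughly lr against the sign of its gradient, whatever the gradient's magnitude.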

Inherited methods

Clip the gradients of all parameters by global norm, scaling them so that their total norm is at most maxNorm. Let g = √(∑_p ‖grad_p‖²); if g > maxNorm, every gradient is multiplied by maxNorm / g.

Attributes

Inherited from: Optimizer
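A minimal sketch of clipping by global norm over plain gradient arrays. GradClip and its method names are hypothetical and are not the Optimizer API.

```scala
// Hypothetical helpers for global-norm clipping (not ScalaTion's Optimizer).
object GradClip {
  // g = sqrt(sum over all parameters of ||grad_p||^2)
  def globalNorm(grads: Seq[Array[Double]]): Double =
    math.sqrt(grads.map(g => g.map(x => x * x).sum).sum)

  // If g > maxNorm, multiply every gradient entry by maxNorm / g (in place).
  def clipByGlobalNorm(grads: Seq[Array[Double]], maxNorm: Double): Unit = {
    val g = globalNorm(grads)
    if (g > maxNorm) {
      val scale = maxNorm / g
      for (arr <- grads; i <- arr.indices) arr(i) *= scale
    }
  }
}
```

Every gradient is multiplied by the same factor, so the overall update direction is preserved and only its length shrinks.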

Clip the gradients of all parameters by value (element-wise). Each gradient entry smaller than minVal is set to minVal, and each entry larger than maxVal is set to maxVal.

Attributes

Inherited from: Optimizer
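Element-wise clipping can be sketched in the same style. ValueClip is a hypothetical helper; the minVal and maxVal names come from the description above.

```scala
// Hypothetical helper for element-wise (value) clipping of gradients.
object ValueClip {
  def clipByValue(grads: Seq[Array[Double]], minVal: Double, maxVal: Double): Unit = {
    require(minVal <= maxVal, "minVal must not exceed maxVal")
    for (arr <- grads; i <- arr.indices)
      arr(i) = math.min(math.max(arr(i), minVal), maxVal) // clamp into [minVal, maxVal]
  }
}
```

Unlike global-norm clipping, clamping individual entries can change the direction of the gradient vector.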

Compute the global L2 norm of all parameter gradients: g = √(∑_p ‖grad_p‖²).

Attributes

Inherited from: Optimizer
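As a worked example, take two parameters whose gradients are (3, 4) and (12). These values are illustrative only:

```latex
g = \sqrt{\lVert (3, 4) \rVert^2 + \lVert (12) \rVert^2}
  = \sqrt{(9 + 16) + 144}
  = \sqrt{169}
  = 13
```

With maxNorm = 6.5, clipping by global norm would multiply every gradient by 6.5 / 13 = 0.5.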

An iterator over the names of all the elements of this product.

Attributes

Inherited from: Product

An iterator over all the elements of this product.

Attributes

Returns: in the default implementation, an Iterator[Any]
Inherited from: Product
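Adam is a case class, so these two inherited Product methods enumerate its constructor fields. The snippet demonstrates them on a small hypothetical case class, Hyper, so that it runs without ScalaTion. productElementNames requires Scala 2.13 or later.

```scala
// Hypothetical case class standing in for Adam; every case class inherits
// these Product members.
final case class Hyper(lr: Double, beta1: Double)

object ProductDemo {
  val h = Hyper(0.001, 0.9)
  val names: List[String] = h.productElementNames.toList // constructor field names
  val values: List[Any] = h.productIterator.toList       // field values, statically typed as Any
}
```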

Reset gradients for all parameters. Typically called before the next forward/backward pass. Only parameters with non-null gradient buffers are reset.

Attributes

Inherited from: Optimizer
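A sketch of the reset described above, including the skip over parameters whose gradient buffer is still null. GradParam is a hypothetical stand-in for a model parameter and is not ScalaTion's Variabl.

```scala
// Hypothetical parameter type: data plus a gradient buffer that stays null
// until a backward pass allocates it.
final class GradParam(val data: Array[Double]) {
  var grad: Array[Double] = null
}

object ZeroGradDemo {
  // Fill every allocated gradient buffer with zeros; null buffers are left alone.
  def zeroGrad(params: Seq[GradParam]): Unit =
    for (p <- params if p.grad != null) java.util.Arrays.fill(p.grad, 0.0)
}
```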

Inherited fields

Attributes

Inherited from: Optimizer
