SimpleEncoder
The SimpleEncoder object implements the attention method based on the scaled dot product.
Attributes
- Graph
-
- Supertypes
-
class Objecttrait Matchableclass Any
- Self type
-
SimpleEncoder.type
Members list
Value members
Concrete methods
Based on the Query (Q), Key (K), and Value (V) matrices, compute the attention.
Based on the Query (Q), Key (K), and Value (V) matrices, compute the attention.
att = softmax (QK^T/√d_k) V
Value parameters
- d_k
-
the dimensionality of Query, Key, and Value (if different use d_v)
- k
-
the Key: other locations to compare it with (for similarity)
- q
-
the Query: the input of interest
- v
-
the Value: the input value at the key locations
Attributes
Use a matrix transformation containing learnable weights to embed each patch vector into a higher dimensional space (providing enhanced vector similarity). The dimensionality of the embedding space is d_model. For this simple implementation d_model = d_k as there is only one attention head.
Use a matrix transformation containing learnable weights to embed each patch vector into a higher dimensional space (providing enhanced vector similarity). The dimensionality of the embedding space is d_model. For this simple implementation d_model = d_k as there is only one attention head.
Value parameters
- wE
-
the dimensionality of the embedding space
- xx
-
the matrix containing each patch as a row
Attributes
Encode all the positions in the time series as vectors of length d_model.
Encode all the positions in the time series as vectors of length d_model.
Value parameters
- d_k
-
the dimensionality of the model (d_model = d_k here)
- len
-
the sequence length
Attributes
Perform layer normalization on matrix x. While batch normalization normalizes over a mini-batch, layer normalization normalizes over a single instance or row.
Perform layer normalization on matrix x. While batch normalization normalizes over a mini-batch, layer normalization normalizes over a single instance or row.
Value parameters
- x
-
the matrix to normalize
- β
-
the shifting learnable parameter (defaults to 0.0)
- γ
-
the scaling learnable parameter (defaults to 1.0)
Attributes
- See also
-
www.geeksforgeeks.org/deep-learning/what-is-layer-normalization/ An affine transformation is supported via the γ and β learnable parameters.
Patchify the univariate time series y by breaking it into non-overlapping patches of length pl. This simple implementation assumes stride s = pl, but PatchTST uses pl = 16 and s = 8 as defaults.
Patchify the univariate time series y by breaking it into non-overlapping patches of length pl. This simple implementation assumes stride s = pl, but PatchTST uses pl = 16 and s = 8 as defaults.
Value parameters
- pl
-
the patch length
- y
-
the given univariate time series