TransformerEnc
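The TransformerEnc object implements scaled dot-product attention, together with the supporting steps of a simple single-head Transformer encoder for time series: patchifying, embedding, positional encoding, and layer normalization.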
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Self type: TransformerEnc.type
Members list
Value members
Concrete methods
Based on the Query (Q), Key (K), and Value (V) matrices, compute the attention.
att = softmax(QKᵀ / √d_k) V
Value parameters
- d_k: the dimensionality of the Query, Key, and Value vectors (use a separate d_v if the Value dimensionality differs)
- k: the Key, i.e., the other locations that the Query is compared with (for similarity)
- q: the Query, i.e., the input of interest
- v: the Value, i.e., the input value at the key locations
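The formula above can be sketched in plain Scala as follows. This is a minimal illustration, not the library's implementation: it assumes matrices are stored as `Array[Array[Double]]` with one row per position, and the object name `AttentionSketch` and its `softmax` helper are hypothetical.

```scala
object AttentionSketch {
  type Matrix = Array[Array[Double]]   // assumption: one row per position

  /** Softmax of one row, shifted by the row maximum for numerical stability. */
  def softmax(row: Array[Double]): Array[Double] = {
    val mx  = row.max
    val ex  = row.map(s => math.exp(s - mx))
    val sum = ex.sum
    ex.map(_ / sum)
  }

  /** att = softmax(QKᵀ / √d_k) V, computed row by row. */
  def attention(q: Matrix, k: Matrix, v: Matrix, d_k: Int): Matrix = {
    val scale = math.sqrt(d_k.toDouble)
    // scores(i)(j) = (q_i . k_j) / √d_k, i.e., entry (i, j) of QKᵀ / √d_k
    val scores = q.map(qi => k.map(kj => qi.zip(kj).map { case (a, b) => a * b }.sum / scale))
    // each row of weights is non-negative and sums to 1
    val weights = scores.map(softmax)
    // each output row is a weighted average of the rows of V
    weights.map(w => v.head.indices.map(j => w.indices.map(i => w(i) * v(i)(j)).sum).toArray)
  }
}
```

When all keys are identical, every key receives the same weight, so each output row reduces to the plain average of the rows of V.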
Use a matrix transformation containing learnable weights to embed each patch vector into a higher dimensional space (providing enhanced vector similarity). The dimensionality of the embedding space is d_model. For this simple implementation d_model = d_k as there is only one attention head.
Value parameters
- wE: the learnable embedding weight matrix; its shape sets the dimensionality of the embedding space (d_model)
- xx: the matrix containing each patch as a row
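As a rough sketch of the embedding step, the patch matrix can be multiplied by the weight matrix. The code below assumes that wE has shape (patch length) x d_model and that no bias term is added; neither detail is confirmed by this page, and the object name `EmbedSketch` is hypothetical.

```scala
object EmbedSketch {
  type Matrix = Array[Array[Double]]

  /** Embed each patch (a row of xx) by multiplying it by the weight matrix wE.
   *  Assumption: wE has one row per patch element and d_model columns.
   */
  def embed(xx: Matrix, wE: Matrix): Matrix = {
    require(xx.forall(_.length == wE.length),
            "patch length must equal the number of rows of wE")
    // row i of the result is patch_i * wE, a vector of length d_model
    xx.map(patch => wE.head.indices.map(j => patch.indices.map(i => patch(i) * wE(i)(j)).sum).toArray)
  }
}
```

In a trained model the entries of wE would be learned; here they are simply whatever the caller passes in.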
Encode all the positions in the time series as vectors of length d_model.
Value parameters
- d_k: the dimensionality of the model (d_model = d_k here)
- len: the sequence length
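This page does not say which encoding scheme is used. The sketch below assumes the common sinusoidal scheme, in which even columns hold sines and odd columns hold cosines of position-dependent angles; the object name `PosEncSketch` and the base 10000 are assumptions taken from that scheme, not from this documentation.

```scala
object PosEncSketch {
  /** Sinusoidal positional encoding (assumed scheme):
   *  pe(pos)(2i) = sin(pos / 10000^(2i/d_k)), pe(pos)(2i+1) = cos(pos / 10000^(2i/d_k)).
   *  Returns a len x d_k matrix with one row per position.
   */
  def positionalEncoding(len: Int, d_k: Int): Array[Array[Double]] =
    Array.tabulate(len, d_k) { (pos, j) =>
      // columns 2i and 2i+1 share the same frequency
      val angle = pos / math.pow(10000.0, (2 * (j / 2)).toDouble / d_k)
      if (j % 2 == 0) math.sin(angle) else math.cos(angle)
    }
}
```

The resulting matrix has the same shape as the embedded patches, so under this assumption it can be added to them element by element.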
Perform layer normalization on matrix x. The more general affine transformation is not supported in this simple implementation.
Value parameters
- x: the matrix to normalize
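Layer normalization without the affine step can be sketched as follows. The code assumes that each row of x is normalized independently to zero mean and unit variance, using the population variance and a small stabilizing constant `eps`; the default value of `eps` and the object name `LayerNormSketch` are hypothetical.

```scala
object LayerNormSketch {
  /** Normalize each row of x to zero mean and (approximately) unit variance.
   *  No learnable scale or shift is applied, matching the "no affine
   *  transformation" note above. eps guards against division by zero.
   */
  def layerNorm(x: Array[Array[Double]], eps: Double = 1e-5): Array[Array[Double]] =
    x.map { row =>
      val mean     = row.sum / row.length
      val variance = row.map(e => (e - mean) * (e - mean)).sum / row.length
      val sd       = math.sqrt(variance + eps)
      row.map(e => (e - mean) / sd)
    }
}
```

A constant row has zero variance, so with `eps` in place it maps to a row of zeros rather than producing NaN values.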
Patchify the univariate time series y by breaking it into non-overlapping patches of length pl. This simple implementation assumes stride s = pl, but PatchTST uses pl = 16 and s = 8 as defaults.
Value parameters
- pl: the patch length
- y: the given univariate time series
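Patchifying with stride equal to the patch length amounts to cutting the series into consecutive chunks. The sketch below assumes that a trailing chunk shorter than pl is dropped; this page does not say how such a remainder is handled, and the object name `PatchifySketch` is hypothetical.

```scala
object PatchifySketch {
  /** Split y into non-overlapping patches of length pl (stride s = pl).
   *  Assumption: an incomplete final patch is discarded.
   *  Returns a matrix with one patch per row.
   */
  def patchify(y: Array[Double], pl: Int): Array[Array[Double]] = {
    require(pl > 0, "patch length must be positive")
    y.grouped(pl).filter(_.length == pl).toArray
  }
}
```

For a series of 7 values and pl = 3, this yields two patches and discards the last value.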