Multihead Attention allows the model to jointly attend to information from different representation subspaces at different positions.
Public Member Functions
MultiheadAttention ()
Default constructor.
MultiheadAttention (const size_t tgtSeqLen, const size_t srcSeqLen, const size_t embedDim, const size_t numHeads)
Create the MultiheadAttention object using the specified parameters.
OutputDataType const & | AttentionMask () const
Get the two-dimensional Attention Mask.
OutputDataType & | AttentionMask ()
Modify the two-dimensional Attention Mask.
template<typename eT>
void | Backward (const arma::Mat< eT > &, const arma::Mat< eT > &gy, arma::Mat< eT > &g)
Ordinary feed backward pass of a neural network, calculating the function f(x) by propagating x backwards through f.
OutputDataType const & | Delta () const
Get the delta.
OutputDataType & | Delta ()
Modify the delta.
size_t | EmbedDim () const
Get the embedding dimension.
size_t & | EmbedDim ()
Modify the embedding dimension.
template<typename eT>
void | Forward (const arma::Mat< eT > &input, arma::Mat< eT > &output)
Ordinary feed forward pass of a neural network, evaluating the function f(x) by propagating the activity forward through f.
template<typename eT>
void | Gradient (const arma::Mat< eT > &input, const arma::Mat< eT > &error, arma::Mat< eT > &gradient)
Calculate the gradient using the output delta and the input activation.
OutputDataType const & | Gradient () const
Get the gradient.
OutputDataType & | Gradient ()
Modify the gradient.
OutputDataType const & | KeyPaddingMask () const
Get the Key Padding Mask.
OutputDataType & | KeyPaddingMask ()
Modify the Key Padding Mask.
size_t | NumHeads () const
Get the number of attention heads.
size_t & | NumHeads ()
Modify the number of attention heads.
OutputDataType const & | OutputParameter () const
Get the output parameter.
OutputDataType & | OutputParameter ()
Modify the output parameter.
OutputDataType const & | Parameters () const
Get the parameters.
OutputDataType & | Parameters ()
Modify the parameters.
void | Reset ()
Reset the layer parameters.
template<typename Archive>
void | serialize (Archive &ar, const unsigned int)
Serialize the layer.
size_t | SrcSeqLen () const
Get the source sequence length.
size_t & | SrcSeqLen ()
Modify the source sequence length.
size_t | TgtSeqLen () const
Get the target sequence length.
size_t & | TgtSeqLen ()
Modify the target sequence length.
Multihead Attention allows the model to jointly attend to information from different representation subspaces at different positions.
With a single attention head, averaging inhibits this. [arxiv.org:1706.03762v5]
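For reference, the cited paper defines multi-head attention as follows. Mapping its symbols onto this class is an assumption based on the parameter descriptions here: h corresponds to numHeads and d_model to embedDim, with d_k = d_model / h. The paper writes positions as rows, which may differ from the storage layout used by this layer.

```latex
\begin{align}
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \\
\mathrm{head}_i &= \mathrm{Attention}\!\left(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V}\right) \\
\mathrm{MultiHead}(Q, K, V) &= \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}
\end{align}
```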
The MultiheadAttention class takes the query, key and value in concatenated form: they are concatenated into a single matrix and passed to the Forward function as input.
The query, key and value are matrices of shapes (embedDim * tgtSeqLen, batchSize), (embedDim * srcSeqLen, batchSize) and (embedDim * srcSeqLen, batchSize) respectively. The output is a matrix of shape (embedDim * tgtSeqLen, batchSize). The embeddings are stored consecutively.
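The shapes above imply how many rows the concatenated input needs. The sketch below computes the row offsets of each block inside one input column. It assumes the blocks are stacked along rows in the order query, key, value; the struct InputLayout and the function ConcatenatedLayout are hypothetical names for illustration and are not part of the library.

```cpp
#include <cstddef>

// Row offsets of the query, key and value blocks inside one column of the
// concatenated input, plus the total input and output row counts.
struct InputLayout
{
  std::size_t queryBegin;
  std::size_t keyBegin;
  std::size_t valueBegin;
  std::size_t totalRows;
  std::size_t outputRows;
};

// Hypothetical helper (not part of mlpack). Assumes the input is stacked
// row-wise as [query; key; value].
inline InputLayout ConcatenatedLayout(const std::size_t tgtSeqLen,
                                      const std::size_t srcSeqLen,
                                      const std::size_t embedDim)
{
  InputLayout layout;
  layout.queryBegin = 0;
  // The query occupies embedDim * tgtSeqLen rows.
  layout.keyBegin = embedDim * tgtSeqLen;
  // The key occupies embedDim * srcSeqLen rows.
  layout.valueBegin = layout.keyBegin + embedDim * srcSeqLen;
  // The value occupies another embedDim * srcSeqLen rows.
  layout.totalRows = layout.valueBegin + embedDim * srcSeqLen;
  // The output has the same number of rows as the query.
  layout.outputRows = embedDim * tgtSeqLen;
  return layout;
}
```

For example, with tgtSeqLen = 2, srcSeqLen = 3 and embedDim = 4, the input needs 32 rows per column and the output has 8.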
InputDataType | Type of the input data (arma::colvec, arma::mat, arma::sp_mat or arma::cube). |
OutputDataType | Type of the output data (arma::colvec, arma::mat, arma::sp_mat or arma::cube). |
RegularizerType | Type of the regularizer to be used. |
Definition at line 120 of file layer_types.hpp.
MultiheadAttention | ( | ) |
Default constructor.
MultiheadAttention | ( | const size_t | tgtSeqLen, |
const size_t | srcSeqLen, | ||
const size_t | embedDim, | ||
const size_t | numHeads | ||
) |
Create the MultiheadAttention object using the specified parameters.
tgtSeqLen | Target sequence length. |
srcSeqLen | Source sequence length. |
embedDim | Total dimension of the model. |
numHeads | Number of parallel attention heads. |
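In the cited paper, the model dimension is split evenly across the heads, so each head works on embedDim / numHeads features. The sketch below checks that relationship; the function HeadDim is a hypothetical helper for illustration, and whether this layer enforces the same divisibility requirement is an assumption, not something this page states.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper (not part of mlpack): per-head feature dimension under
// the convention of the cited paper, d_k = d_model / h.
inline std::size_t HeadDim(const std::size_t embedDim,
                           const std::size_t numHeads)
{
  // Assumed requirement: the heads must split embedDim evenly.
  if (numHeads == 0 || embedDim % numHeads != 0)
    throw std::invalid_argument("numHeads must be nonzero and divide embedDim");
  return embedDim / numHeads;
}
```

With embedDim = 512 and numHeads = 8, as in the paper's base model, each head has dimension 64.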
OutputDataType const & | AttentionMask () const | inline |
Get the two-dimensional Attention Mask.
Definition at line 150 of file multihead_attention.hpp.
OutputDataType & | AttentionMask () | inline |
Modify the two-dimensional Attention Mask.
Definition at line 152 of file multihead_attention.hpp.
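A common use of a two-dimensional attention mask is to stop a target position from attending to later source positions. The sketch below builds such a causal mask with plain standard-library containers. It assumes an additive mask of shape (tgtSeqLen, srcSeqLen) whose entries are 0 (allowed) or negative infinity (blocked); this page does not state the expected shape or value convention, so check the layer's source before relying on it. CausalMask is a hypothetical helper, not a library function.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of mlpack): additive causal mask with
// tgtSeqLen rows and srcSeqLen columns. Entry (i, j) is 0 when target
// position i may attend to source position j (j <= i), and negative
// infinity otherwise.
inline std::vector<std::vector<double>> CausalMask(const std::size_t tgtSeqLen,
                                                   const std::size_t srcSeqLen)
{
  const double negInf = -std::numeric_limits<double>::infinity();
  std::vector<std::vector<double>> mask(
      tgtSeqLen, std::vector<double>(srcSeqLen, 0.0));
  for (std::size_t i = 0; i < tgtSeqLen; ++i)
  {
    // Block every source position strictly after target position i.
    for (std::size_t j = i + 1; j < srcSeqLen; ++j)
      mask[i][j] = negInf;
  }
  return mask;
}
```

Adding negative infinity to a score before the softmax drives the corresponding attention weight to zero, which is why an additive mask is a natural fit here.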
void Backward | ( | const arma::Mat< eT > & | , |
const arma::Mat< eT > & | gy, | ||
arma::Mat< eT > & | g | ||
) |
Ordinary feed backward pass of a neural network, calculating the function f(x) by propagating x backwards through f, using the results from the feed forward pass.
gy | The backpropagated error. |
g | The calculated gradient. |
OutputDataType const & | Delta () const | inline |
Get the delta.
Definition at line 165 of file multihead_attention.hpp.
OutputDataType & | Delta () | inline |
Modify the delta.
Definition at line 167 of file multihead_attention.hpp.
size_t | EmbedDim () const | inline |
Get the embedding dimension.
Definition at line 140 of file multihead_attention.hpp.
size_t & | EmbedDim () | inline |
Modify the embedding dimension.
Definition at line 142 of file multihead_attention.hpp.
void Forward | ( | const arma::Mat< eT > & | input, |
arma::Mat< eT > & | output | ||
) |
Ordinary feed forward pass of a neural network, evaluating the function f(x) by propagating the activity forward through f.
input | The input matrix: the query, key and value concatenated into a single matrix. |
output | Resulting output activation. |
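To make the forward computation concrete, the sketch below implements scaled dot-product attention for a single head and a single query vector using only the standard library. It illustrates the formula from the cited paper and is not this layer's implementation: it omits the learned projections, the masks, batching and the multi-head split. The names Vec, Softmax and Attend are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Numerically stable softmax over a non-empty score vector.
inline Vec Softmax(const Vec& scores)
{
  const double maxScore = *std::max_element(scores.begin(), scores.end());
  Vec weights(scores.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < scores.size(); ++i)
  {
    weights[i] = std::exp(scores[i] - maxScore);
    sum += weights[i];
  }
  for (double& w : weights)
    w /= sum;
  return weights;
}

// Scaled dot-product attention for one query against a non-empty set of
// keys and values. keys[j] must have the same length as query; values[j]
// is the value vector paired with keys[j].
inline Vec Attend(const Vec& query,
                  const std::vector<Vec>& keys,
                  const std::vector<Vec>& values)
{
  const double scale = 1.0 / std::sqrt(static_cast<double>(query.size()));
  Vec scores(keys.size());
  for (std::size_t j = 0; j < keys.size(); ++j)
  {
    double dot = 0.0;
    for (std::size_t d = 0; d < query.size(); ++d)
      dot += query[d] * keys[j][d];
    scores[j] = dot * scale;
  }
  const Vec weights = Softmax(scores);

  // The output is the attention-weighted sum of the value vectors.
  Vec out(values.front().size(), 0.0);
  for (std::size_t j = 0; j < values.size(); ++j)
    for (std::size_t d = 0; d < out.size(); ++d)
      out[d] += weights[j] * values[j][d];
  return out;
}
```

When all keys produce the same score, the weights are uniform and the output is the plain average of the value vectors.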
void Gradient | ( | const arma::Mat< eT > & | input, |
const arma::Mat< eT > & | error, | ||
arma::Mat< eT > & | gradient | ||
) |
Calculate the gradient using the output delta and the input activation.
input | The input data used for evaluating specified function. |
error | The calculated error. |
gradient | The calculated gradient. |
OutputDataType const & | Gradient () const | inline |
Get the gradient.
Definition at line 170 of file multihead_attention.hpp.
OutputDataType & | Gradient () | inline |
Modify the gradient.
Definition at line 172 of file multihead_attention.hpp.
OutputDataType const & | KeyPaddingMask () const | inline |
Get the Key Padding Mask.
Definition at line 155 of file multihead_attention.hpp.
OutputDataType & | KeyPaddingMask () | inline |
Modify the Key Padding Mask.
Definition at line 157 of file multihead_attention.hpp.
size_t | NumHeads () const | inline |
Get the number of attention heads.
Definition at line 145 of file multihead_attention.hpp.
size_t & | NumHeads () | inline |
Modify the number of attention heads.
Definition at line 147 of file multihead_attention.hpp.
OutputDataType const & | OutputParameter () const | inline |
Get the output parameter.
Definition at line 160 of file multihead_attention.hpp.
OutputDataType & | OutputParameter () | inline |
Modify the output parameter.
Definition at line 162 of file multihead_attention.hpp.
OutputDataType const & | Parameters () const | inline |
Get the parameters.
Definition at line 175 of file multihead_attention.hpp.
OutputDataType & | Parameters () | inline |
Modify the parameters.
Definition at line 177 of file multihead_attention.hpp.
void Reset | ( | ) |
Reset the layer parameters.
void serialize | ( | Archive & | ar, |
const unsigned int | | ||
) |
Serialize the layer.
size_t | SrcSeqLen () const | inline |
Get the source sequence length.
Definition at line 135 of file multihead_attention.hpp.
size_t & | SrcSeqLen () | inline |
Modify the source sequence length.
Definition at line 137 of file multihead_attention.hpp.
size_t | TgtSeqLen () const | inline |
Get the target sequence length.
Definition at line 130 of file multihead_attention.hpp.
size_t & | TgtSeqLen () | inline |
Modify the target sequence length.
Definition at line 132 of file multihead_attention.hpp.