These days, in deep learning, it is usual to hear about transformers’ outstanding performance on the challenges where other algorithms can not meet our expectations when most of them are based on attention. This article gives you a detailed illustration of the code and mathematics of the four most-used types of attention in the Deep Learning era. — The main feature of Attentions is the fact that their work is not limited to locality like CNNS, etc. but, we will see that in some cases we’d need the model to contemplate locality, etc. in their training process.