CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings
Without positional information, attention-based Transformer neural networks are permutation-invariant. Absolute and relative positional embeddings are the most popular ways to give Transformer models positional information. Absolute positional embeddings are simple to implement, but generalize poorly when evaluated on sequences longer than those seen during training. Relative positional embeddings are more robust to changes in input length, but they are more complex to implement and reduce model throughput because of their extra computational and memory costs. In this talk, we will discuss an augmentation-based approach (CAPE) for absolute positional embeddings that keeps the advantages of both absolute positional embeddings (simplicity and speed) and relative positional embeddings (better generalization).
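To make the idea of augmenting absolute positions concrete, here is a minimal sketch in Python with NumPy. It is not the implementation presented in the talk. The abstract does not specify the augmentations, so the three used here (a global shift, a small per-position jitter, and a global scaling), the function names, and all default parameter values are illustrative assumptions. The sketch only shows that a sinusoidal embedding can be evaluated at real-valued positions, so positions can be randomly perturbed during training and left untouched at evaluation time.

```python
import numpy as np


def sinusoidal_embedding(positions, dim):
    """Evaluate a standard sinusoidal embedding at real-valued positions.

    `dim` is assumed to be even.
    """
    positions = np.asarray(positions, dtype=np.float64)
    # Geometric frequency schedule of the usual sinusoidal embedding.
    freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))
    angles = positions[:, None] * freqs[None, :]
    emb = np.zeros((positions.shape[0], dim))
    emb[:, 0::2] = np.sin(angles)  # even channels
    emb[:, 1::2] = np.cos(angles)  # odd channels
    return emb


def augment_positions(length, rng, max_global_shift=5.0,
                      max_local_shift=0.5, max_scale=1.1, training=True):
    """Return positions 0..length-1, randomly perturbed when training.

    All three augmentations and their default ranges are assumptions
    made for illustration only.
    """
    positions = np.arange(length, dtype=np.float64)
    if not training:
        return positions
    # One offset shared by the whole sequence: the absolute origin is
    # randomized, while distances between positions are preserved.
    global_shift = rng.uniform(-max_global_shift, max_global_shift)
    # Independent small jitter per position.
    local_shift = rng.uniform(-max_local_shift / 2, max_local_shift / 2,
                              size=length)
    # One multiplicative factor, sampled log-uniformly around 1.
    log_scale = rng.uniform(-np.log(max_scale), np.log(max_scale))
    return np.exp(log_scale) * (positions + global_shift + local_shift)


rng = np.random.default_rng(0)
train_emb = sinusoidal_embedding(augment_positions(8, rng), dim=16)
eval_emb = sinusoidal_embedding(augment_positions(8, rng, training=False), dim=16)
```

Because the perturbation touches only the positions fed into the embedding, the attention layers themselves are unchanged, which is where the simplicity and speed of absolute embeddings come from in this sketch.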
Tatiana is a Research Scientist in the Machine Learning Research group at Apple. Prior to Apple, Tatiana was an AI Resident and later a Postdoctoral Research Scientist in the speech recognition team at Facebook AI Research. Tatiana received her PhD in mixed-type partial differential equations from Moscow State University in 2017. For 4 years she worked on applications of machine learning to high energy physics as a Research Scientist in the joint lab at Yandex and CERN, and later at the startup NTechLab, a leader in face recognition. The main focus of her recent research is Transformer generalization and speech recognition (semi-, weakly- and unsupervised learning, domain transfer and robustness).