Antonio Orvieto

  • PI

Researching the Principles of Efficient Training of Deep Learning Models

Antonio studied Robotics and Control Engineering in Italy and Switzerland. He holds a PhD from ETH Zürich and spent time at DeepMind (UK), Meta (US), MILA (CA), INRIA (FR) and HILTI (LI).

In his research, Antonio strives to improve the efficiency of deep learning technologies by pioneering new architectures and training techniques grounded in theoretical knowledge. His work encompasses two main areas: understanding the intricacies of large-scale optimization dynamics and designing innovative architectures and powerful optimizers capable of handling complex data. Central to his studies is exploring innovative techniques for decoding patterns in sequential data, with implications in biology, neuroscience, natural language processing, and music generation.

For more information, please refer to Antonio's personal website and his Google Scholar profile.


Selected Publications (full list here):

  • Resurrecting Recurrent Neural Networks for Long Sequences (ICML 2023 Oral)
  • Anticorrelated Noise Injection for Improved Generalization (ICML 2022)
  • Signal Propagation in Transformers: Theoretical Perspectives (NeurIPS 2022)
  • Continuous-time Models for Stochastic Optimization Algorithms (NeurIPS 2019)