Researching the Principles of Efficient Training of Deep Learning Models with New Capabilities
Antonio studied Robotics and Control Engineering in Italy and Switzerland. He holds a PhD from ETH Zürich and spent time at DeepMind (UK), Meta (US), Mila (CA), INRIA (FR), and Hilti (LI).
He received the ETH Medal for an outstanding doctoral thesis and a Schmidt Sciences AI2050 Early Career Fellowship.
In his research, Antonio strives to improve the efficiency of deep learning by pioneering new architectures and training techniques grounded in theory. His work spans two main areas: understanding the dynamics of large-scale optimization, and designing novel architectures and powerful optimizers capable of handling complex data. Central to his studies is the development of new techniques for decoding patterns in sequential data, with applications in biology, neuroscience, natural language processing, and music generation.
For more information, please refer to Antonio's personal website and his Google Scholar profile.
Selected Publications (full list here):
- In Search of Adam’s Secret Sauce (NeurIPS 2025 Oral)
- Theoretical Foundations of Deep Selective State-Space Models (NeurIPS 2024)
- Resurrecting Recurrent Neural Networks for Long Sequences (ICML 2023 Oral)
- Anticorrelated Noise Injection for Improved Generalization (ICML 2022)
- Signal Propagation in Transformers: Theoretical Perspectives (NeurIPS 2022)
- Continuous-time Models for Stochastic Optimization Algorithms (NeurIPS 2019)