Imagine an AI system that continuously invents new ideas, develops new skills, and explores new solutions without being given a specific goal. This vision of open-ended AI could transform scientific discovery. In their new paper “Safety Must Precede the Deployment of Open-Ended AI Agents”, PI Sahar Abdelnabi and collaborators argue that while open-ended AI could unlock major scientific and technological advances, it also introduces unique safety risks that remain largely unexplored.
Unlike traditional AI systems, open-ended AI continuously explores new possibilities and builds on its own outputs over time. Recent advances in large language models have made this approach increasingly powerful, helping systems generate novel behaviors, learn new skills, and tackle problems without explicit instructions.
This open-ended nature creates fundamental safety challenges. As systems become more creative and autonomous, their future behavior becomes harder to predict, evaluate, and control. Small errors or misaligned behaviors could compound over time, while efforts to constrain the system too tightly may limit the very creativity that makes open-ended AI valuable.
To address these challenges, the paper calls for new safety approaches that can evolve alongside the systems themselves. Existing AI safety techniques may not be sufficient for open-ended systems because these methods often assume relatively stable objectives and predictable behaviors.
The authors propose several research directions, including:
- Developing oversight mechanisms that adapt continuously as systems evolve.
- Creating safety constraints that can remain effective even when the system encounters novel situations.
- Designing evaluation frameworks capable of assessing risks that emerge over long periods of autonomous exploration.
- Building methods to monitor whether open-ended systems remain aligned with human values as they change and grow.
The authors' core message is that safety cannot be treated as an afterthought. If open-ended AI is to deliver on its promise of accelerating discovery and innovation, safety research must advance alongside capabilities to ensure these systems remain aligned with human values as they evolve.
The work by Ivaxi Sheth (CISPA), Jan Wehner (CISPA), Sahar Abdelnabi (ELLIS Institute Tübingen), Ruta Binkyte (CISPA), and Mario Fritz (CISPA) has been accepted to ICML 2026, the Forty-Third International Conference on Machine Learning, and will be presented in Poster Session 2 on July 7th at 2pm.
Read the full paper here.
Find out more about Sahar’s research.