Developing a fully open-source family of Large Language Models for European languages
Pre-Training
We develop new methods for exploring and improving how large language models are trained. One of our key interests is scaling laws: empirical relationships that predict how results from smaller models carry over to larger ones. To do this, we draw on ideas from Automated Machine Learning (AutoML), which aims to make the process of building and improving machine learning systems more automatic and efficient.
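As an illustration of how such a scaling law can be used, the sketch below fits a saturating power law to small-model results and extrapolates it to a larger model. This is a minimal sketch, not the project's actual methodology: the functional form, the data points, and the 7B-parameter target are assumptions chosen for demonstration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical small-model runs: parameter counts N and validation losses L.
# A real study would use many more runs; these numbers are illustrative only.
params = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
losses = np.array([3.39, 3.08, 2.81, 2.61, 2.43])

def scaling_law(n, a, alpha, c):
    """Saturating power law L(N) = a * N**(-alpha) + c, a common functional form."""
    return a * n ** (-alpha) + c

# Fit the coefficients to the small-model runs (kept positive for stability).
(a, alpha, c), _ = curve_fit(
    scaling_law, params, losses, p0=(10.0, 0.1, 2.0), bounds=(0.0, np.inf)
)

# Extrapolate the fitted curve to a larger model before committing to training it.
print(f"fitted coefficients: a={a:.1f}, alpha={alpha:.3f}, c={c:.2f}")
print(f"predicted loss at 7e9 parameters: {scaling_law(7e9, a, alpha, c):.2f}")
```

Fitting in this way lets expectations for a large, expensive training run be checked against cheap small-scale experiments first.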
Post-Training
We lead the evaluation work package, a core technical component of the project. Our team develops efficient evaluation infrastructure across computing clusters and creates transparent methods for assessing multilingual models, both pre-trained and instruction-tuned. We also work on improving instruction-tuning pipelines to advance model capabilities and performance.
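As a rough illustration of what a transparent multilingual evaluation reports, the sketch below computes exact-match accuracy separately for each language, given any model exposed as a prompt-to-completion callable. The record format, function names, and toy examples are hypothetical; the project's actual evaluation infrastructure is not shown here.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical record format: (language code, prompt, reference answer).
Example = tuple[str, str, str]

def per_language_accuracy(
    generate: Callable[[str], str],   # any model callable: prompt -> completion
    examples: Iterable[Example],
) -> dict[str, float]:
    """Score exact-match accuracy separately for every language in the benchmark."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for lang, prompt, reference in examples:
        prediction = generate(prompt).strip().lower()
        correct[lang] += int(prediction == reference.strip().lower())
        total[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

# Usage with a trivial stand-in model (replace with a real instruction-tuned model).
examples = [
    ("de", "Hauptstadt von Frankreich?", "Paris"),
    ("fi", "Ranskan pääkaupunki?", "Pariisi"),
]
print(per_language_accuracy(lambda p: "Paris", examples))  # {'de': 1.0, 'fi': 0.0}
```

Reporting scores per language rather than a single aggregate makes it visible which languages a model actually serves well, which is the point of a transparent multilingual evaluation.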