LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws

Prasanna Mayilvahanan, Thaddäus Wiedemer, Sayak Mallick, Matthias Bethge, Wieland Brendel

February 2025

Abstract

Scaling laws guide the development of large language models (LLMs) by offering estimates for the optimal balance of model size, tokens, and compute. More recently, loss-to-loss scaling laws that relate losses across pretraining datasets and downstream tasks have emerged as a powerful tool for understanding and improving LLM performance. In this work, we investigate which factors most strongly influence loss-to-loss scaling. Our experiments reveal that the pretraining data and tokenizer determine the scaling trend. In contrast, model size, optimization hyperparameters, and even significant architectural differences, such as between transformer-based models like Llama and state-space models like Mamba, have limited impact. Consequently, practitioners should carefully curate suitable pretraining datasets for optimal downstream performance, while architectures and other settings can be freely optimized for training efficiency.
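As a rough illustration of what a loss-to-loss relationship looks like in practice, the sketch below fits a line in log-log space between a pretraining loss and a downstream loss. The abstract does not state a functional form, so the pure power law used here (with no irreducible-loss offsets) is a simplifying assumption, the helper name `fit_power_law` is hypothetical, and all numbers are synthetic rather than results from the paper.

```python
import math

def fit_power_law(x_losses, y_losses):
    """Least-squares fit of log(y) = log(K) + kappa * log(x).

    Assumes y = K * x**kappa, a simplified form chosen for illustration.
    Returns the pair (K, kappa).
    """
    lx = [math.log(x) for x in x_losses]
    ly = [math.log(y) for y in y_losses]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    # Ordinary least squares slope and intercept in log-log space.
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    kappa = sxy / sxx
    K = math.exp(mean_y - kappa * mean_x)
    return K, kappa

# Synthetic data: pretraining losses for a family of models, and downstream
# losses generated from an exact power law so that the fit is easy to check.
train_losses = [3.2, 2.9, 2.6, 2.4, 2.2]
downstream_losses = [1.5 * x ** 1.2 for x in train_losses]

K, kappa = fit_power_law(train_losses, downstream_losses)
print(f"K = {K:.3f}, kappa = {kappa:.3f}")
```

Under this assumed form, comparing the fitted `(K, kappa)` across model families trained on the same data versus different data is one way to check whether the points fall on the same line.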

Wieland Brendel

Principal Investigator (PI)

Wieland Brendel received his Diploma in physics from the University of Regensburg (2010) and his Ph.D. in computational neuroscience from the École normale supérieure in Paris (2014). He joined the University of Tübingen as a postdoctoral researcher in the group of Matthias Bethge, became a Principal Investigator and Team Lead at the Tübingen AI Center (2018), and then an Emmy Noether Group Leader for Robust Machine Learning (2020). In May 2022, Wieland joined the Max Planck Institute for Intelligent Systems as an independent Group Leader, and he is now a Hector-endowed Fellow at the ELLIS Institute Tübingen (since September 2023). He received the 2023 German Pattern Recognition Award for his substantial contributions to robust, generalisable and interpretable machine vision. Aside from his research, Wieland co-founded a nationwide school competition (bw-ki.de) and a machine learning startup focused on visual quality control.

Prasanna Mayilvahanan

PhD candidate
