Designing Experiments to Uncover Missing Physics: A Systematic Path to New Fundamental Laws
A preprint presents a sequential experimental design method that optimally discriminates between symbolic model candidates to discover missing physics, demonstrated on a simulated bioreactor. It builds on universal differential equations (UDE) and Sparse Identification of Nonlinear Dynamics (SINDy) but adds the crucial 'which experiment next' layer. The work is not yet peer-reviewed and is a computational study with noted data-quality limitations.
This arXiv preprint (2604.01231v1) proposes a sequential experimental design framework that uses machine learning to discover 'missing physics' in incomplete scientific models. Unlike standard approaches, which collect data without regard to which measurements would be most informative, the method actively chooses the next experiment to best distinguish between competing mathematical explanations generated by symbolic regression.
The core technique combines universal differential equations — where neural networks stand in for unknown terms in differential equations — with symbolic regression to convert those black-box networks into human-readable equations. The new contribution is an optimal discrimination algorithm that treats candidate models like competing hypotheses and designs experiments to drive the biggest wedge between their predictions. The authors demonstrate it on a bioreactor model, a common system in bioprocessing where reaction kinetics are often only partially known.
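To make the discrimination idea concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm: it assumes two hypothetical candidate growth laws (Monod and Haldane kinetics, with made-up parameter values), a simple batch bioreactor, a forward-Euler integrator, and a sum-of-squared-differences score in the spirit of classical model discrimination criteria. The only design variable is the initial substrate concentration `s0`; every name and number below is illustrative.

```python
# Hypothetical candidate growth laws; parameter values are illustrative only.
def monod(s, mu_max=0.5, ks=0.2):
    return mu_max * s / (ks + s)

def haldane(s, mu_max=0.5, ks=0.2, ki=2.0):
    # Same as Monod at low substrate, but growth is inhibited at high substrate.
    return mu_max * s / (ks + s + s * s / ki)

def simulate(growth, s0, x0=0.05, yield_xs=0.5, t_end=10.0, dt=0.01, n_samples=10):
    """Forward-Euler batch simulation; returns biomass at n_samples evenly spaced times."""
    steps = int(round(t_end / dt))
    every = steps // n_samples
    x, s = x0, s0
    samples = []
    for k in range(1, steps + 1):
        dx = growth(s) * x                      # biomass growth rate
        x += dt * dx
        s = max(s - dt * dx / yield_xs, 0.0)    # substrate consumed, never negative
        if k % every == 0:
            samples.append(x)
    return samples

def divergence(s0):
    """Sum of squared gaps between the two candidates' predicted biomass curves."""
    a = simulate(monod, s0)
    b = simulate(haldane, s0)
    return sum((p - q) ** 2 for p, q in zip(a, b))

# Score a small grid of candidate experiments and pick the most discriminating one.
candidate_s0 = [0.5, 1.0, 2.0, 5.0, 10.0]
scores = {s0: divergence(s0) for s0 in candidate_s0}
best_s0 = max(scores, key=scores.get)
print(best_s0, scores[best_s0])
```

In this toy setup the two laws nearly agree at low substrate and separate only where inhibition matters, so the score favors high initial substrate. A realistic design would also account for measurement noise and parameter uncertainty, which this sketch ignores.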
As a preprint, this work has not yet been peer-reviewed. The methodology is primarily computational; the abstract reports no experimental sample size, which suggests the bioreactor application was simulated rather than run on physical lab equipment. Limitations include heavy dependence on data quality and the assumption that symbolic regression will produce a manageable set of plausible structures. Noisy real-world measurements or overly complex systems could reduce effectiveness.
Most coverage of this area celebrates the ML tools while ignoring the experimental design bottleneck, and this paper goes further by addressing that bottleneck directly. Previous landmark works such as Brunton et al.'s 2016 PNAS paper on Sparse Identification of Nonlinear Dynamics (SINDy) showed how to extract equations from data but offered little guidance on collecting the most informative data. Similarly, Rackauckas and colleagues' Universal Differential Equations work demonstrated how to embed neural nets inside physics models, yet stopped short of telling researchers which experiments to run next.
The deeper implication, largely missed in early discussion, is that this creates a genuine scientific method for probing beyond current models. Many fields rely on phenomenological closures — effective equations that approximate missing micro-scale physics. Climate models parameterize clouds, fluid models approximate turbulence, and biological models simplify metabolic networks. The framework presented here offers a systematic way to design experiments that target these gaps, potentially revealing genuinely new terms that belong in the fundamental equations rather than just better fits.
By treating model discovery as an active learning problem between symbolic candidates, the approach connects data-driven science with classical experimental philosophy in the spirit of Popperian falsification. It suggests a future where autonomous laboratories could iterate between proposing candidate physical laws and designing the precise test needed to accept or reject them — a powerful step toward discovering laws that current theory has not even imagined.
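The propose-then-test loop described above can be sketched as follows. This is an illustration under loud assumptions, not the paper's procedure: the three candidate rate laws and their parameters are hypothetical, the "lab" is a stand-in function that secretly follows the Haldane law plus Gaussian noise, each experiment measures the growth rate directly at one substrate level, and a candidate is rejected with a crude fixed tolerance rather than a proper statistical test.

```python
import math
import random

# Hypothetical symbolic candidates for the specific growth rate mu(S).
CANDIDATES = {
    "monod":   lambda s: 0.5 * s / (0.2 + s),
    "haldane": lambda s: 0.5 * s / (0.2 + s + s * s / 2.0),
    "tessier": lambda s: 0.5 * (1.0 - math.exp(-s / 0.2)),
}

def run_experiment(s, rng, noise_sd=0.01):
    """Stand-in for the lab: the hidden 'true' law is Haldane plus noise (an assumption)."""
    return CANDIDATES["haldane"](s) + rng.gauss(0.0, noise_sd)

def spread(s, names):
    """Sum of squared pairwise prediction gaps among the surviving candidates."""
    preds = [CANDIDATES[n](s) for n in names]
    return sum((a - b) ** 2 for i, a in enumerate(preds) for b in preds[i + 1:])

def discover(designs, max_rounds=5, reject_at=0.05, seed=0):
    rng = random.Random(seed)
    alive = set(CANDIDATES)
    history = []
    for _ in range(max_rounds):
        if len(alive) <= 1:
            break
        # Design step: pick the experiment where surviving candidates disagree most.
        s = max(designs, key=lambda d: spread(d, sorted(alive)))
        # Test step: run it and drop candidates whose prediction misses the measurement.
        y = run_experiment(s, rng)
        history.append((s, y))
        alive = {n for n in alive if abs(CANDIDATES[n](s) - y) <= reject_at}
    return alive, history

survivors, history = discover(designs=[0.1, 0.5, 1.0, 2.0, 5.0, 10.0])
print(survivors, history)
```

The design choice worth noting is that the next experiment depends on which candidates are still alive, so the loop stops spending measurements on distinctions that earlier data has already settled.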
HELIX: This experimental design framework could shift scientific discovery from passive data collection to active interrogation of nature, systematically revealing new terms in equations that govern bioreactors, climate systems, and beyond.
Sources (3)
- [1] Experimental Design for Missing Physics (https://arxiv.org/abs/2604.01231)
- [2] Discovering governing equations from data by sparse identification of nonlinear dynamical systems (https://www.pnas.org/doi/10.1073/pnas.1517384113)
- [3] Universal Differential Equations for Scientific Machine Learning (https://arxiv.org/abs/2001.04326)