Iterative LLM Framework Discovers Two Novel Dynamical Dark-Energy Equations with Bayesian Evidence Gains >1 on Pantheon+, DESI DR2 and Planck 2018
A physics-constrained iterative LLM loop proposes, embeds, optimizes and critiques dynamical dark-energy equations of state, recovering two new forms that exceed standard parameterizations by more than one unit of log Bayesian evidence (Delta lnZ > 1) on the combination of Pantheon+, DESI DR2 and Planck 2018. The approach illustrates an emerging shift from AI-assisted fitting to AI-assisted model generation in fundamental cosmology.
The framework recasts phenomenological model building as a closed loop. A proposer LLM generates candidate w(a) functions together with cosmological justifications grounded in retrieved papers. Each candidate is inserted into CLASS or CAMB, optimized against Pantheon+, DESI DR2 BAO and the full Planck 2018 TT/EE/TE/lensing likelihoods, then scored on likelihood, Bayesian evidence and theoretical consistency. An independent critic LLM evaluates novelty, clarity, stability and implementation validity, and passes structured feedback to the next proposal round. After several iterations the system surfaced two functional forms absent from the prior literature that deliver Delta lnZ > 1 relative to both Lambda-CDM and the Chevallier-Polarski-Linder (CPL) parameterization.
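To make the embedding step concrete: before any candidate w(a) can be evaluated, it has to be turned into a background dark-energy density. The sketch below is not the paper's pipeline and does not call CLASS or CAMB. It only illustrates the standard relation rho_DE(a) / rho_DE(1) = exp(3 * integral from a to 1 of (1 + w(a')) / a' da'), checked against the known closed form for CPL. The function names and the values w0 = -0.9, wa = -0.3 are illustrative assumptions, not results from the source.

```python
import math
from typing import Callable


def cpl_w(a: float, w0: float = -0.9, wa: float = -0.3) -> float:
    """CPL equation of state: w(a) = w0 + wa * (1 - a)."""
    return w0 + wa * (1.0 - a)


def de_density_ratio(w: Callable[[float], float], a: float, n: int = 2000) -> float:
    """rho_DE(a) / rho_DE(a=1) for an arbitrary equation of state w(a).

    Uses the composite trapezoid rule in x = ln(a'), where the integral
    becomes integral from ln(a) to 0 of (1 + w(e^x)) dx.
    """
    if a <= 0.0:
        raise ValueError("scale factor must be positive")
    lo = math.log(a)
    h = (0.0 - lo) / n  # step in ln(a); zero when a == 1
    total = 0.5 * ((1.0 + w(a)) + (1.0 + w(1.0)))
    for i in range(1, n):
        total += 1.0 + w(math.exp(lo + i * h))
    return math.exp(3.0 * h * total)


def cpl_density_ratio_exact(a: float, w0: float, wa: float) -> float:
    """Closed-form CPL result: a^(-3(1 + w0 + wa)) * exp(-3 * wa * (1 - a))."""
    return a ** (-3.0 * (1.0 + w0 + wa)) * math.exp(-3.0 * wa * (1.0 - a))


if __name__ == "__main__":
    a = 0.5  # scale factor at redshift z = 1
    numeric = de_density_ratio(lambda x: cpl_w(x, -0.9, -0.3), a)
    exact = cpl_density_ratio_exact(a, -0.9, -0.3)
    print(f"numeric = {numeric:.6f}, exact = {exact:.6f}")
```

Because `de_density_ratio` accepts any callable, a newly proposed w(a) could be checked the same way before being wired into a Boltzmann code; a cosmological constant (w = -1) returns exactly 1 at every scale factor.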
This result sits within an emerging pattern in which large language models move beyond parameter fitting to generate and iteratively refine interpretable functional forms under explicit physics constraints. Earlier AI cosmology efforts focused on symbolic regression of expansion histories or neural emulators; the present work adds an explicit literature-retrieval-plus-critic loop that couples mathematical structure to physical reasoning, reducing the risk of purely data-driven but unmotivated expressions.
The principal limitations are the modest evidence gain and the still-limited set of data combinations explored; full-shape galaxy clustering, weak-lensing surveys and forthcoming CMB-S4 or Euclid data will provide stronger tests. Independent re-implementations of the critic loop on blinded simulations are required before the method can be treated as a standard discovery tool rather than a promising prototype.
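To put "modest" in numbers: a difference in log evidence converts to a Bayes factor of exp(Delta lnZ). The short sketch below shows that arithmetic for Delta lnZ = 1 and 1.5. The equal prior odds between models are an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def bayes_factor(delta_ln_z: float) -> float:
    """Bayes factor implied by a log-evidence difference."""
    return math.exp(delta_ln_z)


def posterior_probability(delta_ln_z: float, prior_odds: float = 1.0) -> float:
    """Posterior probability of the favored model in a two-model comparison.

    prior_odds = 1.0 assumes both models are equally likely a priori.
    """
    odds = prior_odds * bayes_factor(delta_ln_z)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for dlnz in (1.0, 1.5):
        print(
            f"Delta lnZ = {dlnz}: Bayes factor = {bayes_factor(dlnz):.2f}, "
            f"posterior probability = {posterior_probability(dlnz):.3f}"
        )
```

With equal priors, Delta lnZ = 1 corresponds to odds of about 2.7 to 1 (roughly a 73 percent posterior probability), and Delta lnZ = 1.5 to about 4.5 to 1 (roughly 82 percent). Jeffreys-type scales in common use treat values in this range as weak evidence, which is consistent with the article's caution.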
Next steps include releasing the full proposal-critic prompt templates and applying the pipeline to early DESI Year-3 and Roman supernova samples to determine whether the same functional forms retain their ranking.
Bom et al.: Re-analysis of DESI Year-3 BAO and full-shape data within 18 months will show whether the leading AI-derived equation maintains Delta lnZ > 1.5 against CPL when the sound-horizon calibration is varied by 1 percent.
Sources (2)
- [1] Primary Source (https://arxiv.org/abs/2606.19427)
- [2] Supporting Source (https://arxiv.org/abs/2212.07409)