Classical Simulations Help AI Models Learn Chemistry From Far Less Expensive Quantum Data
Transfer learning from cheap classical simulations to quantum data makes GNN interatomic potentials far more data-efficient.
A new preprint on arXiv introduces Transfer-PaiNN (T-PaiNN), a transfer learning framework that pretrains graph neural network models on large datasets generated by cheap classical force fields before fine-tuning them on much smaller sets of density functional theory (DFT) data. Tested on the QM9 molecular dataset and on condensed-phase liquid water simulations, the method achieved up to 25-fold error reductions in low-data regimes and improved predictions of energies, forces, density, and diffusion over models trained only on quantum data; the abstract does not detail the exact sample sizes of the classical pretraining data. The work is a preprint (https://arxiv.org/abs/2603.24752) that has not yet been peer-reviewed, and its limitations include the need for validation on more complex chemical systems.
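The two-stage recipe is straightforward to sketch. Below is a minimal, illustrative PyTorch sketch of the classical-to-quantum transfer idea: pretrain on abundant cheap labels, then fine-tune on a small expensive set. The tiny MLP, random tensors, learning rates, and loop counts are placeholder assumptions for illustration, not the paper's actual PaiNN architecture or training setup.

```python
# Illustrative sketch only: a small MLP stands in for a PaiNN-style GNN,
# and random tensors stand in for molecular descriptors and energy labels
# from a classical force field (pretraining) and DFT (fine-tuning).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in potential: descriptor vector -> scalar energy.
model = nn.Sequential(nn.Linear(32, 64), nn.SiLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()

# Stage 1: pretrain on abundant, cheap classical force-field labels.
x_cls = torch.randn(10_000, 32)   # placeholder descriptors
y_cls = torch.randn(10_000, 1)    # placeholder classical energies
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss_fn(model(x_cls), y_cls).backward()
    opt.step()

# Stage 2: fine-tune on a much smaller DFT set. A lower learning rate
# (optionally with frozen early layers) refines the pretrained
# representation instead of overwriting it.
x_dft = torch.randn(200, 32)      # placeholder DFT descriptors
y_dft = torch.randn(200, 1)       # placeholder DFT energies
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(50):
    opt.zero_grad()
    loss_fn(model(x_dft), y_dft).backward()
    opt.step()
```

The payoff of this pattern is that the expensive DFT set only has to correct the pretrained model's systematic errors, not teach it chemistry from scratch, which is why far fewer quantum-accuracy labels suffice.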
HELIX: This means researchers can build accurate AI chemistry tools using much less supercomputer time, which could speed up the discovery of new medicines and materials for everyday life.
Sources (1)
- [1] Autotuning T-PaiNN: Enabling Data-Efficient GNN Interatomic Potential Development via Classical-to-Quantum Transfer Learning (https://arxiv.org/abs/2603.24752)