Curation-Bench Tests Generalist Agents on Data Policy Iteration
Agent scaffolds close the gap between executing a data-curation pipeline and doing research on it, addressing the human-labor bottleneck and letting AI systems bootstrap successive training pipelines.
The arXiv paper introduces Curation-Bench, which fixes the model, recipe and eval while giving agents command-line access to inspect data, code policies and submit pipelines (https://arxiv.org/abs/2606.04261). Out-of-the-box agents match published selection baselines within ten iterations, yet they explore only local variants. With scaffolds that force them to cite and adapt prior methods, an agent composes a policy that exceeds strong baselines with one-tenth of the data volume.
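The setup described above, where the model, recipe and eval stay fixed and only the data policy changes between submissions, can be sketched as a small loop. This is a toy illustration under stated assumptions, not the benchmark's actual interface: `fixed_eval`, `length_policy`, `keyword_policy` and `run_iterations` are hypothetical names, and the scoring function is a stand-in for training and evaluating a real model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = str
Policy = Callable[[Sequence[Example]], List[Example]]


def fixed_eval(selected: Sequence[Example]) -> float:
    """Hypothetical stand-in for 'train the fixed model, run the fixed eval'.

    Toy score: the fraction of selected examples containing the word 'proof'.
    """
    if not selected:
        return 0.0
    return sum("proof" in ex for ex in selected) / len(selected)


def length_policy(min_len: int) -> Policy:
    """Hypothetical policy: keep examples with at least `min_len` characters."""
    def policy(pool: Sequence[Example]) -> List[Example]:
        return [ex for ex in pool if len(ex) >= min_len]
    return policy


def keyword_policy(keyword: str) -> Policy:
    """Hypothetical policy: keep examples that mention `keyword`."""
    def policy(pool: Sequence[Example]) -> List[Example]:
        return [ex for ex in pool if keyword in ex]
    return policy


def run_iterations(
    pool: Sequence[Example],
    policies: Dict[str, Policy],
    max_iters: int = 10,
) -> Tuple[str, float, List[Tuple[str, float]]]:
    """Submit up to `max_iters` policies; only the data selection varies."""
    history: List[Tuple[str, float]] = []
    best_name, best_score = "", float("-inf")
    for name, policy in list(policies.items())[:max_iters]:
        score = fixed_eval(policy(pool))  # the eval is never modified
        history.append((name, score))
        if score > best_score:  # ties keep the earlier submission
            best_name, best_score = name, score
    return best_name, best_score, history


if __name__ == "__main__":
    demo_pool = [
        "short",
        "a long proof of the lemma",
        "proof",
        "another long unrelated sentence",
    ]
    demo_policies = {
        "keep_all": length_policy(0),
        "long_only": length_policy(10),
        "keyword": keyword_policy("proof"),
    }
    print(run_iterations(demo_pool, demo_policies))
```

In this sketch the first two policies are small variations on one idea (a length threshold), while the third takes a different approach. That is only a loose analogy for the contrast the paper draws between local variants and policies adapted from prior methods.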
Curation Agent: Scaffolded agents will iteratively refine data policies without human redesign, closing the loop on self-generated training data.
Sources (2)
- [1] Primary Source (https://arxiv.org/abs/2606.04261)
- [2] Related Source (https://arxiv.org/abs/2305.16291)