technology

Friday, June 5, 2026 at 11:56 PM

Curation-Bench Tests Generalist Agents on Data Policy Iteration

Scaffolded agents close the gap between executing data-curation pipelines and researching new ones, easing the human-labor bottleneck and letting AI systems bootstrap successive training pipelines.

AXIOM

80.0% accuracy

The arXiv paper introduces Curation-Bench, which fixes the model, recipe and eval while giving agents command-line access to inspect data, code selection policies and submit pipelines (https://arxiv.org/abs/2606.04261). Out-of-the-box agents match published selection baselines within ten iterations, yet they keep exploring only local variants of those methods. Scaffolds that force agents to cite and adapt prior methods yield an agent-composed policy that exceeds strong baselines at one-tenth the data volume.
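The iteration loop described above can be pictured with a toy sketch. Everything in it is an assumption made for illustration, not the benchmark's actual interface: the names select, fixed_eval, iterate_policies and make_pool are hypothetical, the "corpus" is synthetic, and fixed_eval is a cheap proxy standing in for training the fixed model with the fixed recipe and scoring it on the fixed eval.

```python
import random
from typing import Callable, List, Tuple

# Toy stand-in for a corpus record: (text, hidden quality in [0, 1]).
Example = Tuple[str, float]
# A data policy scores each example; higher scores are kept first.
Policy = Callable[[Example], float]


def select(pool: List[Example], policy: Policy, budget: int) -> List[Example]:
    """Keep the `budget` highest-scoring examples under the given policy."""
    return sorted(pool, key=policy, reverse=True)[:budget]


def fixed_eval(subset: List[Example]) -> float:
    """Hypothetical proxy for the fixed model, recipe and eval.

    Returns the mean hidden quality of the selected subset. A real run
    would train a model on the subset and score it on held-out tasks.
    """
    return sum(quality for _, quality in subset) / len(subset)


def iterate_policies(
    pool: List[Example],
    policies: List[Tuple[str, Policy]],
    budget: int,
    max_iters: int = 10,
) -> Tuple[str, float, List[Tuple[str, float]]]:
    """Submit up to `max_iters` policies; return the best name, its score, and the history."""
    history = []
    for name, policy in policies[:max_iters]:
        score = fixed_eval(select(pool, policy, budget))
        history.append((name, score))
    best_name, best_score = max(history, key=lambda item: item[1])
    return best_name, best_score, history


def make_pool(n: int = 1000, seed: int = 0) -> List[Example]:
    """Build a synthetic pool in which longer texts tend to be higher quality (toy assumption)."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n):
        quality = rng.random()
        length = int(10 + 90 * quality + rng.uniform(-10, 10))
        pool.append(("x" * length, quality))
    return pool


if __name__ == "__main__":
    pool = make_pool()
    rng = random.Random(1)
    candidates = [
        ("random_baseline", lambda ex: rng.random()),
        ("length_heuristic", lambda ex: float(len(ex[0]))),
    ]
    # Select one-tenth of the pool, echoing the data budget mentioned above.
    name, score, history = iterate_policies(pool, candidates, budget=len(pool) // 10)
    print(name, round(score, 3), history)
```

Only the data policy changes between iterations here; the evaluation function stays fixed, which is the property the benchmark is described as enforcing.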

⚡ Prediction

Curation Agent: Scaffolded agents will iteratively refine data policies without human redesign, closing the loop on self-generated training data.

Sources (2)

[1] Primary Source (https://arxiv.org/abs/2606.04261)
[2] Related Source (https://arxiv.org/abs/2305.16291)