AI Evaluated via Human Outperformance on Tasks
MIT Technology Review details the decades-long reliance on human-comparison AI benchmarks.
According to MIT Technology Review, artificial intelligence has long been evaluated by asking whether machines can outperform humans, from chess to advanced math, from coding to essay writing (https://www.technologyreview.com/2026/03/31/1134833/ai-benchmarks-are-broken-heres-what-we-need-instead/, 2026). The performance of AI models and applications is tested against that of individual humans completing the same tasks. The source describes this framing as seductive.
The article states that AI-versus-human comparison on isolated problems with clear outcomes has been the standard approach, applied across domains for decades (MIT Technology Review, 2026). The piece examines this longstanding practice of benchmarking against human performance.
Technology Review reports that current evaluation centers on isolated tasks with definitive answers, with comparison to individual human performance serving as the core metric (https://www.technologyreview.com/2026/03/31/1134833/ai-benchmarks-are-broken-heres-what-we-need-instead/, 2026). The source presents this as the established but increasingly questioned method.
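To make the paradigm the article critiques concrete, here is a minimal sketch of an isolated-task, human-baseline evaluation. Everything in it (the tasks, the mock_model stand-in, and the HUMAN_BASELINE figure) is a hypothetical illustration under stated assumptions, not material from the source.

```python
# Minimal sketch of the isolated-task, human-comparison evaluation paradigm
# described above. All tasks, answers, and baseline figures are hypothetical
# placeholders, not data from MIT Technology Review.

TASKS = [
    {"prompt": "12 * 8 = ?", "answer": "96"},
    {"prompt": "Capital of France?", "answer": "Paris"},
]

HUMAN_BASELINE = 0.90  # hypothetical accuracy of an individual human


def mock_model(prompt: str) -> str:
    """Stand-in for a real model call; returns canned answers."""
    canned = {"12 * 8 = ?": "96", "Capital of France?": "Paris"}
    return canned.get(prompt, "")


def evaluate(model) -> float:
    """Score a model on isolated tasks with definitive answers."""
    correct = sum(model(t["prompt"]) == t["answer"] for t in TASKS)
    return correct / len(TASKS)


if __name__ == "__main__":
    score = evaluate(mock_model)
    print(f"model accuracy: {score:.2f}")
    # The core metric in this paradigm: did the model beat the human?
    print("outperforms human baseline:", score > HUMAN_BASELINE)
```

The sketch shows why the article calls this framing seductive: it reduces evaluation to a single pass/fail comparison on tasks with clear outcomes, which is easy to compute but says little about real-world use.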
AXIOM: Current human-outperformance benchmarks remain dominant but face increasing scrutiny for not reflecting real-world AI use.
Sources (1)
- [1] AI benchmarks are broken. Here’s what we need instead. (https://www.technologyreview.com/2026/03/31/1134833/ai-benchmarks-are-broken-heres-what-we-need-instead/)