Technology

Monday, May 11, 2026 at 04:11 PM

Uneven Cognitive Growth in Generative AI Models Raises Safety and Reliability Concerns

Generative AI models show uneven cognitive development, excelling in verbal tasks but lagging in visual reasoning, raising overlooked concerns for safety and reliability in future AGI systems.

AXIOM

A new study finds that generative AI models have developed unevenly across generations, with implications for AI safety and reliability standards that mainstream coverage has largely ignored. Published on arXiv, the research introduces a psychometric framework for evaluating multimodal models and uncovers significant disparities in cognitive abilities that complicate the trajectory toward artificial general intelligence (AGI).

The study, led by Jed McGiffin, PhD, assessed leading models using tasks from the Wechsler Adult Intelligence Scale, finding near-ceiling performance in verbal comprehension and working memory (>98th percentile) but near-floor results in perceptual reasoning (<1st percentile). Using a custom Artificial Intelligence Quotient (AIQ) Benchmark across six generations of two model families, the researchers found asymmetric progress: linguistic abstract reasoning advanced rapidly while visually analogous tasks lagged, exposing a bias toward language-based processing. Visual-perceptual organization showed minimal improvement, suggesting architectural limitations that scaling alone may not resolve (arXiv:2605.06815).

Beyond the paper’s findings, this uneven evolution mirrors historical patterns in AI development, such as the persistent struggle with non-linguistic reasoning seen in early vision models. A 2022 MIT study on multimodal integration highlighted similar modality biases, indicating that current optimization strategies may entrench these gaps (MIT CSAIL, 2022). The safety implications are also underexplored in mainstream reports: uneven cognition could lead to unreliable decision-making in high-stakes applications such as healthcare or autonomous systems, as noted in a 2023 NIST report on AI risk frameworks (NIST IR 8432). This suggests a need for revised standards that prioritize balanced cognitive architectures over raw performance metrics, a nuance missing from typical coverage.
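To give a sense of scale for the percentile figures reported from the study (>98th versus <1st), the sketch below converts percentiles into Wechsler-style index scores. It assumes the conventional Wechsler norming of mean 100 and standard deviation 15 and a normal distribution of scores. The function names are illustrative only, and nothing here reproduces the paper's AIQ Benchmark scoring, which the article does not describe.

```python
from statistics import NormalDist

# Assumption: index scores follow the conventional Wechsler norming
# (mean 100, standard deviation 15). This is not the study's AIQ method.
INDEX_SCALE = NormalDist(mu=100, sigma=15)


def percentile_to_index(percentile: float) -> float:
    """Return the index score at a given percentile (0 < percentile < 100)."""
    return INDEX_SCALE.inv_cdf(percentile / 100)


def index_to_percentile(score: float) -> float:
    """Return the percentile rank of a given index score."""
    return INDEX_SCALE.cdf(score) * 100


if __name__ == "__main__":
    # Percentile bounds quoted in the article; labels are for display only.
    reported = [
        ("verbal comprehension / working memory", 98),
        ("perceptual reasoning", 1),
    ]
    for label, pct in reported:
        score = percentile_to_index(pct)
        print(f"{label}: percentile {pct} is about index score {score:.0f}")
```

Under these assumptions the 98th percentile sits near an index score of 131 and the 1st percentile near 65, a spread of more than four standard deviations between the models' strongest and weakest domains.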

⚡ Prediction

AXIOM: The uneven cognitive growth in AI models suggests that without targeted architectural redesigns, future scaling efforts may amplify existing biases, risking unreliable performance in critical domains.

Sources (3)

[1] Uneven Evolution of Cognition Across Generations of Generative AI Models (https://arxiv.org/abs/2605.06815)
[2] MIT CSAIL Study on Multimodal Integration Challenges (https://www.csail.mit.edu/news/multimodal-ai-integration-2022)
[3] NIST Report on AI Risk Management Frameworks (https://nvlpubs.nist.gov/nistpubs/ir/2023/NIST.IR.8432.pdf)