AI Carb-Counting Experiment Reveals Dangerous Inconsistencies in Health Applications
An experiment querying AI models roughly 27,000 times for carb counts in food photos revealed dangerous inconsistencies, with estimates for a single image varying by as much as 429g, posing serious risks for diabetes management. Misidentifications and a lack of testing transparency compound reliability concerns in health AI.
A recent experiment exposing AI's inability to consistently estimate the carbohydrate content of food photos raises urgent concerns about reliability in health-critical applications like diabetes management.

The study, published by Diabettech, tested four leading AI models (OpenAI GPT-5.4, Anthropic Claude Sonnet 4.6, Google Gemini 2.5 Pro, and Gemini 3.1 Pro Preview) by submitting 13 food photographs across 26,904 queries. Results showed significant variability in carb estimates: Gemini 2.5 Pro returned a staggering range of 55g to 484g for a single paella image, a 429g spread that, at a common 1:10 insulin-to-carb ratio, corresponds to a potential 42.9-unit difference in insulin dosing. Even at the lowest randomness settings, no model gave consistent answers, highlighting a fundamental flaw in AI reliability for precise health metrics (Source: Diabettech, 2023).

Beyond the raw inconsistency, the experiment uncovers a deeper issue: AI 'hallucinations' and misidentifications that mainstream coverage often overlooks. Claude misidentified a Bakewell tart as a 'Linzer torte' in 100% of queries, and three of the four models labeled crema catalana as 'creme brulee' with near-unanimous error. This mirrors broader patterns of AI unreliability seen in medical imaging, where a 2022 Nature study reported diagnostic tools failing under repeated testing, and it suggests that health-focused AI lacks the robust validation needed for real-world deployment (Source: Nature, 2022).

The Diabettech findings also connect to underreported gaps in AI testing protocols, where single-query outputs mask underlying variability, a risk rarely addressed amid typical tech optimism. A 2021 FDA report on AI in medical devices warned of insufficient stress-testing for edge cases, a concern amplified here: users receive one seemingly authoritative number with no visibility into potential outliers. Without standardized, transparent testing beyond what even this study covers, AI's integration into diabetes apps and similar tools remains a gamble, potentially endangering lives with each inconsistent result (Source: FDA, 2021).
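The arithmetic behind the headline risk, and the repeated-query testing the study performed, can be sketched as follows. This is an illustrative simulation only: `query_carb_estimate` is a hypothetical stand-in for an AI model call (not the study's code), the uniform distribution merely mimics the 55g–484g range reported for the paella photo, and the 1:10 insulin-to-carb ratio is a commonly cited clinical rule of thumb, not a universal value.

```python
import random

def query_carb_estimate(rng):
    """Hypothetical stand-in for one AI carb-count query on a food photo.

    Illustrative only: draws from the 55g-484g range the Diabettech
    study reported for a single paella image.
    """
    return rng.uniform(55, 484)

def dose_spread(n_queries=1000, carbs_per_unit=10, seed=0):
    """Repeat the query and report how far the implied insulin doses diverge.

    carbs_per_unit=10 assumes a typical 1:10 insulin-to-carb ratio.
    """
    rng = random.Random(seed)
    estimates = [query_carb_estimate(rng) for _ in range(n_queries)]
    lo, hi = min(estimates), max(estimates)
    # A 429g spread in carb estimates implies doses differing by ~42.9 units.
    return (hi - lo) / carbs_per_unit

print(f"Dose spread across repeated queries: {dose_spread():.1f} units")
```

The point of the sketch is the one the study makes: a single query yields one authoritative-looking number, while only repeated sampling reveals the spread, which is why single-query outputs mask the underlying variability.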
AXIOM: AI's inconsistent carb-counting signals a broader reliability crisis in health tech. Expect increased scrutiny and calls for mandatory stress-testing protocols before deployment in critical applications.
Sources (3)
- [1] He asked AI to count carbs 27,000 times. It couldn't give the same answer twice (https://www.diabettech.com/i-asked-ai-to-count-my-carbs-27000-times-it-couldnt-give-me-the-same-answer-twice/)
- [2] Challenges in AI for Medical Imaging (https://www.nature.com/articles/s41591-022-01995-8)
- [3] FDA Report on AI/ML in Medical Devices (https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device)