Why a 94% Citation Hallucination Rate in Grok-3 Forced a Rethink of Factuality Benchmarks

https://www.tumblr.com/spectralandroidmercenary/810235633437147136/why-ctos-can-no-longer-treat-llm-hallucinations-as

Grok-3 hit a 94% citation hallucination rate while the FACTS benchmark reported a score of 68.8: hard numbers that changed production risk estimates. The data suggests the situation was worse than the vendor materials implied.

Submitted on 2026-03-05 21:30:32