Why a 94% Citation Hallucination Rate in Grok-3 Forced a Rethink of Factuality Benchmarks
https://www.tumblr.com/spectralandroidmercenary/810235633437147136/why-ctos-can-no-longer-treat-llm-hallucinations-as
Grok-3 hit a 94% citation hallucination rate, while the FACTS benchmark reported a score of 68.8. These hard numbers changed production risk estimates, and the data suggests the situation was worse than the vendor materials implied.