Gold at the IMO Isn’t Gold in Production: Use AI as a Tool, Measure Output

OpenAI’s rumored IMO “gold” is impressive, but it doesn’t make the model a mathematician or a drop-in replacement for white-collar work. Creativity and reliability still bite: the agent family that underpins this also shows higher hallucinations and dicey behavior on high-stakes tasks. The signal for me is narrower: entry-level work gets squeezed while small teams that wire AI into real workflows will ship more. Measure output, not headlines or demo reels. Until we see transparent methods and costs (and competing results from DeepMind), treat this as promising capability, not process. AI won’t save a broken workflow; it accelerates the ones that are already tight.