1. Claude/Opus seems to be getting worse
Many users report a noticeable drop in accuracy or “dumb” responses, especially after a new release or during peak hours.
“I’ve noticed a degradation in Opus 4.5… it feels regressed by a generation.” – epolanski
“I’ve noticed a degradation… the model just gives up.” – jampa
2. The cause is unclear: load, bugs, or user skill?
Participants debate whether the drop stems from server load/quantization, software bugs, or simply users learning to prompt better.
“It could be a software bug affecting inference.” – gpm
“I think the degradation is because of subtle changes to CC prompts/tools.” – turnsout
“I’m learning more about what the model is and is not useful for, my subjective experience improves, not degrades.” – emp17344
3. Benchmark design and statistical rigor are hotly debated
The community questions the validity of the 4% drop claim, the confidence-interval approach, and the sample size; a back-of-the-envelope sketch of the sample-size issue follows the quotes below.
“The daily scale is not statistically significant and is meaningless.” – goldenarm
“They’re reporting statistically significant differences but the methodology is flawed.” – crazygringo
“You need to run the test 5–10 times per day to get a reliable signal.” – ofirpress
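To make the sample-size argument concrete, here is a rough Python sketch using the normal approximation for the difference of two proportions. The baseline pass rate (70%), suite size (200 tasks), and run counts are illustrative assumptions, not figures from the thread.

```python
from math import sqrt

def min_detectable_drop(p: float, n: int, z: float = 1.96) -> float:
    """Smallest drop in pass rate distinguishable from noise at ~95%
    confidence when comparing two days with n task attempts each
    (normal approximation to the binomial)."""
    se = sqrt(2 * p * (1 - p) / n)  # SE of the difference of two proportions
    return z * se

# Illustrative assumptions: a 200-task suite at a ~70% baseline pass rate,
# run r times per day.
for r in (1, 5, 10):
    n = 200 * r
    print(f"{r:2d} runs/day (n={n:4d}) -> detectable drop ≈ "
          f"{min_detectable_drop(0.70, n):.1%}")
```

Under these assumptions, one run per day can only resolve a drop of roughly nine percentage points, while 5–10 runs per day brings the threshold down to roughly 3–4 points, consistent with ofirpress's suggestion above.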
4. Transparency and trust in Anthropic’s statements
Users express skepticism about Anthropic's assurance that it never downgrades models and demand clearer disclosure of "thinking power" settings and potential shadow downgrades.
“We never reduce model quality due to demand, time of day, or server load.” – anthropic (quoted by many)
“They’re probably resource‑constrained and are silently serving cheaper models.” – arcanemachiner
“Transparency is a big gripe… I would prefer a straight‑no than a silently downgraded answer.” – dmos62
These four themes capture the core of the discussion: a perceived decline in performance, uncertainty over its origin, contention over how to measure it, and a call for greater openness from the provider.