Top 5 themes from the discussion
| # | Theme | Key points & representative quotes |
|---|---|---|
| 1 | Local models are getting close to frontier quality, but still lag in speed & reliability | “I’m getting 35‑39 tok/s for one‑shot prompts, but for real‑world longer context interactions through Opencode it averages 20‑30 tok/s.” – cmrdporcupine<br>“I’ve been using Qwen3‑Coder‑Next on a 5090 + 128 GB RAM and it’s slow, but it does do some decent stuff.” – tommyjepsen<br>*(see the throughput sketch below)* |
| 2 | Hardware & quantization choices drive performance | “The green/yellow/red indicators are based on what you set for your hardware on HuggingFace.” – segmondy<br>“If you go out of GPU, you’ll need to offload the sparse weights to CPU RAM.” – coder543<br>“Q4_K_XL is what I generally recommend for most hardware – MXFP4_MOE is also ok.” – danielhanchen<br>*(see the offload sketch below)* |
| 3 | Tooling integration (Codex CLI, Claude Code, OpenCode, etc.) is still fragile | “Codex CLI / Claude Code were designed for GPT/Claude models specifically, so it’ll be hard for OSS models to utilize the full spec / tools.” – danielhanchen<br>“I can’t get Codex CLI or Claude Code to use small local models and to use tools.” – codazoda<br>*(see the tool‑calling sketch below)* |
| 4 | Business‑model friction & anticompetitive concerns | “Anthropic blocked OpenCode with the individual plans – they’re trying to lock users into their own ecosystem.” – tshaddox<br>“The subscription plans were never sold as a way to use the API with other programs, but they let it slide for a while.” – Aurornis |
| 5 | Future outlook: local models will eventually catch up, but the transition is uneven | “In 5 years, high‑end computers and GPUs can do decent models, and models will be optimized for lower‑end hardware.” – dehrmann<br>“The day OSS models truly utilize Codex / CC very well, then local models will really take off.” – danielhanchen |
These five themes capture the main concerns and hopes expressed by the community: how close local models are to the best hosted ones, what hardware/quantization choices matter, the current brittleness of tooling, the friction caused by commercial access restrictions, and the long‑term expectation that local inference will become mainstream.
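To make the tok/s comparisons in theme 1 reproducible, it helps to time completions against whatever server you run locally. The throughput sketch below is a minimal example, assuming an OpenAI‑compatible local endpoint (e.g. llama.cpp’s `llama-server`) on `localhost:8080` that reports a `usage` block; the URL, port, and model name are placeholders, not details from the discussion.

```python
import time

import requests  # pip install requests

# Assumed: a local OpenAI-compatible server (e.g. llama.cpp's
# `llama-server`) on this host/port; adjust to your own setup.
BASE_URL = "http://localhost:8080/v1"

def measure_tok_per_s(prompt: str, max_tokens: int = 256) -> float:
    """Time one chat completion and return completion tokens per second."""
    start = time.perf_counter()
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": "local",  # many local servers ignore this field
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
        timeout=600,
    )
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    # Assumes the server fills in OpenAI-style usage accounting.
    return resp.json()["usage"]["completion_tokens"] / elapsed

if __name__ == "__main__":
    # One-shot prompts tend to report higher tok/s than long-context
    # agentic sessions, matching the 35-39 vs. 20-30 tok/s quote above.
    print(f"{measure_tok_per_s('Write a Python quicksort.'):.1f} tok/s")
```

Note that this times the whole request, prompt processing included, so it slightly understates pure decode speed; that bias grows with context length, which is exactly the gap cmrdporcupine describes.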
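Theme 2’s “offload the sparse weights to CPU RAM” advice corresponds to llama.cpp’s tensor‑override mechanism for MoE models: keep attention and dense layers on the GPU and route the large expert tensors to system RAM. The offload sketch below assumes a recent llama.cpp build with `--override-tensor`; the binary name, model filename, and flag values are illustrative and should be checked against your build’s `--help`.

```python
import subprocess

# Sketch of the MoE offload recipe for llama.cpp (assumed flags/paths).
cmd = [
    "llama-server",
    "-m", "Qwen3-Coder-Next-Q4_K_XL.gguf",      # hypothetical filename
    "--n-gpu-layers", "99",                     # place all layers on GPU...
    "--override-tensor", r".ffn_.*_exps.=CPU",  # ...but expert tensors in RAM
    "--ctx-size", "32768",
]
subprocess.run(cmd, check=True)
```

The quantization choice plugs into the same command: swapping the `Q4_K_XL` file for an `MXFP4_MOE` one (danielhanchen’s two recommendations) changes VRAM pressure, and therefore how much of the model needs the CPU‑RAM fallback at all.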
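Theme 3’s fragility is easy to probe directly: declare a tool, ask the local model to use it, and check whether the reply comes back as a structured tool call (which a Codex‑CLI‑style harness can execute) or as plain prose (which it cannot). The tool‑calling sketch below assumes the `openai` Python client pointed at the same local endpoint as above; the `read_file` tool is hypothetical, chosen only for illustration.

```python
from openai import OpenAI  # pip install openai

# Assumed local endpoint; most local servers accept any dummy API key.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-local")

tools = [{
    "type": "function",
    "function": {
        "name": "read_file",  # hypothetical tool for this probe
        "description": "Read a file and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="local",
    messages=[{"role": "user", "content": "Open README.md"}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:
    # Harness-friendly path: a structured, machine-parseable tool call.
    print("structured call:", msg.tool_calls[0].function.arguments)
else:
    # The failure mode the quotes describe: the model narrates the call
    # as text, which coding agents cannot turn into an action.
    print("plain text instead of a tool call:", msg.content)
```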