1. Running 30‑35 B Models Locally Requires Ample RAM/VRAM
- NBJack: “I assumed the author was talking about an Nvidia Tesla M4 … (hence my confusion … they meant the M40 series, which has 24 GB of VRAM).” — NBJack
- canpan: “Recent models (Qwen 3.6 and Gemma) can really do coding locally. … 24 GB is just a bit short of that.” — canpan
- tra3: “There’s definitely an option with 24 gigs of RAM: https://support.apple.com/en‑ca/121552” — tra3
- jval43: “Realistically it’s 48 M5 Pro vs 128 M5 Max due to constraints on how you can configure them. So a more substantial difference of ~2 k USD.” — jval43
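
As a rough illustration of why 24 GB comes up as a borderline figure in these comments, the weight footprint of a model can be estimated from its parameter count and quantization width. This is a minimal sketch, not something taken from the thread: the function name `weight_memory_gb` is hypothetical, the 32B size is just one point in the 30-35B range, and the estimate covers weights only (KV cache, activations, and runtime overhead come on top).

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same bit width. Real
    quantization formats mix widths and add metadata, so treat this as a
    lower bound rather than an exact figure.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Compare common precisions for a hypothetical 32B-parameter model.
    for bits in (16, 8, 4):
        print(f"32B at {bits}-bit: ~{weight_memory_gb(32, bits):.0f} GB of weights")
```

Under these assumptions a 32B model needs about 16 GB for weights at 4-bit and about 32 GB at 8-bit, so a 24 GB budget leaves limited room for context once the 4-bit weights are loaded.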
2. Local Models Are Still Far Behind Frontier Models on Complex Tasks
- solenoid0937: “It is absolutely not comparable to frontier models.” — solenoid0937
- HDBaseT: “Local models are very far away from models like Opus 4.7 or ChatGPT 5.5 in coding and problem‑solving areas.” — HDBaseT
- thot_experiment: “I’ve literally had it get a thing right that Opus 4.7 missed… The difference between the sets of things I trust the two models to do is surprisingly small.” — thot_experiment
3. Economic & Practical Incentives Shape the Local‑Model Discussion
- nu11ptr: “A 128 GiB MacBook Pro in Canada is north of CAD $11k … At $20/month for a cloud AI subscription you’re looking at almost 30 years of service for the same money.” — nu11ptr
- reillyse: “If I’m spending $800/month on tokens I can build a pretty beefy local machine for the cost of a few months spend.” — reillyse
- NBJack: “Good enough … If I was using it … would just use Codex … I think I would just use Codex at this point.” — NBJack
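
The cost arguments in this theme all reduce to a break-even calculation: hardware price divided by monthly cloud spend. The sketch below is illustrative only. The function name `breakeven_months` is hypothetical, the $4,000 hardware price is an assumed figure that does not come from the thread, and both amounts must be in the same currency. It also ignores electricity, resale value, and any difference in model quality.

```python
def breakeven_months(hardware_cost: float, monthly_cloud_spend: float) -> float:
    """Months of cloud spend that add up to the one-time hardware cost.

    Both arguments must use the same currency; no conversion is done here.
    """
    if monthly_cloud_spend <= 0:
        raise ValueError("monthly_cloud_spend must be positive")
    return hardware_cost / monthly_cloud_spend


if __name__ == "__main__":
    assumed_hardware_cost = 4000.0  # hypothetical price, not from the thread

    # Heavy API usage, as in the $800/month comment.
    print(breakeven_months(assumed_hardware_cost, 800.0))  # 5.0 months

    # A flat $20/month subscription.
    print(breakeven_months(assumed_hardware_cost, 20.0))   # 200.0 months
```

The same assumed machine pays for itself in 5 months against $800/month of token spend, but takes 200 months against a $20/month subscription, which is why commenters with different usage levels reach opposite conclusions.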
These three themes capture the dominant conversations: hardware limits for running large LLMs, the gap between local and frontier model performance, and the cost‑benefit calculus that drives users’ choices.