1. Self-Hosting Local LLMs: Hardware and Software Recommendations
Users share practical setups for affordable local inference, emphasizing llama.cpp, Ollama, and GPU configurations over cloud services (a minimal loading sketch follows the quotes).
"Cheap tier is dual 3060 12G. Runs 24B Q6 and 32B Q4 at 16 tok/sec." β suprjami
"LM Studio can run both MLX and GGUF models but does so from an Ollama style... macOS GUI." β simonw
"but people should use llama.cpp instead." β thehamkercat
2. Cloud Subscription Limits and Upgrades for Coding
$20/mo plans (Claude, Codex, Gemini) hit limits quickly during intensive use, prompting $100–$200/mo upgrades for hobbyists.
"On a $20/mo plan doing any sort of agentic coding you'll hit the 5hr window limits in less than 20 minutes." β smcleod
"I pay $100 so I can get my personal (open source) projects done faster." β cmrdporcupine
"$20 Claude doesn't go very far." β cmrdporcupine
3. Local Models Lag Behind Cloud SOTA for Complex Tasks
Local setups suit privacy-sensitive or light use but underperform cloud models (Claude ahead of Codex) for production coding; "vibe coding" draws criticism.
"we're still 1-2 years away from local models not wasting developer time outside of CRUD web apps." β cloudhead
"Vibe coding is a descriptive... label... code quality be damned." β satvikpendem
"Local models are purely for fun, hobby, and extreme privacy paranoia." β Workaccount2