1. Extreme hardware demands
"My machine with 192GB RAM + RTX 3090 24GB can almost run this. It says it needs 24GB of VRAM and 256GB of RAM for MoE offloading." — xrd
2. Quantization claims are often overstated
"According to this very article, 4‑bit dynamic is essentially lossless." — kibibu
"Watch out. Those claims are often made based on KL‑divergence over some arbitrary corpus, not performance in the real world or benchmarks." — Aurornis
3. Local deployment is driven by privacy and control
"We do want privacy, and we also want to own the hardware so the US can't just turn it off whenever it feels like it." — matheusmoreira
4. Expectation of productivity gains & cloud competition
"I feel like the gap is closing to be able to run good enough models locally even for coding and I would assume it could make some companies a bit nervous." — pheggs