1. Cost-Optimization and Architecture Strategy
There was a strong emphasis on reducing expenses through strategic architecture and model selection. Users suggested batching requests, using cheaper or local models, and optimizing API calls.
"Most of the cost savings came from not sending stuff to the LLM that didn't need to go there, plus the batch API is half the price of real-time calls." — ok_orco
"Today's local models are quite good. I started off with cpu and even that was fine for my pipelines." — LTL_FTC
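The two ideas in that quote compose: filter out work a cheap rule can handle, then route the remainder through a discounted batch endpoint. A minimal sketch, assuming an OpenAI-style batch JSONL input format (the heuristic in `needs_llm` is purely illustrative):

```python
import json

def needs_llm(text: str) -> bool:
    # Hypothetical cheap pre-filter: only escalate items a simple rule
    # cannot settle. Real pipelines would use regexes, lookups, or a
    # small local model here.
    return "?" in text or len(text.split()) > 5

def build_batch_file(items, model="gpt-4o-mini"):
    """Write only the hard items into an OpenAI-style batch JSONL payload.

    Batch endpoints are typically billed at a discount (roughly half
    price in OpenAI's case), so this cuts cost twice: once by filtering,
    once by the cheaper batch rate.
    """
    lines = []
    for i, text in enumerate(items):
        if not needs_llm(text):
            continue  # resolved locally; never hits the API
        lines.append(json.dumps({
            "custom_id": f"item-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": text}],
            },
        }))
    return "\n".join(lines)
```

The resulting string is what you would upload as the batch input file; only the items that survived the pre-filter appear in it.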
2. Alternative Model Providers for Cost and Quality
Users recommended various alternative LLM providers (beyond the two major ones) to lower costs or improve reliability, specifically mentioning z.ai, MiniMax, and cheaper Chinese models.
"Consider using z.ai as model provider to further lower your costs." — gandalfar
"Or minimax - m2.1 release didn't make a big splash in the news, but it's really capable." — viraptor
"You also can try to use cheaper models like GLM, Deepseek, Qwen, at least partially." — DeathArrow
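Mixing providers like this is easier with a small abstraction that tries them in priority order and falls back on failure. A sketch under stated assumptions: `Provider` fields and the `call` function are placeholders for whatever client you use (many of these providers expose OpenAI-compatible endpoints, but verify each base URL and model name in the provider's own docs):

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    base_url: str   # verify against the provider's documentation
    model: str

def complete_with_fallback(prompt, providers, call):
    """Try each provider in priority order; fall back when one fails.

    `call(provider, prompt)` is a caller-supplied function wrapping the
    actual API client, so provider-specific auth and formats stay out
    of the routing logic.
    """
    errors = {}
    for p in providers:
        try:
            return p.name, call(p, prompt)
        except Exception as exc:  # noqa: BLE001 - record and keep going
            errors[p.name] = str(exc)
    raise RuntimeError(f"all providers failed: {errors}")
```

Ordering the list by price puts the cheapest capable model first, with pricier ones only as a reliability backstop.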
3. Implementation and Optimization Best Practices
The discussion highlighted technical tweaks to improve the system, such as prompt caching, specialized topic-modeling libraries (e.g. BERTopic), and sanity checks.
"Are you also adding the proper prompt cache control attributes? I think Anthropic API still doesn't do it automatically" — dezgeg
"Have you looked into BERTopic?" — joshribakoff
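The first quote refers to Anthropic's explicit prompt caching: the API does not cache automatically, you mark cacheable content blocks with a `cache_control` attribute. A minimal payload sketch (the model id is illustrative; pick a current one from Anthropic's docs):

```python
def cached_request(system_prompt: str, user_msg: str,
                   model: str = "claude-sonnet-example"):
    """Build an Anthropic Messages API payload with a prompt-cache
    breakpoint on the large, reused system prompt.

    Everything up to the block carrying `cache_control` is cached;
    later requests sharing that exact prefix are billed at the reduced
    cache-read rate for those tokens.
    """
    return {
        "model": model,
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Opt-in cache breakpoint; omitted, nothing is cached.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_msg}],
    }
```

Only the varying user message sits after the breakpoint, so per-request input cost is dominated by the cheap cached prefix rather than the full system prompt.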