The discussion about the new Mistral models reveals three primary themes:
1. The Purpose and Relevance of Comparing to SOTA Closed Models
A major point of contention is whether new models like Mistral's should be compared to the current proprietary State-of-the-Art (SOTA) models from giants like OpenAI and Google, or whether they target a different market segment entirely.
- Theme Summary: Many users consider comparisons to closed SOTA models unfair or irrelevant because Mistral targets users with specific constraints (e.g., privacy, self-hosting, cost) that make the SOTA models inaccessible or unsuitable. Others counter that the absence of such a comparison implies the results were unfavorable.
- Supporting Quotation (Targeting Different Users): "Why should they compare apples to oranges? Ministral3 Large costs ~1/10th of Sonnet 4.5. They clearly target different users." (user "Lapel2742")
- Supporting Quotation (Implication of Unfavorability): "The lack of the comparison (which absolutely was done), tells you exactly what you need to know." (user "constantcrying")
2. The Value Proposition of Open-Weight Models (Privacy vs. Performance)
The discussion frequently circles back to why users choose open-weight options like Mistral over proprietary models, centering on data sovereignty and business requirements.
- Theme Summary: For many European or regulatory-sensitive businesses, the perceived risk of using US-based proprietary providers (due to concerns such as the CLOUD Act or data exfiltration) outweighs those providers' performance advantage. Open models offer a necessary "structural check" and direct control over data privacy.
- Supporting Quotation (Privacy/Geopolitical Concern): "I think people from the US often aren't aware how many companies from the EU simply won't risk losing their data to the providers you have in mind, OpenAI, Anthropic and Google... Mistral is positioning themselves for that market..." (user "bildung")
- Supporting Quotation (Structural Check): "Open weight LLMs aren't supposed to 'beat' closed models, and they never will. That isn't their purpose. Their value is as a structural check on the power of proprietary systems; they guarantee a competitive floor." (user "mvkel")
3. Practical Performance vs. Benchmark Scores
Users expressed skepticism about official benchmarks, sharing anecdotes that real-world utility often diverges from leaderboard rankings, particularly for proprietary models like Gemini.
- Theme Summary: Several participants reported dissatisfaction with top-ranking proprietary models in practical use (e.g., high rates of "gibberish" output or poor instruction following), leading them to favor alternatives like Mistral that score lower on benchmarks but prove more reliable or cost-effective for specific tasks.
- Supporting Quotation (Practical Use Outperforms Benchmarks): "Even if it does not hold up in benchmarks, it still outperformed in practice." (user "barrell", on a previous Mistral model succeeding where GPT-5 struggled with complex formatting)
- Supporting Quotation (Skepticism of Benchmarks): "Benchmarks are never to be believed, and that has been the case since day 1." (user "nullbio")