The discussion revolves heavily around the intersection of copyright law, software licensing (particularly the GPL), and the training of Large Language Models (LLMs).
Here are the three most prevalent themes:
1. Fair Use vs. Copyright Infringement in LLM Training
A significant portion of the debate centers on whether training LLMs on publicly available copyrighted material constitutes "fair use" (especially under US law) or copyright infringement. If training itself qualifies as fair use, the license terms attached to the underlying material would be irrelevant.
- Supporting Quotation: One user posits a key belief driving the legality argument: "To my understanding, if the material is publicly available or obtained legally (i.e., not pirated), then training a model with it falls under fair use." ("maxloh")
- Supporting Quotation: Another user rejects this interpretation outright, stressing that neither GPL authors nor unlicensed artists ever consented to such use: "The training of the big LLMs has been criminal. Whether we talk about GPL licensed code or the millions of artist that never released their work under a specific license and would never haven consented to it being used for training." ("cardanome")
2. The Legal Status and Enforceability of the GPL in Relation to AI Outputs
Users extensively debated whether the "viral" nature of the GPL (copyleft) could force the entire resulting model, or its output, to become GPL-licensed. Others countered that licenses apply only to the distribution of software, not to model weights derived from the training data.
- Supporting Quotation: A user proposes a customized restriction for future licenses: "My next project will be released under a GPL-like license with exactly this condition added. If you train a model on this code, the model must be open source & open weights" ("Orygin")
- Supporting Quotation: A counterpoint suggests this "virality" overreach is legally unfounded, stating: "GPL can't do much more than that. A license over a piece of code cannot automatically change the copyright status of another piece of code. There simply isn't legal framework for that." ("raincole")
3. Corporate Behavior and the Perception of Legal Accountability
There is significant cynicism about whether large corporations comply with licensing terms at all. Users suggest that these companies either assume existing law favors them or simply treat legal costs as negligible overhead.
- Supporting Quotation: Regarding the current legal atmosphere, one user states: "And the current norm that the trillion dollar companies have lobbied for is that you can train on copyrighted material all you want so that's the reality we are living in. Everything ever published is all theirs." ("xgulfie")
- Supporting Quotation: Another user suggests the calculus is purely financial: "It's just a side cost of doing business, because asking for forgiveness is cheaper and faster than asking for permission." ("rvnx")