AI Developers Slam Copyright Claims: Public Domain and Openly Licensed Data Are Enough to Train Models

Post date: June 6, 2025 · Discovered: April 23, 2026 · 3 posts, 0 comments

Proponents of open AI research argue that large language models can be trained without any copyrighted material. In their view, building such models is fully achievable using only datasets drawn from the public domain or released under open licenses.

The core argument is that the industry narrative, which holds that copyrighted IP is essential to AI development, is false; the source posts call it "total BS." The researchers identify data curation as the main initial hurdle. They argue that once a suitable dataset has been compiled, others can in principle reuse or reproduce it without running into ongoing legal roadblocks.

The consensus across the analyzed posts rejects the claim that copyrighted works are necessary. The dividing line runs between the established industry narrative and the research-backed position that openly licensed and public-domain data is sufficient to build capable AI models.

Key Points

#1 AI training can rely exclusively on non-copyrighted material.

Researchers assert that it is entirely possible to train sophisticated AI models using only public-domain and openly licensed data.

#2 The claim that copyrighted material is necessary is dismissed as false.

The source posters bluntly call the argument that AI companies must infringe copyright or take others' IP in order to operate "total BS."

#3 Data curation is the primary technical obstacle.

The main difficulty identified is the intensive up-front work of curating training datasets that are large enough and legally sound.

#4 The process, once established, is replicable.

Advocates maintain that once the data curation groundwork is done, others can replicate it, reducing ongoing dependence on legal negotiations and exposure to copyright disputes.

Source Discussions (3)

This report was synthesized from the following Lemmy discussions, ranked by community score.

61 points · It turns out you can train AI models without copyrighted material
[email protected] · 7 comments · 6/6/2025 · by yogthos · engadget.com

52 points · It turns out you can train AI models without copyrighted material
[email protected] · 3 comments · 6/6/2025 · by cm0002 · engadget.com

24 points · It turns out you can train AI models without copyrighted material
[email protected] · 5 comments · 6/6/2025 · by yogthos · engadget.com