Apple Intelligence, Panda 70M, and the DMCA Battle: Scraping Copyrighted Video Content Under the Microscope
The core dispute centers on allegations that Apple scraped massive amounts of copyrighted video data, specifically citing the 'Panda 70M' training set, to power its AI and large language models.
Commenters split sharply along two lines. Some focus on the legality of the data acquisition itself, arguing that Apple may have violated the DMCA by bypassing YouTube's technical safeguards (TehPers). Others focus on copyright ownership, while an outlier, mnemonicmonkeys, pointed out the legal error in assuming that 'publicly sourced' means 'public domain.' The discussion also cited precedents such as 'Facebook v. Power Ventures' (t3rmit3) regarding technical circumvention.
The consensus is that the data collection method is the primary legal vulnerability. The deeper divide is over two questions: whether bypassing technical protections constitutes a DMCA violation, and whether established legal precedents will favor large tech companies over creators despite the apparent misuse of copyrighted video.
Key Points
The lawsuit's basis is the composition of the training data.
acosmichippo noted the dispute is rooted in the 'Panda 70M' training set referenced in the research paper.
Bypassing technical protections may violate the DMCA.
TehPers argued Apple allegedly got around YouTube's safeguards, possibly violating DMCA Section 1201.
Publicly sourced data is not public domain.
mnemonicmonkeys offered a crucial legal clarification: content that is publicly sourced is not thereby in the public domain.
Legal precedents might favor corporations.
Some commenters expressed skepticism, suggesting that history favors large corporations over content creators in these disputes.
AI training requires massive data intake regardless of deployment.
Rai clarified that the underlying technology needs vast training datasets whether the model runs on-device or on a server.
Source Discussions (3)
This report was synthesized from the following Lemmy discussions, ranked by community score.