- cross-posted to:
- technology@lemmy.world
Copyright class actions could financially ruin AI industry, trade groups say.
AI industry groups are urging an appeals court to block what they say is the largest copyright class action ever certified. They’ve warned that a single lawsuit raised by three authors over Anthropic’s AI training now threatens to “financially ruin” the entire AI industry if up to 7 million claimants end up joining the litigation and forcing a settlement.
Last week, Anthropic petitioned to appeal the class certification, urging the court to weigh questions that the district court judge, William Alsup, seemingly did not. Alsup allegedly failed to conduct a “rigorous analysis” of the potential class and instead based his judgment on his “50 years” of experience, Anthropic said.
“If we have to pay for the intellectual property that we steal and repackage, our whole business model will be destroyed!”
One thing this whole AI training debacle has done for me: made me completely guilt-free in pirating things. Copyright law has been bullshit since Disney stuck their finger in it and if megacorps can get away with massively violating it, I’m not going to give a shit about violating it myself.
I’m pretty much there too. The whole industry consolidates on the new thing and charges us as they make it worse. There are some arguments to be made about the benefits of AI, but we all know it will not be immune to the enshittification that has already ruined everything before it.
If I downloaded ten movies to watch with my nephew in the cancer ward, they’d sue me into oblivion. Downloading tens of millions of books and claiming your business model depends on it doesn’t make it okay. And sharing movies with my sick nephew would cause less harm to society and to the environment than AI does.
I started my own streaming service with pirated content. My business model depends on that data on my server.
Same thing, but for some reason it’s different. They hate when we use their laws against them. Let’s root for a ruling against this class action so we can all benefit from copyright being thrown out. Or, alternatively, it kills the AI companies; either way is a win.
They hate when we use their laws against them
YSK: “they,” “we,” and “them” in this sentence mean different things to different people.
They’re not stealing anything. Nor are they “repackaging” anything. LLMs don’t work like that.
I know a whole heck of a lot of people hate generative AI with a passion, but let’s get real: The reason they hate generative AI isn’t because the models were trained on copyrighted works (which has already been ruled fair use, as long as the works were legitimately purchased). They hate generative AI because of AI slop and the potential for taking jobs away from people who are already having a hard time.
AI Slop sucks! Nobody likes it except the people making money from it. But this is not a new phenomenon! For fuck’s sake: Those of us who have been on the Internet for a while have been dealing with outsourced slop and hidden marketing campaigns/scams since forever.
The only difference is that now—thanks to convenient and cheap LLMs—scammers and shady marketers can generate bullshit at a fraction of the cost and really, really quickly. But at least their grammar is correct now (LOL @ old school Nigerian Prince scams).
It’s humans ruining things for other humans. AI is just a tool that makes it easier and cheaper. Since all the lawsuits and laws in the world cannot stop generative AI at this point, we might as well fix the underlying problems that enable this bullshit. Making big AI companies go away isn’t going to help with these problems.
In fact, it could make things worse! Because the development of AI certainly won’t stop. It will just move to countries with fewer scruples and weaker ethics.
The biggest problem is (mostly unregulated) capitalism. Fix that, and suddenly AI “taking away jobs” ceases to be a problem.
Hopefully, AI will force the world to move toward the Star Trek future. Because generating text and images is just the start.
When machines can do just about everything a human can (and scale up really fast)—even without AGI—there’s no future for capitalism. It just won’t work when there’s no scarcity other than land and energy.
I respectfully disagree. Meta was caught downloading books from Libgen, a piracy site, to “train” its models. What AI models do in effect is scan information (i.e., copy), and distill and retain what they view as its essence. They can copy your voice, they can copy your face, and they can copy your distinctive artistic style. The only way they can do that is if the “training” copies and retains a portion of the original works.
Consider Shepard Fairey’s use of the AP’s copyrighted Obama photograph in the production of the iconic “Hope” poster, and the resulting lawsuit. While the suit was ultimately settled, and the issue of “fair use” was a close call given the variation of the artwork from the original source photograph, the suit easily could have gone against Fairey, so it was smart for him to settle.
Also consider the litigation surrounding the use of music sampling in original hip hop works, which has clearly been held to be copyright infringement.
Accordingly, I think it is very fair to say that (1) AI steals copyrighted works; and (2) repackages the essential portions of those works into new works. Might a rewrite of copyright law be in order to embrace this new technology? Sure, but if I’m an actor, voice actor, author, or other artist and I can no longer earn a living because someone else has taken my work to strip it down to its essence and resell it cheaply without compensating me, I’m going to be pretty pissed off.
Hopefully, AI will force the world to move toward the Star Trek future.
Lol. The liberal utopia of Star Trek is a fantasy. Far more likely is that AI will be exploited by oligarchs to enrich themselves and further impoverish the masses, as they are fervently working towards right now. See, AI isn’t creative, it gives the appearance of being creative by stealing work created by humans and repackaging it. When artists can no longer create art to survive, there will be less material for the AI models to steal, and we’ll be left with soulless AI slop as our de facto creative culture.
I respectfully disagree. Meta was caught downloading books from Libgen, a piracy site, to “train” its models.
That action itself can and should be punished. Yes. But that has nothing to do with AI.
What AI models do in effect is scan information (i.e., copy), and distill and retain what they view as its essence. They can copy your voice, they can copy your face, and they can copy your distinctive artistic style. The only way they can do that is if the “training” copies and retains a portion of the original works.
Is that what people think is happening? You don’t even have a layman’s understanding of this technology. At least watch a few videos on the topic.
I think that copying my voice makes this robot a T-1000, and T-1000s are meant to be dunked in lava to save Sarah Connor.
But that has nothing to do with AI.
Absurd. It’s their entire fucking business model.
Meaning it would be illegal even if they weren’t doing anything with AI…
So what an AI does is the same thing as every human ever who has read/seen/listened to a work and then wrote more words influenced by that book/artwork/piece.
If you’ve ever done anything artistic in your life, you know that the first step is to look at what others have done. Even subconsciously, you will pull from what you’ve seen and heard. To say that AI is not creative because it is derivative is to say that no human being in history has been creative.
You’re forgetting the fact that humans always add something of our own when we make art, even when we try to reproduce another’s piece as a study.
The many artists we might’ve looked at certainly influence our own styles, but they’re not the only thing that’s expressed in our artwork. Our life lived to that point, and how we’re feeling in the moment, those are also the things, often the point, that artists communicate when making art.
Most artists haven’t also looked at nearly every single work by almost every artist spanning a whole century of time. We also don’t need whole-ass data centers that need towns’ worth of water supply to just train to produce some knock-off, soulless amalgamation of other people’s art.
Look at what they need to mimic a fraction of our power.
You’re arguing the quality of what AI produces which has nothing to do with the legality of it.
What is the law? A joke or a myth for the poor?
My comment is replying to the guy talking about whether or not you can call AI ‘creative’ though.
You are damn right…
You are going to need to expand a little bit more on that notion that we add something of our own. Or, more specifically, explain how that is not the case for AI. They might not draw from personal experiences, since they have none, but not every piece of human art necessarily draws from a person’s experiences. Or at least not in any way that can even be articulated or meaningfully differentiated from an AI using the lived experiences of another person as reference.
Also look at all the soulless corporate art, i.e., the art that AI is going to replace. Most of it has nothing of the author in it. It simply has the intention of selling. I’ve seen a lot of video game concept art in my life, and like 80% of it looks like it was made by the same person. Is that kind of “creativity” any better than what an AI can do? No, it isn’t. At all.
The kind of artists that are making great, unique art that brings something fresh to the table are at no risk of being replaced anytime soon.
Your argument would only be true if AI were making 1:1 reproductions of existing works, but that is not the case. It is simply using existing works to produce sentences or works that use a little piece of each, like making a collage. I fail to see how that is different from human creativity, honestly. I say this as a creative myself.
Your second argument is not really an argument against AI any more than it is an argument against any tech, really. Most technologies are inefficient at first. As time goes on and we look for ways to improve the tech, it becomes more efficient. This is universally true for every technology; in fact, I think technological advancement can pretty much be reduced to the progress of energy efficiency.
What I mean by adding something of our own is how art, in Cory Doctorow’s words, contains many acts of communicative intent. There are thousands of micro-decisions a human makes when creating art, whereas imagery generated from only the few words of a prompt to an LLM contains only that much communicative intent.
I feel like that’s why AI art always has that AI look and feel to it. I can only sense a tiny fraction of the person’s intent, and maybe it’s because I know the rest is filled in by the AI, but that is the part that feels really hollow or soulless to me.
Even in corporate art, I can at least sense what the artist was going for, based on corporate decisions to use clean, inoffensive designs for their branding and image. There’s a lot of communicative intent behind those designs.
I recommend checking the blog post I referenced, because Cory Doctorow expresses these thoughts far more eloquently than I do.
As for the latter argument, I wanted to highlight the fact that AI needs that level of resources and training data in order to produce art, whereas a human doesn’t, which shows you the power of creativity, human creativity. That’s why I think what AI does cannot be called ‘creativity.’ It cannot create. It does what we tell it to, without its own intent.
Cory’s take is excellent. Thanks for bringing this up, because it highlights what I try to communicate to a lot of people: it’s a tool. It needs a human behind the wheel to produce anything good, and the more effort the human puts into describing what they want, the better the result, because, as Cory so eloquently puts it, it gets imbued with meaning. So I think my position is now something like: AI is not creative by itself; it’s a tool to facilitate the communication of an idea that a human has in their head and lacks the time or skill to communicate properly.
Now, I don’t think this really answers our question of whether the mechanics of an AI synthesizing information are materially different from how a human synthesizes information. Furthermore, it is muddied further by the fact that the “creativity” of it is powered by a human.
Maybe it is a sliding scale? Which is actually sort of aligned with what I was saying: if AI is producing 1:1 reproductions, then it is infringing rights. But if the prompt is a paragraph long, giving it many details about the image or paragraph/song/art/video, etc., in such a way that the output is unique because of the specificity achieved in the prompt, then it is clear that not only is the result a product of human creativity, but also that the AI is merely using references the same way a human does.
The concept is easier for me to explain with music. If a user describes a song, its length, its BPM, every note and its pitch, would that not be an act of human creativity? In essence, the song is being written by the human and the AI is simply “playing it,” like when a composer writes music and a musician plays it. How creative is a human who replays a song 1:1 as it was written?
What if LLMs came untrained and the end user was responsible for giving them the data? Any books, images, etc. that you feed in, you must own. That way the AI is even more of an extension of you. Would that be the maximally IP-respecting and ethical AI? Possibly, but it puts too much of the burden on the user for it to be useful to 99% of people. It also shifts the responsibility for IP infringement to the individual, something I do not think anyone is too keen on.
It is more of an imitation, and its work has no soul or pain. When you understand this, then no matter how perfect the art is, if there is no person or story behind it, no why and no what for, it is just factory crap that cannot compare with real soul food.
Meta literally torrented an insane amount of training material illegally, from a company that was sued into the ground and forced to dissolve for distributing stolen content.
“When machines can do just about everything a human can (and scale up really fast)—even without AGI—there’s no future for capitalism.”
This might be one of the dumbest things I’ve ever read.
There is a future, but it will be so soulless and false that those who know what real art is will feel disgust for it. It will no longer be a world but some kind of complete rotting swamp, although you won’t notice it with a consumerist eye.
It’s humans ruining things for other humans. AI is just a tool that makes it easier and cheaper
That’s the main point, though: the tire fire of humanity is bad enough without some sick fucks adding vast quantities of accelerant in order to maximize profits.
Since all the lawsuits and laws in the world cannot stop generative AI at this point
Clearly that’s not true. They’ll keep it up for as long as it’s at all positive to extract profits from it, but not past that. Handled right, this class action could make the entire concept poisonous from a profiteering perspective for years, maybe even decades.
we might as well fix the underlying problems that enable this bullshit.
Of COURSE! Why didn’t anyone think to flick off the switches marked “unscrupulous profiteering” and “regulatory capture”?!
We’ll have this done by tomorrow, Monday at the latest! 🙄
Making big AI companies go away isn’t going to help with these problems.
The cancer might be the underlying cause but the tumor still kills you if you don’t do anything about it.
the development of AI certainly won’t stop.
Again, it WILL if all profitability is removed.
It will just move to countries with fewer scruples and weaker ethics
Than Silicon Valley? Than the US government when ultra-rich white men want something?
No such country exists.
The biggest problem is (mostly unregulated) capitalism
Finally right about something.
Fix that, and suddenly AI “taking away jobs” ceases to be a problem.
“Discover the cure for cancer and suddenly the tumors in your brain, lungs, liver, and kidneys won’t be a problem anymore!” 🤦
Hopefully, AI will force the world to move toward the Star Trek future
Wtf have you been drinking??
Hopefully, AI will force the world to move toward the Star Trek future
It seems like this is just another consumer. Ignore him; he is no longer a person but a zombie.
You’re seriously kidding yourself if you think China won’t continue to pursue AI even if the profit motive is lost in American companies. And if China continues to develop AI, so will the US even if it is nationalized or developed through military contracting, or even as a secret project, because it will be framed as a national security issue. So unless you find a way to stop China from also developing AI, the tech is here to stay no matter what happens.
Look, these people are neck-deep in a tribalistic worldview. You cannot reason with anyone who’s against AI, simply because their claims are unfalsifiable and contradictory depending on the day and the article they are reading. On the one hand, it is the shittiest technology ever made, which cannot do anything right; at the same time, it is an existential threat to humanity.
And I can tell you that the only reason this is the case is that the right is more strongly for AI than the left. If the right had condemned it, you can be damn sure the tables would be turned and everyone who considers themselves left would be praising the tech.
Just move on and take solace in the fact that the technology simply cannot be rebottled, or uninvented. It exists, it is here to stay and literally no one can stop it at this point. And I agree with you, AI is the only tool that can provide true emancipation. It can also enslave. But the fact is that all tools can be used for right or wrong, so this is not inherent to AI.
“If we have to pay for the intellectual property that we steal and repackage, our whole business model will be destroyed!”
They are very likely to be civilly liable for uploading the books.
That’s largely irrelevant because the judge already ruled that using copyrighted material to train an LLM was fair use.
The judge did so on a motion for summary judgment, which means they had to read all of the evidence in the manner most favorable to the plaintiffs, and they still decided that there was no way for the plaintiffs to succeed in their copyright claim about training LLMs, because it was so obviously fair use.
Read the Order, which is Exhibit B to Anthropic’s appellate brief.
Anthropic admitted that they pirated millions of books like Meta did, in order to create a massive central library for training AI that they permanently retained, and now assert that if they are held responsible for this theft of IP it will destroy the entire AI industry. In other words, it appears that this is common practice in the AI industry to avoid the prohibitive cost of paying for the works they copy. Given that Meta, one of the wealthiest companies in the world, did the same exact thing, it reinforces the understanding that piracy to avoid paying for their libraries is a central component of training AI.
While the lower court did rule that training an LLM on copyrighted material was a fair use, it expressly did not rule that derivative works produced are protected by fair use and preserved the issue for further litigation:
Again, Authors concede that training LLMs did not result in any exact copies nor even infringing knockoffs of their works being provided to the public. If that were not so, this would be a different case. Authors remain free to bring that case in the future should such facts develop.
Emphasis added. In other words, Anthropic can still face liability if its trained AI produces knockoff works.
Finally, the Court held
The downloaded pirated copies used to build a central library were not justified by a fair use. Every factor points against fair use. Anthropic employees said copies of works (pirated ones, too) would be retained “forever” for “general purpose” even after Anthropic determined they would never be used for training LLMs. A separate justification was required for each use. None is even offered here except for Anthropic’s pocketbook and convenience. … We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages, actual or statutory (including for willfulness). That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages. Nothing is foreclosed as to any other copies flowing from library copies for uses other than for training LLMs.
Emphasis in original.
So to summarize: Anthropic apparently used the industry standard of piracy to build a massive book library to train its LLMs. Plaintiffs did not dispute that training an LLM on a copyrighted work is fair use, but they did not have sufficient information to assert that knockoff works were produced by the trained LLMs, and the Court preserved that issue for later litigation if the plaintiffs sought to bring such a claim. Finally, the Court noted that Anthropic built its database for training its LLMs through massive, straight-up piracy. I think my original comment was a fair assessment.
It looks, to me, like you’re reading the briefing without understanding how the legal system functions. You’re making some incredibly basic mistakes. Copyright violations and theft are two distinct legal concepts, for example. You’re treating the case summary as if it were the legal argument in the brief and you’re misinterpreting some pretty clear legal language written by the judge.
Anthropic admitted that they pirated millions of books like Meta did, in order to create a massive central library for training AI that they permanently retained, and now assert that if they are held responsible for this theft of IP it will destroy the entire AI industry.
No, that is not their argument.
Their legal argument, in the appeal of the class certification, is that the judge did not apply the required analysis in order to certify the three plaintiffs as being part of a class. He instead relied on his intuition, not any discovered facts or evidence. This isn’t allowed when analyzing a case for class certification.
In addition, Anthropic adds, it is well supported in case law (cited in the motion) that copyright claims are a bad fit for class action.
This is because copyright law focuses on individual works: each work has to be examined for its eligibility for copyright protection, the standing of the plaintiff, and whether, and to what extent, the defendant was responsible for violating the copyright in that work.
This can be done when 3 people claim a copyright violation, because they have a limited set of work which a court can reasonably examine.
A class action would require a court to consider hundreds or thousands of claimants and millions of individual works, each of which can be challenged individually by the defendant.
Courts typically don’t like to take on cases that can require millions of briefings, hearings, and rulings. Because of this, courts almost always deny class action certification for copyright violations.
The court, in its order, did not address this or apply any of the required analysis. The class was certified based on vibes, something that doesn’t follow clearly established case law.
Authors concede that training LLMs did not result in any exact copies nor even infringing knockoffs of their works being provided to the public.
This is because training an LLM results in a language model.
A language model is in no way similar to a book and so training one is a transformative use of copyrighted material and protected under fair use.
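To make that concrete, here’s a toy sketch (a bigram counter in plain Python; nothing like a production LLM, and no claim about Anthropic’s actual pipeline, just an illustration of the kind of artifact training produces): the text goes in, and what comes out and gets retained is a table of statistical parameters, not the book.

```python
# Toy illustration only: real LLMs learn billions of weights, not bigram
# counts, but the artifact produced by "training" is the same in kind.
from collections import Counter, defaultdict

book = "the cat sat on the mat the cat ran"  # stand-in for a copyrighted text

# "Training": record which word tends to follow which.
model = defaultdict(Counter)
words = book.split()
for prev, nxt in zip(words, words[1:]):
    model[prev][nxt] += 1

# What is retained is statistics about the text, not the text itself:
print(dict(model))
# {'the': Counter({'cat': 2, 'mat': 1}), 'cat': Counter({'sat': 1, 'ran': 1}),
#  'sat': Counter({'on': 1}), 'on': Counter({'the': 1}), 'mat': Counter({'the': 1})}
```

At this toy scale you could nearly reconstruct the sentence from the counts; at LLM scale, with millions of works compressed into the same shared weights, what survives is patterns, which is the basis of the transformative-use argument.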
Authors concede that training LLMs did not result in any exact copies nor even infringing knockoffs of their works being provided to the public. If that were not so, this would be a different case. Authors remain free to bring that case in the future should such facts develop.
In other words, Anthropic can still face liability if its trained AI produces knockoff works.
No, the judge didn’t make any claim about the model’s output after training. That isn’t an issue that’s being addressed in this case. You’re misunderstanding how judges address issues in writing.
Here, the judge is addressing a very narrow issue, specifically the exact claim made by the plaintiff (training with copyrighted material = copyright violation).
The subject of the paragraph is concerned with training the LLM. The claim by the plaintiff is that using copyrighted works to train LLMs is a violation of copyright. That’s what the judge is addressing.
The judge dismissed this argument because it was transformative and so protected by fair use.
The judge further noted that the plaintiffs did not show that training the LLM resulted in “any exact copies nor even infringing knockoffs of their works being provided to the public,” and that if they could show such facts, they could bring a case in the future. This is the judge hinting that they can amend their filings in this case to clarify their argument, if they have any evidence to support their claim.
The judge is telling the plaintiff that in order to succeed in their claim, which is that training an LLM on their work is a violation of their copyright, they need to show that the thing that they’re claiming has to result in copies of infringing material or knockoffs.
The training resulted in a model. Creating a model is transformative (a model and a book are two completely different things) and the plaintiffs didn’t show that any infringing works were produced by the training and therefore they have no way of succeeding with their argument that training the model violated their rights.
You’re reading a lot into that statement that isn’t there. The plaintiffs never made a claim about the output of a trained model, so that argument wasn’t examined by the judge.
That’s unfair. They also have to sue people who infringe on “their” IP. You just don’t understand what it’s like to be a content creator.
Hmm. I’m finding it hard to come up with a more clever response to them than:
“good”
They have the entire public domain at their disposal.
If giant megacorporations didn’t want their chatbots talking like the 1920s, they shouldn’t have spent the past century robbing society of a robust public domain.
AI industry, fucking around: Woo! This is awesome! No consequences ever! Just endless profits!
AI industry, finding out: This fucking sucks! So unfair!
I’m no fan of the copyright fuckery so commonly employed by (amongst others) the RIAA and MPAA, but this is honestly the best use of copyright law I can think of in recent memory.
“copyright class action could ruin AI industry”
Oh nooooooo… How do I sign on to this lawsuit?
I would not hold my breath. There is a high likelihood that the courts will side with the AI companies, because the American courts are compromised.
Or, according to the person who read the article, because the case is not what the title suggests it is?
Corps fight class actions, because individuals can’t sue on their own…
Copyright holders have money and legal teams on call. Even if they stop this, they’ll just face a shit ton of lawsuits instead.
I’ve read this entire thread and could not find a single person who seems to have actually read anything about this case.
The article is a huge pile of bullshit.
Here is what happened: An industry group filed an amicus brief during the appeal of a ruling in which the judge certified the 3 plaintiffs as a class. Boring legal minutiae in a case that doesn’t matter; see below.
The author is either incompetent at understanding legal filings or deliberately being misleading to write clickbait trash. Human slop, if you prefer.
This is not noteworthy, at all. The issue being argued about is if the 3 people can represent the class of “everyone Anthropic downloaded books from”. This is a non-story, unless you’re a legal nerd and care about exactly how courts define classes and the legal steps required for the analysis.
But, more importantly for the frothing anti-AI masses:
In the order certifying the plaintiffs as a class, the judge dismissed the plaintiffs’ claims of copyright violation related to the training of LLMs. The judge said that training LLMs was transformative, and thus fair use under copyright law, and that this was so obvious the argument could be summarily dismissed.
Don’t believe me, go click on the links in the article to the summary judgement yourself. The information is not hard to find if you read farther than the headline.
The only remaining issue in the lawsuit is if Anthropic is civilly liable for downloading the books on bittorrent.
This case isn’t even about AI anymore, it’s the same kind of lawsuit that we’ve seen since Napster was popular. Uploading copyrighted material, like when you use BitTorrent, is a copyright violation and you could be sued.
That’s all this case is now, the argument that everyone is fighting over in the comments: “Is training an LLM on copyrighted material a violation of copyright?” is already answered by the judge:
No, using copyrighted material to train a LLM is so obviously fair use that the argument was summarily dismissed.
Here’s the relevant quote from the judge, in summary judgement:
To summarize the analysis that now follows, the use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use under Section 107 of the Copyright Act. The digitization of the books purchased in print form by Anthropic was also a fair use, but not for the same reason as applies to the training copies. Instead, it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library with more convenient, space-saving, and searchable digital copies without adding new copies, creating new works, or redistributing existing copies. However, Anthropic had no entitlement to use pirated copies for its central library, and creating a permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy.
“Please let us steal” - AI Industry
Dude, I think this is just the beginning, and it will continue until AI can simulate anything. The only way to fix it, as crazy as it sounds, is to go back in time and threaten the end of the world to make the rulers of the world come to their senses. Otherwise, I’m afraid there is no chance.
Wish him the best
Yeah, who the fuck gave all these rich assholes the right to make money on others’ work?
I’d like to know how these assholes get away with even training on GPL licensed code.
Well, they’ve always profited from other people’s labor, and now they think that our souls belong to them too. They’ve gotten completely brazen!
It’s like they took only part of the wheat from the peasants before, and then decided to take it all by force and cunning, down to the last grain lol. :3
Alsup allegedly failed to conduct a “rigorous analysis” of the potential class and instead based his judgment on his “50 years” of experience, Anthropic said.
The judge didn’t apply critical thinking but instead just did whatever was the most likely decision based on all the information it was fed in its training? That must be very inconvenient and it would be a shame if we had companies advertising exactly that as a world changing technology.
Voyager doesn’t allow me to insert gifs.