AI industry horrified to face largest copyright class action ever certified

Davriellelouna@lemmy.world · 4 months ago

Null User Object@lemmy.world · 4 months ago

threatens to “financially ruin” the entire AI industry

No. Just the LLM industry and AI slop image and video generation industries. All of the legitimate uses of AI (drug discovery, finding solar panel improvements, self driving vehicles, etc) are all completely immune from this lawsuit, because they’re not dependent on stealing other people’s work.

A Wild Mimic appears!@lemmy.dbzer0.com · 4 months ago

But it would also mean that the Internet Archive is illegal, even though they don’t profit: if scraping the internet is a copyright violation, then they are as guilty as Anthropic.

☂️-@lemmy.ml · edit-2 3 months ago

deleted by creator

A Wild Mimic appears!@lemmy.dbzer0.com · 4 months ago

they should have done that long ago, and if they haven’t already started a backup in both europe and china, it’s high time

halcyoncmdr@lemmy.world · 4 months ago

As Anthropic argued, it now “faces hundreds of billions of dollars in potential damages liability at trial in four months” based on a class certification rushed at “warp speed” that involves “up to seven million potential claimants, whose works span a century of publishing history,” each possibly triggering a $150,000 fine.

So you knew what stealing the copyrighted works could result in, and your defense is that you stole too much? That’s not how that works.

zlatko@programming.dev · 4 months ago

Actually that usually is how it works. Unfortunately.

“Too big to fail” was probably made up by the big ones.

A Wild Mimic appears!@lemmy.dbzer0.com · 4 months ago

If scraping is illegal, so is the Internet Archive, and that would be an immense loss for the world.

PushButton@lemmy.world · 4 months ago

Let’s go baby! The law is the law, and it applies to everybody

If the “genie doesn’t go back in the bottle”, make him pay for what he’s stealing.

Zetta@mander.xyz · 4 months ago

The law absolutely does not apply to everybody, and you are well aware of that.

jsomae@lemmy.ml · 4 months ago

The law applies to everybody, but the lawmakers change the laws to benefit certain people. And then Trump pardons the rest lol.

A Wild Mimic appears!@lemmy.dbzer0.com · 4 months ago

This would mean the copyright holders like Disney are now the AI companies, because they have the content to train them. That’s even worse, man.

BussyCat@lemmy.world · 4 months ago

It’s not, because they would only train on things they own, which is an absolutely tiny fraction of everything that everyone owns. It’s like complaining that a rich person gets to enjoy their lavish estate when the alternative is that they get to use everybody’s home in the world.

A Wild Mimic appears!@lemmy.dbzer0.com · edit-2 4 months ago

do you know how much content disney has? go scrolling: https://en.wikipedia.org/wiki/List_of_assets_owned_by_the_Walt_Disney_Company e: that’s the tip of the iceberg, because if they band together with others from the MPAA & RIAA, they can suffocate the entire Movie, Book and Music world with it.

BussyCat@lemmy.world · 4 months ago

They have 0.2T in assets; the world has around 660T in assets, which, as I said before, makes theirs a tiny fraction. Obviously both hold a lot of assets that aren’t worthwhile for AI training, such as theme parks, but consider that a single movie worth millions or billions has the same benefit for AI training as another movie worth thousands. The amount of assets Disney owns is not nearly as relevant as you are making it out to be.

GreenKnight23@lemmy.world · 4 months ago

good, then I can just ignore Disney instead of EVERYTHING else.

ShadowWalker@lemmy.world · 4 months ago

Until they charge people to use their AI.

It’ll be just like today except that it will be illegal for any new companies to try and challenge the biggest players.

GreenKnight23@lemmy.world · 4 months ago

why would I use their AI? on top of that, wouldn’t it be in their best interests to allow people to use their AI with as few restrictions as possible in order to maximize market saturation?

SugarCatDestroyer@lemmy.world · 4 months ago

I just remembered the movie where a real genie was released from its bottle: he turned the world into chaos by freeing his own kind, and if it weren’t for the power of the plot, I’m afraid the people there would have become slaves or died out.

Although here it is already necessary to file a lawsuit for theft of the soul in the literal sense of the word.

kameecoding@lemmy.world · edit-2 4 months ago

The law is not the law. I am the law.

insert awesome guitar riff here

Reference: https://youtu.be/Kl_sRb0uQ7A

westingham@sh.itjust.works · 4 months ago

I was reading the article and thinking “suck a dick, AI companies” but then it mentions the EFF and ALA filed against the class action. I have found those organizations to be generally reputable and on the right side of history, so now I’m wondering what the problem is.

pelya@lemmy.world · 4 months ago

AI coding tools are using the exact same backends as AI fiction writing tools, so it would hurt the fledgling vibe coder profession (which according to proper software developers should not be allowed to exist at all).

A Wild Mimic appears!@lemmy.dbzer0.com · 4 months ago

The same goes for the Internet Archive: if scraping is illegal, then the Internet Archive is as well.

FauxLiving@lemmy.world · edit-2 4 months ago

An important note here: the judge has already ruled in this case, in the summary judgment order, that using Plaintiffs’ works “to train specific LLMs [was] justified as a fair use” because “[t]he technology at issue was among the most transformative many of us will see in our lifetimes.”

The plaintiffs are no longer suing Anthropic over the training itself; the court has already ruled that it was so obvious they could not succeed with that argument that it could be dismissed. Their only remaining claim is that Anthropic downloaded the books from piracy sites using BitTorrent.

This isn’t about LLMs anymore; it’s a standard “You downloaded something on BitTorrent and made a company mad”-type case, the kind that has been going on since Napster.

Also, the headline is incredibly misleading. It’s ascribing feelings to an entire industry based on a common legal filing that is not by itself noteworthy. Unless you really care about legal technicalities, you can stop here.

The actual news, the new factual thing that happened, is that the Consumer Technology Association and the Computer and Communications Industry Association filed an amicus brief in an appeal of an issue on which the court ruled against Anthropic.

This is a pretty normal legal filing about legal technicalities. It isn’t really newsworthy outside of, maybe, some people in the legal profession who are bored.

The issue was class certification.

Three people sued Anthropic. Instead of just suing Anthropic on behalf of themselves, they moved to be certified as a class. That is to say, they wanted to sue on behalf of a larger group of people, in this case a “Pirated Books Class” of authors whose books Anthropic downloaded from book piracy websites.

The judge ruled that they can represent the class, and Anthropic appealed the ruling. During this appeal, the two industry groups filed an amicus brief supporting Anthropic’s argument. This is not uncommon: The Onion famously filed an amicus brief with the Supreme Court when it was about to rule on issues of parody. Like everything The Onion writes, it’s a good piece of satire: link

9point6@lemmy.world · 4 months ago

Probably would have been cheaper to license everything you stole, eh, Anthropic?

Treczoks@lemmy.world · 4 months ago

Well, theft has never been the best foundation for a business, has it?

While I completely agree that copyright terms are completely overblown, they are valid law that other people suffer under, so it is 100% fair to make them suffer the same. Or worse, as they all broke the law for commercial gain.

SugarCatDestroyer@lemmy.world · edit-2 4 months ago

Unfortunately, this will probably lead to nothing: in our world, only the poor seem to be punished for stealing. Well, corporations always get away with everything, so we sit on the couch and shout “YES!!!” for the fact that they are trying to console us with this.

Modern_medicine_isnt@lemmy.world · 4 months ago

This issue is not so cut and dried. The AI companies are stealing from other companies more than from individual people. Publishing companies are owned by some very rich people, and they want their cut.

This case may have started out with authors, but it is mentioned that it could turn into publishing companies vs AI companies.

FauxLiving@lemmy.world · edit-2 4 months ago

People cheering for this have no idea of the consequence of their copyright-maximalist position.

If using images, text, etc. to train a model is copyright infringement, then there will be NO open models, because open source model creators could not possibly obtain all of the licensing for every piece of written or visual media in the Common Crawl dataset, which is what most of these things are trained on.

As it stands now, corporations don’t have a monopoly on AI specifically because copyright doesn’t apply to AI training. Everyone has access to Common Crawl and the other large, public, datasets made from crawling the public Internet and so anyone can train a model on their own without worrying about obtaining billions of different licenses from every single individual who has ever written a word or drawn a picture.

If there is a ruling that training violates copyright, then the only entities that could possibly afford to train LLMs or diffusion models are companies that own a large amount of copyrighted material. Sure, one company will lose a lot of money and/or be destroyed, but the legal precedent would be set so that it is impossible for anyone who doesn’t have billions of dollars to train AI.

People are shortsightedly seeing this as a victory for artists or some other nonsense. It’s not. This is a fight where large copyright holders (Disney and other large publishing companies) want to completely own the ability to train AI because they own most of the large stores of copyrighted material.

If the copyright holders win this then the open source training material, like Common Crawl, would be completely unusable to train models in the US/the West because any person who has ever posted anything to the Internet in the last 25 years could simply sue for copyright infringement.

JustARaccoon@lemmy.world · 4 months ago

In theory sure, but in practice who has the resources to do large scale model training on huge datasets other than large corporations?

FauxLiving@lemmy.world · edit-2 4 months ago

Distributed computing projects, large non-profits, people in the near future with much more powerful and cheaper hardware, governments which are interested in providing public services to their citizens, etc.

Look at other large technology projects. The Human Genome Project spent $3 billion to sequence the first genome but now you can have it done for around $500. This cost reduction is due to the massive, combined effort of tens of thousands of independent scientists working on the same problem. It isn’t something that would have happened if Purdue Pharma owned the sequencing process and required every scientist to purchase a license from them in order to do research.

LLM and diffusion models are trained on the works of everyone who’s ever been online. This work, generated by billions of human-hours, is stored in the Common Crawl datasets and is freely available to anyone who wants it. This data is both priceless and owned by everyone. We should not be cheering for a world where it is illegal to use this dataset that we all created and, instead, we are forced to license massive datasets from publishing companies.

Progress on these types of models would immediately stop; there would be 3-4 corporations that could afford the licenses. They would have a de facto monopoly on LLMs and could enshittify them without worrying about competition.

JustARaccoon@lemmy.world · 4 months ago

The world you’re envisioning would only have paid licenses, who’s to say we can’t have a “free for non commercial purposes” license style for it all?

barryamelton@lemmy.world · edit-2 4 months ago

Anybody can use copyrighted works under fair use for research, more so if your LLM model is open source (I would say this fair use should only actually apply if your model is open source…). You are wrong.

We don’t need to break copyright rights that protect us from corporations in this case, or also incidentally protect open source and libre software.

Cryptagionismisogynist@lemmy.world · 4 months ago

Copyright is a leftover mechanism from slavery and it will be interesting to see how it gets challenged here, given that the wealthy view AI as an extension of themselves and not as a normal employee. Genuinely think the copyright cases from AI will be huge.

FauxLiving@lemmy.world · edit-2 4 months ago

My last comment was wrong, I’ve read through the filings of the case.

The judge has already ruled that training the LLMs using the books was so obviously fair use that it was dismissed in summary judgement (my bolds):

To summarize the analysis that now follows, the use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use under Section 107 of the Copyright Act. The digitization of the books purchased in print form by Anthropic was also a fair use, but not for the same reason as applies to the training copies. Instead, it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library with more convenient, space-saving, and searchable digital copies without adding new copies, creating new works, or redistributing existing copies. However, Anthropic had no entitlement to use pirated copies for its central library, and creating a permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy.

The only issue remaining in this case is that they downloaded copyrighted material with BitTorrent, the kind of lawsuit that has been going on since Napster. They’ll probably be required to pay for all 196,640 books that they pirated, plus some other damages.

FauxLiving@lemmy.world · 4 months ago

deleted by creator

sunbytes@lemmy.world · 4 months ago

Or it just happens overseas, where these laws don’t apply (or can’t be enforced).

But I don’t think it will happen. Too many countries are so desperate to be “the AI country” that they’ll risk burning whole industries to the ground to get it.

Deflated0ne@lemmy.world · 4 months ago

Good. Burn it down. Bankrupt them.

If it’s so “critical to national security” then nationalize it.

A Wild Mimic appears!@lemmy.dbzer0.com · 4 months ago

the “burn it down” variant would only lead to the scenario where the copyright holders become the AI companies, since they have the content to train it. AI will not go away, it might change ownership to someone worse tho.

nationalizing sounds better; even better would be to put it under UNESCO stewardship.

Deflated0ne@lemmy.world · 4 months ago

Hard to imagine worse than the insane techno-feudalists who currently own it.

A Wild Mimic appears!@lemmy.dbzer0.com · 4 months ago

believe me, Disney is fucking ruthless in comparison to Anthropic.

Lexam@lemmy.world · 4 months ago

No it won’t. Just their companies. Which are the ones making slop. If your AI does something actually useful it will survive.

A Wild Mimic appears!@lemmy.dbzer0.com · 4 months ago

You know, if they lose, their tech will probably become the property of copyright holders, which means your new AI Overlord has the first name Walt.

Ann Archy@lemmy.world · 4 months ago

I am holding my breath! Will they walk free, or get a $10 million fine and then keep doing what every other thieving, embezzling, looting, polluting, swindling, corrupting, tax evading mega-corporation has been doing for a century?

cmeu@lemmy.world · 4 months ago

Would be better if the fee were nominal, but with the condition that all their training data must never be used. Start them over from scratch and make it illegal to use anything that it knows now. Kneecap these frivolous little toys.

hansolo@sh.itjust.works · 4 months ago

This is how corruption works - the fine is the cost of business. Being given only a fine of $10 million is such a win that they’ll raise $10 billion in new investment on its back.

Plurrbear@lemmy.world · 4 months ago

Fucking good!! Let the AI industry BURN!

A Wild Mimic appears!@lemmy.dbzer0.com · edit-2 4 months ago

So, the US now has a choice: rescue AI and fix their fucked up copyright system, or rescue the fucked up copyright system and fuck up AI companies. I’m interested in the decision.

I’d personally say that the copyright system needs to be fixed anyway, because it’s currently just a club for the RIAA & MPAA to wield against everyone (remember the lawsuits against single persons with alleged damages in the millions for downloading a few songs? or the current attempts to fuck over the Internet Archive?). Should the copyright side win, then we can say goodbye to things like the Internet Archive or open source AI; copyright holders will then be the AI companies, since they have the content.

WereCat@lemmy.world · 4 months ago

We just need to show that ChatGPT and the like can generate Nintendo-based content and let them fight it out between them

Ann Archy@lemmy.world · 4 months ago

They will probably just merge into another mega-golem controlled by one of the seven people who own the planet.

potoooooooo ☑️@lemmy.world · 4 months ago

Mario, voiced by Chris Pratt, will become the new Siri, then the new persona for all AI.

In the future, all global affairs will be divided across the lines of Team Mario and Team Luigi. Then the final battle, then the end.

tetris11@feddit.uk · edit-2 4 months ago

*dabs, mournfully*