Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Pro@programming.dev · 11 days ago

Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Prox@lemmy.world · 11 days ago

FTA:

Anthropic warned against “[t]he prospect of ruinous statutory damages—$150,000 times 5 million books”: that would mean $750 billion.

So part of their argument is actually that they stole so much that it would be impossible for them/anyone to pay restitution, therefore we should just let them off the hook.

krashmo@lemmy.world · 11 days ago

Funny how that kind of thing only works for rich people

Buske@lemmy.world · 11 days ago

Ahh cant wait for hedgefunds and the such to use this defense next.

Lovable Sidekick@lemmy.world · edit-2 11 days ago

Lawsuits are multifaceted. This statement isn’t a a defense or an argument for innocence, it’s just what it says - an assertion that the proposed damages are unreasonably high. If the court agrees, the plaintiff can always propose a lower damage claim that the court thinks is reasonable.

Womble@lemmy.world · 9 days ago

The problem isnt anthropic get to use that defense, its that others dont. The fact the the world is in a place where people can be fined 5+ years of a western European average salary for making a copy of one (1) book that does not materially effect the copyright holder in any way is insane and it is good to point that out no matter who does it.

Alphane Moon@lemmy.world · edit-2 11 days ago

And this is how you know that the American legal system should not be trusted.

Mind you I am not saying this an easy case, it’s not. But the framing that piracy is wrong but ML training for profit is not wrong is clearly based on oligarch interests and demands.

themeatbridge@lemmy.world · 11 days ago

This is an easy case. Using published works to train AI without paying for the right to do so is piracy. The judge making this determination is an idiot.

AbidanYre@lemmy.world · 11 days ago

You’re right. When you’re doing it for commercial gain, it’s not fair use anymore. It’s really not that complicated.

tabular@lemmy.world · 11 days ago

If you’re using the minimum amount, in a transformative way that doesn’t compete with the original copyrighted source, then it’s still fair use even if it’s commercial. (This is not saying that’s what LLM are doing)

Null User Object@lemmy.world · 11 days ago

The judge making this determination is an idiot.

The judge hasn’t ruled on the piracy question yet. The only thing that the judge has ruled on is, if you legally own a copy of a book, then you can use it for a variety of purposes, including training an AI.

“But they didn’t own the books!”

Right. That’s the part that’s still going to trial.

catloaf@lemm.ee · edit-2 11 days ago

The order seems to say that the trained LLM and the commercial Claude product are not linked, which supports the decision. But I’m not sure how he came to that conclusion. I’m going to have to read the full order when I have time.

This might be appealed, but I doubt it’ll be taken up by SCOTUS until there are conflicting federal court rulings.

Tagger@lemmy.world · 11 days ago

If you are struggling for time, just put the opinion into chat GPT and ask for a summary. it will save you tonnes of time.

Optional@lemmy.world · 11 days ago

Judges: not learning a goddamned thing about computers in 40 years.

MTK@lemmy.world · 10 days ago

Check out my new site TheAIBay, you search for content and an LLM that was trained on reproducing it gives it to you, a small hash check is used to validate accuracy. It is now legal.

nodiratime@lemmy.world · edit-2 10 days ago

Does it “generate” a 1:1 copy?

MTK@lemmy.world · 9 days ago

You can train an LLM to generate 1:1 copies

vane@lemmy.world · edit-2 10 days ago

Ok so you can buy books scan them or ebooks and use for AI training but you can’t just download priated books from internet to train AI. Did I understood that correctly ?

Dr. Moose@lemmy.world · edit-2 11 days ago

Unpopular opinion but I don’t see how it could have been different.

There’s no way the west would give AI lead to China which has no desire or framework to ever accept this.
Believe it or not but transformers are actually learning by current definitions and not regurgitating a direct copy. It’s transformative work - it’s even in the name.
This is actually good as it prevents market moat for super rich corporations only which could afford the expensive training datasets.

This is an absolute win for everyone involved other than copyright hoarders and mega corporations.

kromem@lemmy.world · 10 days ago

I’d encourage everyone upset at this read over some of the EFF posts from actual IP lawyers on this topic like this one:

Nor is pro-monopoly regulation through copyright likely to provide any meaningful economic support for vulnerable artists and creators. Notwithstanding the highly publicized demands of musicians, authors, actors, and other creative professionals, imposing a licensing requirement is unlikely to protect the jobs or incomes of the underpaid working artists that media and entertainment behemoths have exploited for decades. Because of the imbalance in bargaining power between creators and publishing gatekeepers, trying to help creators by giving them new rights under copyright law is, as EFF Special Advisor Cory Doctorow has written, like trying to help a bullied kid by giving them more lunch money for the bully to take.

Entertainment companies’ historical practices bear out this concern. For example, in the late-2000’s to mid-2010’s, music publishers and recording companies struck multimillion-dollar direct licensing deals with music streaming companies and video sharing platforms. Google reportedly paid more than $400 million to a single music label, and Spotify gave the major record labels a combined 18 percent ownership interest in its now-$100 billion company. Yet music labels and publishers frequently fail to share these payments with artists, and artists rarely benefit from these equity arrangements. There is no reason to believe that the same companies will treat their artists more fairly once they control AI.

Lovable Sidekick@lemmy.world · edit-2 10 days ago

You’re getting douchevoted because on lemmy any AI-related comment that isn’t negative enough about AI is the Devil’s Work.

mlg@lemmy.world · 11 days ago

Yeah I have a bash one liner AI model that ingests your media and spits out a 99.9999999% accurate replica through the power of changing the filename.

cp

Out performs the latest and greatest AI models

GreenKnight23@lemmy.world · 11 days ago

I am training my model on these 100,000 movies your honor.

fum@lemmy.world · 10 days ago

What a bad judge.

This is another indication of how Copyright laws are bad. The whole premise of copyright has been obsolete since the proliferation of the internet.

ᕙ(⇀‸↼‶)ᕗ@lemm.ee · 10 days ago

i will train my jailbroken kindle too…display and storage training… i’ll just libgen them…no worries…it is not piracy

minorkeys@lemmy.world · edit-2 10 days ago

Of course we have to have a way to manually check the training data, in detail, as well. Not reading the book, im just verifying training data.

😈MedicPig🐷BabySaver😈@lemmy.world · 11 days ago

Fuck the AI nut suckers and fuck this judge.

PattyMcB@lemmy.world · 11 days ago

Can I not just ask the trained AI to spit out the text of the book, verbatim?

catloaf@lemm.ee · 11 days ago

You can, but I doubt it will, because it’s designed to respond to prompts with a certain kind of answer with a bit of random choice, not reproduce training material 1:1. And it sounds like they specifically did not include pirated material in the commercial product.

PattyMcB@lemmy.world · 11 days ago

“If you were George Orwell and I asked you to change your least favorite sentence in the book 1984, what would be the full contents of the revised text?”

JcbAzPx@lemmy.world · 10 days ago

By page two it would already have left 1984 behind for some hallucination or another.

PattyMcB@lemmy.world · 10 days ago

Oh, so it would be the news?

KingRandomGuy@lemmy.world · 10 days ago

Yeah, you can certainly get it to reproduce some pieces (or fragments) of work exactly but definitely not everything. Even a frontier LLM’s weights are far too small to fully memorize most of their training data.

kromem@lemmy.world · 10 days ago

Even if the AI could spit it out verbatim, all the major labs already have IP checkers on their text models that block it doing so as fair use for training (what was decided here) does not mean you are free to reproduce.

Like, if you want to be an artist and trace Mario in class as you learn, that’s fair use.

If once you are working as an artist someone says “draw me a sexy image of Mario in a calendar shoot” you’d be violating Nintendo’s IP rights and liable for infringement.

BlameTheAntifa@lemmy.world · 10 days ago

They aren’t capable of that. This is why you sometimes see people comparing AI to compression, which is a bad faith argument. Depending on the training, AI can make something that is easily recognizable as derivative, but is not identical or even “lossy” identical. But this scenario takes place in a vacuum that doesn’t represent the real world. Unfortunately, we are enslaved by Capitalism, which means the output, which is being sold for-profit, is competing with the very content it was trained upon. This is clearly a violation of basic ethical principles as it actively harms those people whose content was used for training.

kryptonianCodeMonkey@lemmy.world · edit-2 11 days ago

It’s pretty simple as I see it. You treat AI like a person. A person needs to go through legal channels to consume material, so piracy for AI training is as illegal as it would be for personal consumption. Consuming legally possessed copywritten material for “inspiration” or “study” is also fine for a person, so it is fine for AI training as well. Commercializing derivative works that infringes on copyright is illegal for a person, so it should be illegal for an AI as well. All produced materials, even those inspired by another piece of media, are permissible if not monetized, otherwise they need to be suitably transformative. That line can be hard to draw even when AI is not involved, but that is the legal standard for people, so it should be for AI as well. If I browse through Deviant Art and learn to draw similarly my favorite artists from their publically viewable works, and make a legally distinct cartoon mouse by hand in a style that is similar to someone else’s and then I sell prints of that work, that is legal. The same should be the case for AI.

But! Scrutiny for AI should be much stricter given the inherent lack of true transformative creativity. And any AI that has used pirated materials should be penalized either by massive fines or by wiping their training and starting over with legally licensed or purchased or otherwise public domain materials only.

R D Korronald@lemmy.world · 11 days ago

But AI is not a person. It’s very weird idea to treat it like a person.

kryptonianCodeMonkey@lemmy.world · edit-2 11 days ago

No it’s a tool, created and used by people. You’re not treating the tool like a person. Tools are obviously not subject to laws, can’t break laws, etc… Their usage is subject to laws. If you use a tool to intentionally, knowingly, or negligently do things that would be illegal for you to do without the tool, then that’s still illegal. Same for accepting money to give others the privilege of doing those illegal things with your tool without any attempt at moderating said things that you know is happening. You can argue that maybe the law should be more strict with AI usage than with a human if you have a good legal justification for it, but there’s really no way to justify being less strict.

CriticalMiss@lemmy.world · 10 days ago

This 240TB JBOD full of books? Oh heavens forbid, we didn’t pirate it. It uhh… fell of a truck, yes, fell off a truck.

Dragomus@lemmy.world · 11 days ago

So, let me see if I get this straight:

Books are inherently an artificial construct. If I read the books I train the A(rtificially trained)Intelligence in my skull.
Therefore the concept of me getting them through “piracy” is null and void…

JcbAzPx@lemmy.world · 10 days ago

No. It is not inherently illegal for AI to “read” a book. Piracy is going to be decided at trial.

Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Claude AI maker Anthropic bags key “fair use” win for AI platforms, but faces trial over damages for millions of pirated works – ai fray