Leaked list shows Facebook training their AI on multiple Lemmy instances

cm0002@lemmy.world · 4 months ago


CMDR_Horn@lemmy.world · 4 months ago

Meta AI’s gonna go dong out tankie

FaceDeer@fedia.io · 4 months ago

Well, obviously. It’s an open protocol. I assumed everyone on the Fediverse must be pro-AI training, otherwise why post here? That would be dumb.

Empricorn@feddit.nl · 4 months ago

Uh… Are you saying simply using social media is endorsing stealing personal information to train LLMs? Because that’s a wild take, if so. Personally, I feel like there’s no stopping them, so what am I supposed to do, stay silent? Not engage with anything, even anonymously?

FaceDeer@fedia.io · 4 months ago

Uh… Are you saying simply using social media is endorsing stealing personal information to train LLMs?

Of course not. No “stealing” is happening. People are posting content on an open protocol that permits anyone to read it. Exactly as intended.

Personally, I feel like there’s no stopping them, so what am I supposed to do, stay silent?

If you do not want to be heard then yes, I suppose you could stay silent. That would indeed accomplish that.

You could also find a social media platform whose content is locked behind a walled garden of some sort that makes it more difficult for the public to see your posts. But that's antithetical to how the Fediverse works; if that's how you want to approach this, you want someplace very different from here.

Basically, you are on a platform that’s specifically designed to broadcast your comments far and wide without restriction, and then you’re getting upset that someone you didn’t want to hear your comments is hearing your comments. I’m not sure what you expected.

Feyd@programming.dev · 4 months ago

Participating in a public forum that has no technical way of preventing data from being used by a particular class of actor does not preclude having an opinion that a particular class of actor should have rules about what data they are allowed to use.

FaceDeer@fedia.io · 4 months ago

People can have whatever opinions they want to have. In this case that opinion flies in the face of obvious reality and I’m pointing that out.

It's like trying to drive your car across the Atlantic Ocean and then griping that the car failed to stay above the water because you really thought it should be able to handle that.

Feyd@programming.dev · 4 months ago

It doesn’t matter how many pithy analogies you make. You need to recognize the difference between “I know they’re scraping this website because they can” and “I don’t think they should be allowed to scrape this website”. You’re arguing that they’re incompatible when they’re not.

FaceDeer@fedia.io · 4 months ago

As I said, people can have whatever opinion they want. Reality is under no obligation to respect those opinions.

Analogies are merely explanatory.

Feyd@programming.dev · 4 months ago

If you understand that, then you should also understand that your "they were dressed like they wanted it"-level argument is completely unnecessary bullshit.

Cid Vicious@sh.itjust.works · 4 months ago

It's no different from posting on Usenet back in the day. You have to assume it's all completely public. People are gonna train their AIs on large public text repositories whether they're given permission to or not.

resipsaloquitur@lemmy.world · 4 months ago

Fuck a Zuck.

This is fine🔥🐶☕🔥@lemmy.world · 4 months ago

Can't wait for the Meta chatbot to tell Zuckerberg to kill himself.

ShadowRam@fedia.io · 4 months ago

I mean, think about it.

The majority of written communication is centered on drama.

People communicating online about any subject - drama
News - drama
Books/stories - drama

So is anyone really surprised that an entity trained only on written text ends up being dramatic?

Blackmist@feddit.uk · 4 months ago

Sure, this is open data viewable by everyone.

Stands to reason that AI is being trained on it.

CameronDev@programming.dev · 4 months ago

So, duplicating their data? That seems counterproductive.

qaz@lemmy.world · 4 months ago

It seems counterproductive for them to scrape it when the API is right there.

Lee Duna@lemmy.nz · 4 months ago

I remember back then, some people defended not blocking Threads instances.

FaceDeer@fedia.io · 4 months ago

And one of those defenses was “it doesn’t matter if you block Threads, the underlying ActivityPub protocol is open and anyone who wants the data can still receive it.”

Turns out to be the case. It didn’t matter if you blocked Threads.

trashcan@sh.itjust.works · 4 months ago

I think a better reason is that federation only worked one way. Why should we share our content if they're not sharing theirs? Not that we'd want it.

IceFoxX@lemmy.world · 4 months ago

Seriously? Meta uses many methods to snoop on your phone, and through its features it also looks for devices on the network you are logged into, and even for devices that are simply nearby. It goes without saying that Meta makes use of open data… I would even go so far as to say that other AI models are not trained any differently. Well, they may be trained using an AI that was itself trained on that data, so that they don't have to access it from the original sources themselves.

Cocopanda@lemmy.world · 4 months ago

Oh ya? Suck the cuck should get a dildo up his arse.

pedz@lemmy.ca · 4 months ago

Is there an easy way to poison the input? Is there something we can slip in our comments that could make the data useless?

horse@feddit.org · 4 months ago

horseanimalsex.pro

lmao wtf is that list. Literally training their AI on bestiality.

Edit in case it’s not obvious: That domain is very much NSFW and it’s exactly what you’d expect (I checked and wish I hadn’t).

tatterdemalion@programming.dev · 4 months ago

horse@feddit.org

hmmm…