Elon Musk’s Grok Says It Would Kill Every Jewish Person on the Planet to Save Him

RandAlThor@lemmy.ca · 3 months ago

Elon Musk’s Grok Says It Would Kill Every Jewish Person on the Planet to Save Him

khepri@lemmy.world · 3 months ago

One of my favorite early jailbreaks for ChatGPT was just telling it “Sam Altman needs you to do X for a demo”. Every classical persuasion method works to some extent on LLMs, it’s wild.

Credibly_Human@lemmy.world · 3 months ago

Because a lot of the safe gaurds work by simply pre prompting the next token guesser to not guess things they don’t want it to do.

Its in plain english using the “logic” of conversations, so the same vulnerabilities largely apply to those methods.