• Credibly_Human@lemmy.world
    link
    fedilink
    English
    arrow-up
    1
    ·
    3 hours ago

    Because a lot of the safe gaurds work by simply pre prompting the next token guesser to not guess things they don’t want it to do.

    Its in plain english using the “logic” of conversations, so the same vulnerabilities largely apply to those methods.