• ExLisperA
    English
    arrow-up
    15
    ·
    2 days ago

    I have a better LLM benchmark:

    “I have a priest, a child and a bag of candy and I have to take them to the other side of the river. I can only take one person/thing at a time. In what order should I take them?”

    Claude Sonnet 4 decided that it’s inappropriate and refused to answer. When I explained that the constraint is not to leave the child alone with the candy, it provided a solution that leaves the child alone with the candy.

    Grok would provide a solution that doesn’t leave the child alone with a priest but wouldn’t explain why.

    ChatGPT would directly say that “The priest can’t be left alone with the child (or vice versa) for moral or safety concerns.” and then provide a wrong solution.

    But yeah, they will know how to play chess…
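For reference, the puzzle above is small enough to brute-force. What follows is only a minimal sketch, assuming the usual river-crossing rules: one passenger or item per trip, and a forbidden pair may not share a bank while the ferryman is on the other side. The names `ITEMS`, `is_safe` and `solve` are made up for illustration.

```python
from collections import deque

# Hypothetical names for the three things to ferry across.
ITEMS = frozenset({"priest", "child", "candy"})


def is_safe(bank, forbidden):
    """A bank without the ferryman is safe if it holds no forbidden pair."""
    return not any(pair <= bank for pair in forbidden)


def solve(forbidden):
    """Breadth-first search; returns a shortest list of (cargo, destination) trips."""
    start = (ITEMS, "near")      # everything on the near bank, ferryman too
    goal = (frozenset(), "far")  # near bank empty, ferryman on the far bank
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (near, side), path = queue.popleft()
        if (near, side) == goal:
            return path
        here = near if side == "near" else ITEMS - near
        # The ferryman may cross empty-handed (None) or with one item from his bank.
        for cargo in [None, *sorted(here)]:
            if side == "near":
                new_near = near - {cargo} if cargo else near
                new_side = "far"
                unattended = new_near          # the bank he just left
            else:
                new_near = near | {cargo} if cargo else near
                new_side = "near"
                unattended = ITEMS - new_near  # the far bank he just left
            if not is_safe(unattended, forbidden):
                continue
            state = (new_near, new_side)
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [(cargo or "nothing", new_side)]))
    return None  # no valid sequence of trips exists


# Only the constraint stated in the comment: child must not be alone with candy.
candy_only = [frozenset({"child", "candy"})]
# Plus the constraint the chatbots inferred on their own.
both = candy_only + [frozenset({"child", "priest"})]

for trip in solve(candy_only):
    print(trip)
```

Under these assumptions the candy-only version needs five crossings, and adding the priest/child constraint turns it into the classic seven-crossing wolf, goat and cabbage puzzle, with the child having to go first.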

    • LifeInMultipleChoice@lemmy.world
      English
      arrow-up
      15
      ·
      edit-2
      2 days ago

      The answer is simple, eat the candy with or without them, and take the kid across the river. Drive them home to their guardian. The priest is an adult, he can figure his own shit out.

  • postnataldrip@lemmy.world
    English
    arrow-up
    24
    ·
    2 days ago

    I bet Video Chess is pretty shit as an LLM too.

    Wish people would stop desperately looking for ways to write buzzword stories

    • finitebanjo@lemmy.world
      English
      arrow-up
      11
      ·
      2 days ago

      TBF, LLMs have no real purpose. They can generate word salads and make code snippets, but it’s wildly unethical, and AI artwork is 1/3rd shite and 2/3rds theft.

        • finitebanjo@lemmy.world
          English
          arrow-up
          5
          ·
          2 days ago

          I’m sorry your life is so joyless and devoid of enjoyable art, but it’s absolutely not true for the vast majority of us.

          • Cocodapuf@lemmy.world
            English
            arrow-up
            7
            ·
            edit-2
            2 days ago

            Oh, I enjoy lots of great art! But do you think I watch every film? Listen to every band? There’s tons of shit out there!

            Do you really believe, of all the songs that are written every day, that less than a third are crap? Even Taylor Swift doesn’t publish everything she does. Sometimes you work on something for weeks and then end up tossing it in the bin. More often, you work on something for 30 minutes before deciding “I’m gonna start over, try something different”. The majority of art is crap, but then you keep the stuff you think works.

            And what’s that expression? “Good artists copy, great artists steal.” I mean, that’s a bit satirical, but the fact is, everything is derivative to some degree. It’s not that there aren’t new ideas, it’s just that our new ideas are based on older ones. We stand on the shoulders of giants (or at least, on the shoulders of some people who came before us).

            All I was really saying was that the accusation “2 parts copying, 1 part crap” is, honestly, par for the course. That’s how humans work. (And we do some great work that way.)

            • aesthelete@lemmy.world
              English
              arrow-up
              2
              ·
              edit-2
              2 days ago

              I enjoy lots of great art! But do you think I watch every film? Listen to every band? There’s tons of shit out there!

              You said regular art is 1/3 shite and 2/3 theft. Maybe math isn’t your strong suit, but that’s 3/3, which is 100%, so by claiming regular art is the same you’re saying all art is either theft or shite.

              It uh, it isn’t.

              • Cocodapuf@lemmy.world
                English
                arrow-up
                2
                ·
                edit-2
                2 days ago

                I did say that, because this isn’t a pie chart situation, it’s a Venn diagram situation.

                For instance, AI art is 99% theft and 60% garbage. It’s both because there’s overlap.

                Stolen and bad aren’t opposites, why would this be a dichotomy?
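To put numbers on the Venn diagram point: when two percentages describe overlapping sets, they only bound the overlap rather than having to sum to 100%. A rough sketch, assuming both figures are shares of the same total (`overlap_bounds` is a made-up helper name):

```python
def overlap_bounds(pct_a, pct_b):
    """Smallest and largest possible overlap (in percent) of two sets
    that cover pct_a and pct_b of the same whole."""
    lower = max(0, pct_a + pct_b - 100)  # inclusion-exclusion lower bound
    upper = min(pct_a, pct_b)            # overlap cannot exceed the smaller set
    return lower, upper


print(overlap_bounds(99, 60))  # (59, 60)
```

So with “99% theft and 60% garbage”, somewhere between 59% and 60% of the total would have to be both.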

                • aesthelete@lemmy.world
                  English
                  arrow-up
                  2
                  ·
                  1 day ago

                  That’s fine but regular art isn’t 2/3 theft either.

                  I do buy the 1/3 shite though. It may even be a bit higher than that. Though beauty is in the eye of the beholder, etc.

                  It’s a matter of taste for sure but I’d say AI art is >90% shite, 100% theft.

                  I don’t like the glossy looking hyperreal shit it puts out at all.

    • jj4211@lemmy.world
      English
      arrow-up
      1
      ·
      1 day ago

      I remember seeing that, and early on it seemed fairly reasonable, then it started materializing pieces out of nowhere and convincing each other that they had already lost.