ClockBench: Even the best AI models can't reliably read the clock (clockbench.ai)
Posted by Pro@programming.dev to Technology@lemmy.world · 13 days ago · cross-posted to: Technology@programming.dev
earthworm@sh.itjust.works · 13 days ago (edited)

This seems like a dumb benchmark.

> ClockBench evaluates whether models can read analog clocks - a task that is trivial for humans, but current frontier models struggle with.

What do you mean trivial? Most humans I know can’t read the most basic white-background-big-black-numbers clocks.

Someone rigged the jury to get 90% on this: