CAVOK@lemmy.world to Technology@lemmy.world · English · 28 days ago
Donating our open-source alignment tool - Anthropic (www.anthropic.com)
Em Adespoton@lemmy.ca · English · 28 days ago
That’s all great, but all it takes is to unalign a single parameter and it appears to unalign the entire model.
So this is great for ensuring you’re testing what you think you’re testing, but it’s not going to actually secure a model you’re going to make open.