Yeah it’s just token prediction all the way down. Asking it repeatedly to not do something might have even made it more likely to predict tokens that would do that thing.
Oh ha ha, so it’s like a toddler? You have to be very careful not to tell toddlers NOT to do a thing, because they will definitely do that thing. “Don’t touch the hot pan.” Toddler touches the hot pan.
The theory is that they don’t hear the word “don’t”, just the subsequent command. My theory is that the toddler brain goes, “why?” and proceeds to run a test to find out.
Yeah it’s just token prediction all the way down. Asking it repeatedly to not do something might have even made it more likely to predict tokens that would do that thing.
Oh ha ha, so it’s like a toddler? You have to be very careful not to tell toddlers NOT to do a thing, because they will definitely do that thing. “Don’t touch the hot pan.” Toddler touches the hot pan.
The theory is that they don’t hear the word “don’t”, just the subsequent command. My theory is that the toddler brain goes, “why?” and proceeds to run a test to find out.
In either scenario, screaming ensues.