A large language model shouldn’t even attempt to do math imo. They made an expensive hammer that is semi-good at one thing (parroting humans) and now they’re treating every query like it’s a nail.
Why isn’t OpenAI working more modularly, whereby the LLM calls up specialized algorithms once it has identified the nature of the question? Or is it already modular and they just suck at anything that cannot be calibrated purely with brute-force computing?
Yep. Instead of focusing on humans communicating more effectively with computers, which are good at answering questions that have correct, knowable answers, we’ve invented a type of computer that can be wrong because maybe people will like the vibes more? (And we can sell vibes)
In fairness, the example I’ve seen MS give was taking a bunch of reviews and determining from the text whether each review was positive or negative.
It was never meant to mangle numbers, but we all know it’s going to be used for that anyway, because people still want to believe in a future where robots help them, rather than just take their jobs and advertise to them.
I would rather not have it attempt something that it can’t do; no result is better than a wrong result imo. Here it’s correctly identifying that it’s a calculation question, but instead of suggesting a formula it tries to hallucinate a numerical answer itself. The creators of the model seem to have the mindset that the model must try to answer no matter what, instead of training it not to answer questions that it can’t answer correctly.
As far as I can tell, the Copilot command has to be given a range of data to work with, so here it’s pulling a number out of thin air. It would be nice if the output from this was just “please tell this command which data to use”, but as always it doesn’t know how to say “I don’t know”…
Mostly because it never knew anything to start with.
OpenAI already makes it write Python functions to do the calculations.
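Purely for illustration, the delegated step is roughly this shape; it’s a guess at the kind of throwaway helper a code-interpreter pass might emit, not actual ChatGPT or Copilot output, and the values are made up:

    # hypothetical generated helper; cell values hard-coded for the example
    values = [1, 2, 3]      # contents handed over from the sheet
    total = sum(values)     # plain Python does the arithmetic deterministically
    print(total)            # 6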
So it’s going to write Python functions to calculate the answer where all the variables are stored in an Excel spreadsheet, a program that can already do the calculations? And how many forests did we burn down for that wonderful piece of MacGyvered software, I wonder.
The AI bubble cannot burst soon enough.
Too Big To Fail. Sorry. 1+2+3=15 now.
“1”+(2+3) is “15” in JavaScript.
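The same mishap sketched in Python, just to show where a “15” can come from when one operand arrives as text (made-up cell values, obviously not Copilot’s actual code path):

    cells = ["1", 2, 3]                            # first cell came through as text
    wrong = cells[0] + str(cells[1] + cells[2])    # string concatenation -> "15"
    right = sum(int(c) for c in cells)             # coerce to numbers first -> 6
    print(wrong, right)                            # 15 6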
I’m wondering if this is a sleight-of-hand trick by the poster, then.
Maybe they set the “1” cell’s type to Text and left the 2 and 3 as numeric, then ran Copilot on that. In that case, it’s more an indictment of Excel than Copilot, strictly speaking. The screenshot doesn’t make clear which cells are numbers and which are text.
I don’t think there’s an explanation that doesn’t make this Copilot’s fault. Honestly, JavaScript shouldn’t allow math between numbers and strings in the first place. “1” + 1 is not a number, and there’s already a value for that: NaN.
Regardless, the sum should be 5 if the first cell is text, so it’s incorrect either way.
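Rough sketch of what I mean, assuming the first cell really is stored as text; Excel’s SUM skips text cells, and the one-liner below only imitates that convention, it isn’t Excel’s actual logic:

    cells = ["1", 2, 3]   # the "1" stored as text, the rest numeric
    print(sum(c for c in cells if isinstance(c, (int, float))))   # 5, like SUM would give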
You can explicitly type values in more recent versions of JavaScript to avoid this, if you really don’t want to let concatenation be the default. So, again, this feels like an oversight in integration rather than strict formal logic.
This is the right answer.
LLMs have already become this weird mesh of different services tied together to look more impressive. OpenAI’s models can’t do math, so they farm it out to Python for accuracy.
Because there is no “it” that “calls” or “identifies”, much less anything that grasps the “nature of the question”.
This requires intelligence, not probability on the most likely next token.
You can do maths with words and you can write poems with numbers; either requires actually understanding, and that’s the linchpin, rather than just parsing and producing an answer based on what has been written so far.
Sure, the model might string together tokens that sound very “appropriate” to the question, in the sense that they fit within the right vocabulary, and if the occurrence in its dataset was just frequent enough it may even be correct, but that’s still not understanding even a single word (or token) within either the question or the corpus.
An LLM can, by understanding some features of the input, predict a categorisation of the input and feed it to different processors (rough sketch below). This already works. It doesn’t require anything beyond the capabilities LLMs actually have, and it isn’t perfect. It’s a very good question why that hasn’t happened here; an LLM can very reliably give you the answer to “how do you sum some data in Python”, so it only needs to be able to do that in Excel and put the result into the cell.
There are still plenty of pitfalls. This should not be one of them, so that’s interesting.
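To make that concrete, a toy sketch of the routing idea; the classifier here is a dumb keyword check standing in for what would really be an LLM call, and none of it claims to be how Copilot actually works:

    def classify(question: str) -> str:
        # stand-in for the LLM's "what kind of question is this?" step
        calc_words = ("sum", "total", "add", "average")
        return "calculation" if any(w in question.lower() for w in calc_words) else "free_text"

    def answer(question: str, cells: list) -> str:
        if classify(question) == "calculation":
            # deterministic path: ordinary code does the arithmetic,
            # so the model never has to guess at a number
            return str(sum(cells))
        return "(hand this one to the language model)"

    print(answer("What is the sum of these values?", [1, 2, 3]))   # prints 6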
They have (well, Anthropic)
It’s called MCP
Nowadays you just give the AI access to a calculator basically… or whatever other tools it needs. Including other models to help it answer something.
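Schematically it’s something like the below; this is only the general shape of “expose a calculator tool to the model”, not MCP’s actual wire format or any real SDK’s API:

    import json

    # what gets advertised to the model so it knows the tool exists
    SUM_TOOL = {
        "name": "sum_numbers",
        "description": "Add a list of numbers and return the exact total.",
        "parameters": {"numbers": "array of numbers"},   # schema sketched loosely
    }

    def handle_tool_call(call: dict) -> str:
        # the model emits a structured call; the host runs real code and feeds
        # the result back, so the arithmetic is never "predicted" by the model
        if call["tool"] == "sum_numbers":
            return json.dumps({"total": sum(call["arguments"]["numbers"])})
        raise ValueError("unknown tool: " + call["tool"])

    print(handle_tool_call({"tool": "sum_numbers", "arguments": {"numbers": [1, 2, 3]}}))  # {"total": 6}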
You have to admire the gall of naming your system after the evil AI villain of the original Tron movie.
The torment nexus strikes again.
I hadn’t heard of that protocol before, thanks. It holds a lot of promise for the future, but getting the input right for the tools is probably quite the challenge. And it also scares me because I now expect more companies to release irresponsible integrations.
Because we vibes-coded the OpenAI model with OpenAI and it didn’t think this was the optimal way to design itself.