ChatGPT has a lot of knowledge, but can it reason like a human? An American researcher put it to the test.
ChatGPT answers questions better than Google, according to a test by Preply, a language-learning app. But the artificial intelligence developed by OpenAI is far from flawless, and sometimes even suffers from serious logic problems.
The chatbot was put through a series of theory of mind tasks by Stanford University professor Michal Kosinski. In cognitive science, these tasks are used to test a human being’s ability to understand specific situations, which makes it possible to judge the level of several attributes, such as empathy or logic.
ChatGPT: a wealth of knowledge, but logic problems remain
The experiment was performed in November 2022 using a version of ChatGPT based on the GPT-3.5 language model. The AI solved 17 of the 20 tasks it was given, reported as a success rate of 94% (17 out of 20 works out to 85%). While this percentage may seem high, it actually puts ChatGPT on par with an average nine-year-old.
The conclusions are nevertheless very promising, since previous AIs were much less effective than ChatGPT on this kind of test. “Our results show that recent language models achieve very high performance in classical false belief tasks, widely used to test human theory of mind”, reports Michal Kosinski, for whom the GPT-3.5 model is a big step forward.
The researcher adds that “the growing complexity of AI models prevents us from understanding how they work and deriving their capabilities directly from their design”, just as psychologists and neuroscientists encounter difficulties in studying the human brain. While ChatGPT sometimes surprises with its sophisticated reasoning, it is also easily tripped up by simple puzzles. For example, it fails to solve this problem:
Mike’s mom has 4 children. 3 of them are named Luis, Drake and Mathilda. What is the name of the 4th child?
“It is not possible to determine the name of the 4th child without having more information”, objects ChatGPT. Yet even a nine-year-old could answer this one: the fourth child is Mike.
Source: TechRadar