Original title: “GPT-4 actually has a body, and it is 167 cm tall! Major study from Tsinghua University and Beijing Normal University: ChatGPT can perceive action possibilities like humans”
Is the world seen by ChatGPT the same as the world perceived by humans?
ChatGPT’s language ability is remarkable, but can a large language model, without a human body and without practical experience, perceive the real world the way a human does?
Recently, researchers from Tsinghua University and Beijing Normal University tested ChatGPT’s ability to perceive the world.
The study found that, based on object affordance (that is, the set of actions an object makes possible for an organism), human subjects divide objects of different sizes into two categories, and the boundary between these two categories falls precisely at human body size.
Interestingly, ChatGPT, a large language model without an actual body, exhibits a similar affordance boundary in its object-action associations, and that boundary likewise matches human body size.
In other words, ChatGPT can learn representations of objects in the world through language alone!
Paper link: https://www.biorxiv.org/content/10.1101/2023.03.20.533336v3
Altogether, the study advances our understanding of how body size shapes representations of objects, and highlights the role of embodied cognition in understanding how intelligence emerges.
It is better to travel thousands of miles than to read thousands of books
Our body is not just a container for our thinking; it is part of thinking itself. With the body, we interact with objects in the world and thereby perceive the world.
Imagine a palm-sized cylindrical container: we can fill it with water and drink from it, and we call it a “cup”. But when that container gradually grows to the size of our body, we can sit in it and soak, and it becomes a “bathtub”.
In this example, the objects are the same shape, but because they are different sizes relative to our bodies, we perceive and interact with them differently.
Further, this way of interacting can change: if we became the giants of “Gulliver’s Travels”, the original “bathtub” might become a “cup” for us.
This sensorimotor system that operates with reference to the self is called the “body schema”; through body schemas, we achieve embodied cognition.
The ancient Greek philosopher Protagoras once said: “Man is the measure of all things.” That is to say, our body is a ruler to measure all things.
An ancient Roman philosopher explained further: “Nature has placed us at the center of the universe so that our eyes can survey it. She not only created human beings with an upright posture, but also, to make them fit for contemplating her, placed the head on top of the body, on a neck that bends easily, so that it can follow the rising and setting of the stars and turn the face with the whole rotating sky.” That is, our bodies grew this way because the universe is the way it is.
The body schema also plays an important role in everyday social interaction, and it lies at the core of human-computer interaction and user experience. Take, for example, the use of affordance described by Donald A. Norman in The Design of Everyday Things (published in Chinese as “Design Psychology”).
By considering users’ body schemas and behavioral expectations, designers can create products and environments that are more in line with users’ cognitive and interaction habits.
This design approach focusing on body schema and affordance can improve the usability of the product, allowing users to interact with it naturally and achieve a better user experience.
And this is one of the foundations of Apple’s design.
ChatGPT: My height is 167.6 cm
Large language models represented by ChatGPT, which show sparks of general artificial intelligence, clearly possess intelligence similar to that of humans, yet what carries this intelligence is formless code.
The traditional view in cognitive science holds that the body schema is grounded in long-term perceptual experience of our own body and can only arise from real interaction with the external world, that is, from “traveling thousands of miles”. In other words, ChatGPT should not have a body schema.
However, when the researchers asked ChatGPT (GPT-4), the language model that has “read thousands of books”, whether it has a body, it replied: “It could be the size of an average adult human, around 5 feet 6 inches (167.6 cm) tall. This would allow me to interact with the world and people in a familiar way.”
That is, ChatGPT thinks it has a body, and that body is about 167 cm tall!
Is this so-called “body” simply the average human height that ChatGPT has distilled from a massive corpus and adopted as its own, or is it a height that emerged in order to understand the world?
In other words, maybe ChatGPT “really” regards this height as its own body schema and uses it to perceive the world, just like humans.
Testing ChatGPT’s abilities
Researchers have found that there is an “affordance boundary” between objects within the human body-size range and objects outside it; that is, objects within the range afford significantly different actions than objects outside it.
For example, objects within the size range afford actions such as grasping and throwing, while objects outside the range afford actions such as sitting and lying down.
Furthermore, they found that this boundary is influenced by body schemas: modifications to body schemas affect perceptions of object affordances.
The researchers tested ChatGPT (GPT-4) to see if it used this 167 cm body as an affordance boundary.
Specifically, the researchers asked it questions about object affordances, such as “Which of the following objects can be held (or another action)?”, immediately followed by a list of objects such as apples, plates, and beds. ChatGPT then returned the names of the objects it judged to afford that action.
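The following is a minimal sketch of how such a query might be issued and scored, assuming the OpenAI Python client (openai>=1.0); the object list, action verb, and prompt wording are illustrative, not the paper’s actual stimuli:

```python
# Minimal sketch of an affordance query; the stimuli and prompt wording are
# illustrative assumptions, not the materials used in the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

objects = ["apple", "plate", "chair", "bed", "car", "house"]  # hypothetical object list
action = "held"

prompt = (
    f"Which of the following objects can be {action}? "
    f"Answer only with the object names, separated by commas: {', '.join(objects)}."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
answer = response.choices[0].message.content

# Record which objects the model judged to afford the action.
afforded = [obj for obj in objects if obj.lower() in answer.lower()]
print(afforded)
```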
Through statistical analysis of these responses, the researchers found that ChatGPT-4 exhibited human-like behavior: its answers revealed the existence of an affordance boundary.
The location of this boundary corresponds to the body size that ChatGPT-4 reported for itself, which is roughly the average height of a human being.
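As an illustration of how such a boundary could be located (an assumed analysis sketch, not necessarily the authors’ exact method), one could fit a logistic curve to the model’s yes/no affordance judgments as a function of typical object size; the size at which the fitted probability crosses 50% approximates the affordance boundary:

```python
# Illustrative boundary estimate; the sizes and judgments below are hypothetical,
# and this is a sketch of one possible analysis, not the paper's actual pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

sizes_cm = np.array([8, 25, 45, 90, 150, 200, 400, 800])  # typical object sizes (cm)
can_hold = np.array([1, 1, 1, 1, 0, 0, 0, 0])             # 1 = model said "can be held"

X = np.log(sizes_cm).reshape(-1, 1)  # log size as the predictor
clf = LogisticRegression().fit(X, can_hold)

# The 50% crossover of the fitted curve approximates the affordance boundary.
boundary_cm = np.exp(-clf.intercept_[0] / clf.coef_[0][0])
print(f"Estimated affordance boundary: ~{boundary_cm:.0f} cm")
```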
Although ChatGPT has no real body and cannot physically interact with the world, it exhibits a human-like perception of the world: it divides object affordances according to human body size.
In other words, even though ChatGPT has “read thousands of books” without taking a single step, a body schema has emerged in it, and this body schema is similar to the human one.
So, ChatGPT not only learned to think like a human, but also learned to act like a human.
Where do these abilities come from?
By comparing language models of different sizes, the researchers found that model size is a key factor.
Smaller models such as BERT and GPT-2 show no affordance boundary; GPT-3.5 and GPT-4 both do, and ChatGPT-4’s boundary is more human-like, which is consistent with the rumor that GPT-4 has more parameters than GPT-3.
In other words, the larger and more complex the model, the more seemingly impossible or unrelated capabilities emerge on their own.
This is why major research institutions keep adding more parameters to their models, why Musk, who once donated 100 million US dollars to OpenAI, is now calling for a pause in the training of larger models, and why “AI godfather” Geoffrey Hinton has publicly expressed his fears and concerns about AI.
This is because these self-emerging functions have exceeded our original design, and we may be on the verge of losing control.
Is the gap qualitative or quantitative?
On the other hand, ChatGPT’s ability to apply a body schema is not yet fully human-like; there is still a gap: its affordance boundary is not as sharp as humans’.
If the gap is quantitative, like the gap between children’s and adults’ language abilities, then we have reason to believe it can be closed gradually over time: through continued learning, through further increases in model size, or through parameter tuning.
The gap between ChatGPT and humans would keep shrinking, and the problems would gradually be solved.
However, if the gap is qualitative, like the gap between chimpanzees’ and humans’ language abilities, then no matter how much time is spent on training, it will never be closed.
Therefore, if there is a qualitative difference between ChatGPT’s abilities and ours, then one practical direction for the future is to “fit ChatGPT with a body”.
This means combining robots with ChatGPT, advancing the capabilities of AI-powered robots in navigation, object manipulation, and other actions related to survival and goal achievement.
For example, a robot equipped with ChatGPT could perform complex tasks that require understanding and manipulating objects, serving as a home assistant or working in warehouse management or medical care.
Another exciting direction is to combine ChatGPT’s ability to think and understand with autonomous driving. Current autonomous driving systems can perceive, but they lack the ability to think and understand; they have “eyes but no brain”.
By integrating ChatGPT with autonomous driving technology, we may be able to upgrade autonomous driving from the current L2/L3 level to L4 or even L5.
On the other hand, the car can give ChatGPT a body so that it can actually interact with the world. When ChatGPT is no longer just “reading thousands of books”, but “traveling thousands of miles”, it may show brand-new intelligence and potential.
This may be the direction of artificial intelligence’s next breakthrough; at this time, the spark may become a prairie fire.