An AI Playing Mario Proves Future Machines Will Have Human-Like Curiosity

There’s something sweet and quaint about watching a new player try out the Classic Mario game for the first time, like in this video. The 2D, pixelated form of tiny plumber Mario appears onscreen at the beginning of stage 1. The player sends a hesitant signal, and Mario lurches forward. Then stops. Another signal and he turns back before stopping again.

A third signal, and Mario jumps into the air for the first time. In these few virtual movements, you can see an echo of infancy: a baby trying out its limbs for the first time, driven by curiosity to find out what it can do and how it can affect the environment around it.


Except that the game isn’t being played by a young child, but by an AI (Artificial Intelligence) program created by the University of California, Berkeley’s Pulkit Agrawal and his team. What you were witnessing was a machine exhibiting curiosity while exploring a game world it had never encountered before.

Curious AI

Depending on how much science fiction you consume, you may believe that computers capable of curiosity-driven learning spell certain doom for mankind. The truth, naturally, is far less dramatic. Trying to build human-like curiosity into an AI’s essential makeup is an effort that is decades old. Many AI systems in use today employ a basic reinforcement learning algorithm to perform their tasks, where the AI receives a ‘reward’ for performing a task correctly, and a penalty for undesired behavior.
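To make the reward-and-penalty idea concrete, here is a minimal sketch of ordinary reinforcement learning, not the Mario project’s actual code: a toy agent in a five-cell corridor learns, by trial and error, that moving right earns a reward while wandering costs a small penalty. All names and numbers here are illustrative assumptions.

```python
import random

# Toy reinforcement learning (tabular Q-learning) on a 5-cell corridor.
# Reaching the rightmost cell pays +1; every other step costs -0.1.
N_STATES = 5          # cells 0..4; the goal is cell 4
ACTIONS = [-1, +1]    # move left or right
EPSILON = 0.1         # chance of trying a random action
ALPHA, GAMMA = 0.5, 0.9

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else -0.1
    return nxt, reward

random.seed(0)
for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)          # explore
        else:
            a = max(ACTIONS, key=lambda a: q[(s, a)])  # exploit
        nxt, r = step(s, a)
        best_next = max(q[(nxt, b)] for b in ACTIONS)
        q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
        s = nxt

# The learned policy: which direction looks best from each cell.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

After a couple of hundred episodes the agent settles on “always move right”, purely because of how the rewards and penalties were set up; nobody told it where the goal was.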


Agrawal and his team’s new project aims to address the limitations of this approach. The Mario-playing AI they created sees the world of the 2D video game as a mathematical model of how the game progresses. Using that model, the AI predicts what the game will look like a few frames later based on the commands it sends to the game: pushing the forward button, the back button, the jump button, and so on.

The AI then receives a ‘reward point’ in proportion to how inaccurate its prediction of the game’s progress turns out to be. Basically, the AI gets rewarded for being ‘wrong’. Once it has mastered a certain section of the game, this motivates it to automatically move on to other, undiscovered parts of the game, in constant search of the next unexpected gameplay that can undermine the accuracy of its mathematical prediction model and result in more reward points.
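The “rewarded for being wrong” loop can be sketched in a few lines. This is a deliberately simplified stand-in, assuming a lookup-table forward model over abstract states rather than the neural network over game frames the researchers actually use:

```python
# Curiosity as prediction error: the agent keeps a forward model
# (here just a lookup table) that predicts the next state for each
# (state, action) pair. Its intrinsic reward is how wrong the
# prediction was, so familiar transitions quickly stop paying out
# while novel ones stay attractive.

def true_dynamics(state, action):
    # The environment being explored (unknown to the model).
    return state + action

model = {}  # (state, action) -> predicted next state

def curiosity_reward(state, action):
    predicted = model.get((state, action), 0)  # naive default guess
    actual = true_dynamics(state, action)
    error = abs(actual - predicted)            # how "wrong" we were
    model[(state, action)] = actual            # learn from the surprise
    return error

# The first visit to a transition is surprising and pays out...
first = curiosity_reward(3, 1)
# ...a repeat visit is fully predicted, so the reward dries up.
second = curiosity_reward(3, 1)
# A never-seen transition still pays, nudging exploration onward.
novel = curiosity_reward(10, 1)
print(first, second, novel)  # → 4 0 11
```

The key property is visible in the three calls: once a region of the world is mastered, its reward drops to zero, so the only way to keep earning is to seek out parts of the game the model cannot yet predict.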

Mimicking Humans

This is the basic process through which an infant begins to feel curiosity about the world around it, and it is the behavior the AI was intended to mimic. The result is a machine that, much like a human, is capable of exploring the world it exists in and drawing conclusions on how best to navigate it to perform a task, without needing a programmer to issue instructions every step of the way.

The hope is to apply the same strategy to the real world in the near future, with the AI used in self-driving cars and other robots observing the world as a mathematical model in which it can compute all the variables and independently decide on the best course of action to complete a task. Like choosing the best route home. Or avoiding collisions.


What’s slowing the process down right now is that while such a model can be processed by the AI when applied to a simple 2D game like Mario, the real world contains far too many variables, from a strong gust of wind to the state of the road and the presence of nearby humans, for the AI to process all the data in time to reach a conclusion.

Real-World Solutions

One possible way to sidestep this issue is simply to limit the number of variables that an AI must take into account to perform a task. So a self-driving car will take into account the presence of other cars and the state of the road but will ignore what is happening off-road as irrelevant to its function.

Naturally, such a form of AI would have huge applications in the real world and could lead to the world’s first batch of truly independent machines, capable of carrying out their tasks without human supervision until their batteries run out. But there are limitations as well. Imagine a search-and-rescue robot that is programmed to aid workers during a nuclear meltdown, but is also programmed to ignore variables that do not relate directly to the meltdown, such as structural collapse or an earthquake. Such a robot would be limited in the aid it can provide to workers, and may even get in the way of human helpers managing the rescue effort.

All in all, Agrawal and his team’s AI is a promising portent of things to come, but it will still be a few years before we live in a world where humans only have to give a command once for their robot underlings to carry out any task, from driving the car to cleaning the house.