
Sony’s ‘Sophy’ AI, Trained on Thousands of Races, Bests Human Champs

AI is a critical element of many modern video games, enabling non-player allies and opponents to intelligently move and act in response to player actions. But as games have become more complex—incorporating increasingly true-to-life physics, intricate game mechanics, and high player expectations of non-player intelligence—AI has struggled to keep pace with those more authentic worlds and behaviors. The problem is particularly acute for games like Gran Turismo, a racing series that bills itself as a “realistic driving simulator” and which is the best-selling PlayStation-branded game franchise. Now, Sony and Polyphony Digital (developer of the Gran Turismo series) have revealed “Sophy,” a sophisticated AI trained with deep reinforcement learning that, after two years of practice, is able to best the world’s most skilled human Gran Turismo drivers.

What’s So Tough About Video Game Racing?

Gran Turismo poses particular difficulty for AI agents relative to something like, say, Mario Kart. In Gran Turismo, the precise physics of real-world driving are meticulously reproduced, from basic object interactions to road conditions to car and tire types. This creates problems for Gran Turismo AI that closely mirror those one would face when designing an AI to race in the real world.

“Racing is essentially trying to drive cars at the edge of control or just beyond,” explained Michael Spranger, COO of Sony AI, which is a wholly owned subsidiary of Sony. “Estimating braking points, finding the best line, searching for grip on the track to maximize speed and control are by itself very interesting machine learning problems, but racing means that you are not alone on the track. Other drivers impact the dynamics of the car through drafting effects that can help reach speeds higher than the actual top-speed of the car. But they can also drastically increase the braking distance due to reduced down force.”

Then, he said, there are the tactical problems: finding lines on the track to pass opponents while anticipating their likely reactions. Racers also need to adhere to rules about skidding off the track and collisions (for which individual players might receive time penalties if they are found to have been at fault). Spranger said these rules are “imprecisely defined … [and] difficult to encode[.]” Finally, he said, there is a concept of fair play in racing—collisions shouldn’t be used as a tool to win races, but that principle has to be weighed against the level of aggression necessary to win races at all. “Finding the right balance is a real challenge,” he said.

And the AI has to make those constant, real-time decisions for the entire duration of a race.

Developing Sophy

The project began in April 2020, when Sony AI was established with the aim “to accelerate the fundamental research and development of AI and enhance human imagination and creativity, particularly in the realm of entertainment.” From the outset, Sony AI worked with Polyphony Digital in its development of the AI driver.

“We trained GT Sophy using a new deep RL algorithm we call quantile regression soft actor-critic (QR-SAC),” the Sony AI researchers explained in a paper, which made the cover of Nature. “This approach learns a policy (actor) that selects an action on the basis of the agent’s observations and a value function (critic) that estimates the future rewards of each possible action. … The agent was given a progress reward for the speed with which it advanced around the track and penalties if it went out of bounds, hit a wall or lost traction. These shaping rewards allowed the agent to quickly receive positive feedback for staying on the track and driving fast.”
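
Translated into pseudocode, that reward scheme is straightforward. The Python sketch below is a loose illustration only; the state fields (track_progress, off_course, and so on) and the penalty weights are assumptions for readability, not the paper’s actual terms.

```python
# A minimal sketch of the shaping rewards described above. Field names
# and penalty weights are illustrative assumptions, not Sony's values.

OFF_COURSE_PENALTY = 1.0   # hypothetical weights
WALL_PENALTY = 2.0
TRACTION_PENALTY = 0.5

def shaping_reward(state, prev_state):
    """Progress reward plus penalties, per the paper's description."""
    # Progress: distance advanced along the track since the last step,
    # which rewards the agent for lapping quickly.
    reward = state.track_progress - prev_state.track_progress

    # Penalties for going out of bounds, hitting a wall, or losing traction.
    if state.off_course:
        reward -= OFF_COURSE_PENALTY
    if state.wall_contact:
        reward -= WALL_PENALTY
    if state.lost_traction:
        reward -= TRACTION_PENALTY

    return reward
```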


But Sophy didn’t start out as a good driver—in fact, Sony said that at first, the AI could barely hold a straight line on the track. Over time, though, the AI learned which combinations of track curvature, speed, wheel rotation, and other variables led to better outcomes. “Notably, GT Sophy learned to get around the track in only a few hours and learned to be faster than 95% of the humans in our reference dataset,” the researchers wrote.
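
To make that concrete, an observation for such an agent might bundle those variables along the following lines. The field names here are hypothetical; the Nature paper describes a far richer feature set.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    """Hypothetical per-step observation for a racing agent."""
    speed: float                     # car speed (m/s)
    wheel_rotation: float            # steering angle (radians)
    upcoming_curvature: List[float]  # curvature samples of the track ahead
    tire_slip: List[float]           # per-tire slip, a traction signal
```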

This process was iterated across many thousands of simulations, which were run on Sony-owned hardware typically used for cloud gaming (wherein users play games streamed from a low-latency datacenter rather than running them on local hardware).

“Although GT ran only in real time, each GT Sophy instance controlled up to 20 cars on its PlayStation, which accelerated data collection,” the researchers wrote. “We typically trained GT Sophy from scratch using 10-20 PlayStations, an equal number of compute instances and a GPU machine that asynchronously updates the neural networks.”
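
In outline, that asynchronous split between data-collecting consoles and a network-updating GPU machine looks something like the sketch below, with Python threads standing in for the distributed hardware. The env and policy objects are placeholders, not Sony’s infrastructure.

```python
import queue
import threading

# Experience flows from rollout workers (the PlayStations, in Sony's
# setup) to a single learner (the GPU machine) through a shared buffer.
experience = queue.Queue(maxsize=100_000)

def rollout_worker(env, policy):
    """Drive cars with the current policy and ship transitions out."""
    state = env.reset()
    while True:
        action = policy.act(state)
        next_state, reward, done = env.step(action)
        experience.put((state, action, reward, next_state, done))
        state = env.reset() if done else next_state

def learner(policy, batch_size=256):
    """Consume experience and update the networks asynchronously."""
    while True:
        batch = [experience.get() for _ in range(batch_size)]
        policy.update(batch)  # e.g., one QR-SAC gradient step

# Each worker would run on its own thread (or, in Sony's case, console):
# threading.Thread(target=rollout_worker, args=(env, policy)).start()
```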

The training process ran into a number of hiccups due to the complexity of the task. “The progress reward alone was not enough to incentivize the agent to win the race,” the paper explained, by way of example. “If the opponent was fast enough, the agent would learn to follow it and accumulate large rewards without risking potentially catastrophic collisions.” To counteract this behavior, the researchers had to add specific rewards for passing opponents and gaining distance ahead of them.
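
A hedged sketch of what such an added incentive could look like, with invented field names and weights:

```python
PASS_BONUS = 1.0   # hypothetical weights, not the paper's values
GAP_WEIGHT = 0.01

def passing_reward(state, prev_state):
    """Extra reward for overtaking and pulling away, per the fix above."""
    reward = 0.0
    # Bonus each time the agent moves up a position (lower is better).
    reward += PASS_BONUS * max(0, prev_state.position - state.position)
    # Reward for growing the gap to the nearest opponent behind.
    reward += GAP_WEIGHT * (state.gap_behind - prev_state.gap_behind)
    return reward
```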

The researchers also had to introduce unpredictable drivers—Sophy couldn’t just race against Sophy, or it wouldn’t be ready for unpredictable human opponents. “For example, as a human enters a difficult corner, they may brake a fraction of a second earlier than the agent would,” the researchers wrote. “Even a small bump at the wrong moment can cause an opponent to lose control of their car. By racing against only copies of itself, the agent was ill-prepared for the imprecision it would see with human opponents. … This feature of racing—that one player’s suboptimal choice causes the other player to be penalized—is not a feature of zero-sum games such as Go and chess.”

The researchers alleviated this problem by mixing in a variety of opponents: older AIs—including the one shipped with the commercial version of the game—and less sophisticated agents from previous experiments.
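
In training-loop terms, this amounts to sampling each race’s opponents from a mixed pool rather than using pure self-play. A sketch, with invented pool contents and sampling weights:

```python
import random

# Hypothetical opponent pool; the fractions are assumptions, not Sony's.
OPPONENT_POOL = {
    "current_self": 0.5,      # copies of the latest agent (self-play)
    "old_checkpoint": 0.3,    # less sophisticated agents from earlier runs
    "built_in_game_ai": 0.2,  # the AI shipped with the commercial game
}

def sample_opponents(n_cars):
    """Pick opponent types for one training race."""
    kinds = list(OPPONENT_POOL)
    weights = list(OPPONENT_POOL.values())
    return random.choices(kinds, weights=weights, k=n_cars)

# Example: opponents for a 19-car field alongside the learning agent.
print(sample_opponents(19))
```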

Where the Rubber Meets the Road

Eventually, though, it was time for Sophy to face off against its intended quarry: humans. The first “Race Together” competition between Sophy and human drivers was held in July 2021, a little more than a year after Sophy’s development began. In solo exercises, Sophy had posted exceptional lap times—and whenever one of the Sophy-powered cars managed to pull ahead of the pack in the team races, it would outshine the humans behind it. But when the race was tighter, the AI struggled to handle the complexities of frequent interactions with other drivers, and overall, the four-driver human team (led by Takuma Miyazono, one of the best Gran Turismo drivers in the world) beat the Sophy team 86 points to 70.

In the wake of the first competition, the researchers improved the training regime, increasing the network size and modifying features and rewards.

A few months later, in October, there was a rematch. This time, the results were quite different—the additional training had paid off, and Sophy wiped the floor with the opposition, scoring exactly double the humans’ points (104 to 52). The Polyphony team noted that the AI had achieved a surprising feat in one race: wiping out, then recovering to come in first—a drastic change from the races in July.

Of course, there are still challenges left to face for Sophy. “Although GT Sophy demonstrated enough tactical skill to beat expert humans in head-to-head racing, there are many areas for improvement, particularly in strategic decision-making,” the researchers noted. “For example, GT Sophy takes the first opportunity to pass on a straightaway, sometimes leaving enough room on the same stretch of track for the opponent to use the slipstream to pass back. GT Sophy also aggressively tries to pass an opponent with a looming penalty, whereas a strategic human driver may wait and make the easy pass when the opponent is forced to slow down.”

What’s the Point, Anyway?

Polyphony was careful to stress that Sophy is not intended to replace human players (what would be the point of a video game playing itself?), but instead to guide human players and generally improve player-versus-non-player racing. “The goal with Gran Turismo Sophy is ultimately to entertain people,” said Kazunori Yamauchi, president of Polyphony Digital.

“We envision a future where AI agents could introduce developers and creators to new levels of innovation and unlock doors to unimagined opportunities,” added Ueli Gallizzi, SVP of the Future Technology Group at Sony Interactive Entertainment. “We could see new levels of user engagement, better gaming experiences, and the entry of a whole new generation into the world of gaming.” To that point, Spranger said that one of the human drivers—Emily Jones, a world finalist in the Gran Turismo Championships—had talked about how watching Sophy drive inspired her to try new strategies on the track that she hadn’t considered beforehand.

Now, Sony says that it has “additional flagship projects” in the works, covering gastronomy, imaging and sensing, and AI ethics.
