Advanced Computing in the Age of AI | Tuesday, September 26, 2023

Tesla To Spend $1B on AI Supercomputer, Not Falling for LLMs 

Tesla will spend $1 billion on its Dojo supercomputer through the next year as it beefs up video recognition capabilities for its autonomous cars.

"I think we will be spending something north of $1 billion over the next year on – through the next year, it’s well over $1 billion in Dojo," said Elon Musk, Tesla's CEO, during an earnings call.

The company also plans to deploy 300,000 Nvidia A100 GPUs by the end of 2024 to supplement the Dojo supercomputer, which is built on Tesla’s homegrown D1 chip.

"We may reach in-house neural net training capability of 100 exaflops by the end of next year," Musk added.

Tesla uses multiple types of hardware for AI inferencing and training. It has on-car hardware running software called Autopilot, which is for assisted driving, and a beta version of Full Self-Driving (FSD), which is for hands-free autonomous driving.

Tesla mines its fleet for visual data gathered by eight cameras and sensors, which it then feeds into its training model. The current data center where models are trained includes more than 14,000 GPUs.

"To date, over 300 million miles have been driven using FSD beta. That 300-million-mile number is going to seem small very quickly. It’ll soon be billions of miles, then tens of billions of miles," Musk said.

Tesla also started production of its Dojo supercomputer, which is a separate AI training cluster that has racks of homegrown D1 chips. The D1 delivers 22.6 teraflops of FP32 performance, has 50 billion transistors, 10TBps of on-chip bandwidth, and 4TBps of off-chip bandwidth.

Elon Musk. (COMEO/Shutterstock)

"That’s what Dojo is designed to do – optimize for video training. It’s not optimized for LLMs. With video training, you have a much higher ratio of compute-to-memory bandwidth, whereas LLMs tend to be memory bandwidth," Musk said.
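Musk's point about compute versus memory bandwidth can be made concrete with a rough, roofline-style calculation. The sketch below uses only the D1 figures quoted in this article. Treating the off-chip bandwidth as the limiting data path is an assumption, and the two workload intensities are hypothetical values chosen for illustration, not measurements of any Tesla or LLM workload.

```python
# Figures quoted in this article for Tesla's D1 chip.
D1_FP32_FLOPS = 22.6e12   # 22.6 teraflops of FP32 performance
D1_ON_CHIP_BPS = 10e12    # 10 TBps of on-chip bandwidth
D1_OFF_CHIP_BPS = 4e12    # 4 TBps of off-chip bandwidth


def balance_point(peak_flops: float, bytes_per_second: float) -> float:
    """FLOPs the chip can perform per byte moved when both run at peak."""
    return peak_flops / bytes_per_second


def is_compute_bound(workload_flops_per_byte: float,
                     peak_flops: float,
                     bytes_per_second: float) -> bool:
    """Roofline-style check: a workload that does at least as many FLOPs
    per byte as the chip's balance point is limited by compute, not by
    bandwidth."""
    return workload_flops_per_byte >= balance_point(peak_flops, bytes_per_second)


if __name__ == "__main__":
    off_chip = balance_point(D1_FP32_FLOPS, D1_OFF_CHIP_BPS)
    on_chip = balance_point(D1_FP32_FLOPS, D1_ON_CHIP_BPS)
    print(f"Off-chip balance point: {off_chip:.2f} FLOPs per byte")
    print(f"On-chip balance point:  {on_chip:.2f} FLOPs per byte")

    # Hypothetical intensities, for illustration only (assumptions).
    for name, intensity in [("high-intensity workload", 50.0),
                            ("low-intensity workload", 1.0)]:
        bound = "compute" if is_compute_bound(
            intensity, D1_FP32_FLOPS, D1_OFF_CHIP_BPS) else "bandwidth"
        print(f"{name}: {bound}-bound")
```

A workload that performs many operations on each byte it loads sits above the balance point and is limited by raw compute, which is the regime Musk describes for video training. A workload that moves a lot of data per operation falls below it and is limited by bandwidth.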

Tesla is quickly gathering data from FSD software in its cars to create a system that will make autonomous driving safe and reduce on-road deaths.

"In order to build autonomy, we also need to train our neural net with data from millions of vehicles ... the more training data you have, the better the results," Musk said, adding, "We see a clear path to full self-driving being 10 times safer than the average human driver."

But a lack of hardware is slowing down the video training. Musk specifically praised Nvidia CEO Jensen Huang, who was working to get Tesla more GPUs to help meet its need for computing speed.

"Frankly, I don’t know, if they could deliver us enough GPUs, we might not need Dojo, but they can’t. So they’ve got so many customers. They have been kind enough to nonetheless prioritize some of our GPU orders," Musk said.

Musk is betting on heavy R&D and AI expenditure to give Tesla a competitive advantage over rival car makers, which are not putting as much effort into training autonomous systems from scratch. Tesla this week reported second-quarter automotive revenue of $21.27 billion, up 46% from the same quarter a year earlier.

The earnings were considered a disappointment because the operating margin was 9.6%, down from 14.6% a year earlier. That was partially due to lower average selling prices of Tesla’s electric vehicles. Tesla offered deep discounts on the S, X and 3 models in the most recent quarter.