How Tesla Uses and Improves Its AI for Autonomous Driving
Tesla's self-driving capabilities will improve significantly once the Dojo supercomputer is added to its high-performance computing infrastructure, the company said at last week's investor meeting.
Tesla cars running the FSD (Full Self-Driving) software – used by about 400,000 customers – will be able to make more intelligent self-driving decisions with the hardware upgrades, which will improve the company's overall AI capabilities, said Ashok Elluswamy, director of Autopilot software at Tesla, during a presentation at the meeting.
The company currently has an AI system that gathers real-time visual data from eight cameras in the car and produces a 3D output identifying obstacles and their motion, lanes, roads, and traffic lights, and that models tasks to help cars make decisions.
Tesla mines its network of cars for more visual data, which it feeds into a training model. The training model continuously learns to solve newer problems, and feeding in more data helps the AI build a better understanding of patterns on the road. The new learnings are delivered back to cars through FSD software upgrades.
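That "rinse and repeat" loop can be sketched in miniature. This is an illustrative toy, not Tesla's code; all class and method names here are hypothetical stand-ins for the collect-train-deploy cycle described above.

```python
from dataclasses import dataclass

# Toy sketch of the data flywheel: mine the fleet for clips, retrain,
# and push the update back out over the air. Names are hypothetical.

@dataclass
class Model:
    version: int = 0
    clips_seen: int = 0

    def train(self, clips):
        self.clips_seen += len(clips)  # more fleet data each round
        self.version += 1              # new FSD software release

@dataclass
class Car:
    model_version: int = 0

    def capture_clip(self):
        return "clip"                  # stand-in for camera video

    def install_update(self, model):
        self.model_version = model.version  # over-the-air upgrade

def data_flywheel(fleet, model, rounds=3):
    """Repeat: collect clips from every car, retrain, redeploy."""
    for _ in range(rounds):
        clips = [car.capture_clip() for car in fleet]
        model.train(clips)
        for car in fleet:
            car.install_update(model)
    return model
```

Each pass through the loop grows the training set and ships a newer model back to the same cars that supplied the data.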
"If we rinse and repeat the process, it gets better and better," Elluswamy said. He later added "the solution to scalable FSD is getting the architecture, the data and the compute just right and we have assembled a world class team to execute on this. They are pushing the frontier on these three items," Elluswamy said.
It has not been smooth sailing for FSD, with software glitches forcing Tesla to recall a little over 360,000 vehicles. The company delivered software fixes via an over-the-air update.
FSD is available to Tesla customers starting at $99 a month. Some customers with older Tesla models also pay extra to install FSD computers.
Elluswamy claimed that Teslas with FSD were still five to six times safer than the U.S. national average.
"As we improve the safety, the reliability and the comfort of a system, they can then unlock driverless operations, which then makes the car be used way more than what is used for right now," Elluswamy said.
The company currently runs its AI system on 14,000 GPUs in its data center, and can tap into 30 petabytes of video cache, which is growing to 200 petabytes. About 4,000 GPUs are used for auto labeling and the remaining 10,000 are used for training.
"All of this is going to significantly increase once we bring Dojo, which is our training computer, on board into this," Elluswamy said.
The Dojo system is based on the homegrown D1 chip, which delivers 22.6 teraflops of FP32 performance. It has 50 billion transistors, 10TBps of on-chip bandwidth, and 4TBps of off-chip bandwidth.
A collection of D1 chips will be installed in a high-density ExaPOD, which will deliver 1.1 exaflops of BF16 and CFP8 performance. Tesla's on-board FSD computers can deliver 150 teraflops of performance and are used mainly for inferencing.
Tesla made the D1 chip because of weaknesses in scaling GPUs and CPUs, said Ganesh Venkataraman, a senior director of hardware at Tesla, during a presentation at the Hot Chips conference last year.
"We noticed many of these bottlenecks. First on the inference side, which is what led us to do FSD computers. And then we started noticing similar scale issues for training and that's simply how it began. And then knowing your workloads ... we can optimize our systems catered towards our outputs," Venkataraman said.
In its early days, Tesla's AI system relied on single-camera, single-frame inputs, which were then stitched together in post-processing for the autonomous car's planning system.
"This was very brittle and was not leading to great success," Elluswamy said.
Over the last few years, Tesla has transitioned into a "multi camera video universe." Each car has eight cameras, which feed visual information into the AI system, which then generates a single 3D output space. The AI makes decisions on the presence of obstacles, their motion, lanes, roads and traffic lights, among other things.
The task modeling goes beyond computer vision and uses techniques also found in large-language-model AI systems such as ChatGPT, including transformers, attention modules, and auto-regressive modeling of tokens.
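The attention mechanism at the heart of those transformer techniques fits in a few lines. This is a generic NumPy sketch of scaled dot-product attention, not Tesla's implementation:

```python
import numpy as np

# Minimal scaled dot-product attention, the core operation inside the
# transformer/attention modules mentioned above (generic sketch).

def attention(q, k, v):
    """q, k, v: (seq_len, d) arrays; returns attention-weighted values."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # pairwise similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # blend the values
```

Each output row is a weighted mixture of the value vectors, with weights determined by how strongly each query matches each key; stacking such modules lets a model relate every token (or camera feature) to every other one.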
"With such an end-to-end system of solving perception, we have really removed the brittle post-processing steps and produced high-quality output for the planning system. Even the planning system is not stuck in the old ways. It is now starting to use more and more AI systems to solve this problem," Elluswamy said.
Autonomous cars need rapid responses to make smooth, safe decisions in real time. Elluswamy gave the example of a 50-millisecond response time within which an autonomous car must make a driving decision after interacting with its surroundings, including pedestrians and traffic lights.
That's a lot of data, and in conventional computing, "each configuration would take 10 milliseconds of compute and [there] are easily 1,000s of configurations to reason about. This would not be feasible," Elluswamy said, adding, "but using AI we have packaged all of this into a 50-millisecond compute budget so it can run real time."
Tesla is augmenting its raw data by collecting data from its cars on varying road conditions and traffic trends in different parts of the world. Tesla uses algorithms to reconstruct lanes, road boundaries, curbs, crosswalks and other road features, which are then used as a basis to help cars navigate.
"It's happening by collecting various clips from different cars in the fleet and assembling all of them together into a single unified representation of the world around the car," Elluswamy said.
The training model is continuously reconstructed as more data is fed into the system. To train its networks, Tesla has built a sophisticated auto-labeling pipeline that runs computational algorithms on the collected data to produce training labels.
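In outline, auto-labeling replaces human annotation with an offline algorithmic pass over the collected clips. The sketch below is hypothetical; the function names and the trivial labeling rule are stand-ins, not Tesla's pipeline.

```python
# Hypothetical sketch of an auto-labeling pipeline: offline algorithms
# run over fleet clips to produce training labels without human
# annotation. Names and the labeling rule are illustrative only.

def auto_label(clip):
    """Stand-in for the reconstruction algorithms that derive a label."""
    label = "drivable" if "lane" in clip else "unknown"
    return {"clip": clip, "label": label}

def build_training_set(clips):
    """Run the labeling pass over every collected clip."""
    return [auto_label(c) for c in clips]
```

The point of the pattern is throughput: because labels come from computation rather than people, the training set can grow as fast as the fleet can supply video.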
"Once we have this base reconstruction we can build various simulations on top of it to produce an infinite variety of data to train,” Elluswamy said.
Tesla has capable simulators that can synthesize adversarial weather, lighting conditions, and even the motion of other objects, which may be rare in the real world.
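One simple way to cover rare combinations like those is to cross every condition with every other, so that scenarios seldom seen together in the real world still appear in training. This is a generic combinatorial sketch, not Tesla's simulator; the condition lists are illustrative.

```python
import itertools

# Illustrative combinatorial scenario synthesis: cross weather,
# lighting, and object-motion conditions so rare pairings (e.g. a
# cut-in at night in fog) still show up in training data.

WEATHER = ["clear", "heavy_rain", "fog", "snow"]
LIGHTING = ["day", "dusk", "night", "glare"]
MOTION = ["parked", "cut_in", "jaywalking"]

def synthesize_scenarios():
    """Enumerate every combination of the three condition axes."""
    return [{"weather": w, "lighting": l, "motion": m}
            for w, l, m in itertools.product(WEATHER, LIGHTING, MOTION)]
```

Even these short lists yield 48 distinct scenarios; a real simulator varies far more axes, which is where the "infinite variety" Elluswamy describes comes from.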
Elluswamy highlighted how maturing Tesla's AI system solved a problem of false braking in its early days. The early system reasoned that a parked car was going to move into the path of an autonomous vehicle and preemptively braked as a safety mechanism.
Tesla solved that problem by mining its fleet for similar cases in which the car falsely braked because of a parked car. It added 14,000 such videos to its training set and retrained the network, which then reasoned that because there was no driver in the car, it must be parked and there was no need to brake.
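The fix follows a general pattern: filter the fleet's clips down to the failure mode, then fold those clips into the next training run. The sketch below is hypothetical; the predicate and data shapes are stand-ins for illustration.

```python
# Illustrative sketch of targeted data mining: filter fleet clips down
# to a known failure mode (false braking near parked cars) and add them
# to the training set. Predicate and clip fields are hypothetical.

def mine_fleet(clips, is_failure_case):
    """Keep only the clips matching the problematic scenario."""
    return [c for c in clips if is_failure_case(c)]

def extend_training_set(training_set, mined_clips):
    """Fold the mined clips in for the next retraining run."""
    return training_set + mined_clips
```

For the false-braking case, the predicate would select clips where the car braked for a vehicle that turned out to be parked; the 14,000 videos in the article are the output of exactly this kind of filter.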
"Every time we add data, the performance improves. And then you can do this for every kind of task that we have in our system," Elluswamy said.