Advanced Computing in the Age of AI | Friday, April 26, 2024

Nvidia’s Turing Architecture for Real-Time Ray Tracing 

Source: Nvidia

Nvidia CEO Jensen Huang this week unveiled Turing, the company’s next-gen GPU platform that introduces new RT Cores to accelerate ray tracing and new Tensor Cores for AI inferencing.

Nvidia announced Turing at the SIGGRAPH graphics conference in Vancouver, calling it “the greatest leap since the invention of the CUDA GPU in 2006” and noting that “Turing fuses real-time ray tracing, AI, simulation and rasterization to fundamentally change computer graphics.”
Turing, combined with the CUDA 10, FleX and PhysX SDKs, will allow developers “to create complex simulations, such as particles or fluid dynamics for scientific visualization, virtual environments and special effects,” the company said.

A new line of professional GPUs, Quadro RTX (“the world’s first ray-tracing GPU,” according to Nvidia press material), is the first to adopt the new architecture. The RTX line launches with the Quadro RTX 8000, the Quadro RTX 6000 and the Quadro RTX 5000, all due to arrive by year’s end. Nvidia also announced Quadro RTX Server, a reference architecture for the visual effects industry that combines Quadro RTX GPUs with Quadro Infinity software (expected out in the first quarter of 2019).
Nvidia refers to Turing as its eighth-generation GPU architecture, although counting back — Volta, Pascal, Maxwell, Kepler, Fermi, Tesla — Turing makes seven. Eight if you count the pre-CUDA GeForce series as one architecture. There’s also Denver, but that’s a CPU (Arm) microarchitecture.

Turing introduces a new Streaming Multiprocessor architecture, features up to 4,608 CUDA cores, and delivers up to 16 trillion floating point operations in parallel with 16 trillion integer operations per second. The chip packs 18.6 billion transistors onto a 754 mm² die. That makes it a little smaller than Volta, still the reigning giant with 21 billion transistors on an 815 mm² die. Pascal, in comparison, has 11.8 billion transistors on a 471 mm² die. A new unified cache on Turing-based GPUs offers double the bandwidth of the previous generation.
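For readers who want to compare the three chips on equal footing, the short Python sketch below turns the transistor counts and die sizes quoted above into transistors per square millimeter. The figures are the ones in this article; the function and variable names are ours, chosen for illustration.

```python
# Figures quoted in the article: (transistors in billions, die area in mm^2).
CHIPS = {
    "Turing": (18.6, 754),
    "Volta": (21.0, 815),
    "Pascal": (11.8, 471),
}


def density_m_per_mm2(transistors_billion, area_mm2):
    """Return transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000 / area_mm2


# Density for each chip, keyed by architecture name.
DENSITY = {name: density_m_per_mm2(*spec) for name, spec in CHIPS.items()}

# Turing's die area expressed as a fraction of Volta's.
TURING_TO_VOLTA_AREA = CHIPS["Turing"][1] / CHIPS["Volta"][1]

for name, value in DENSITY.items():
    print(f"{name}: {value:.1f} million transistors per mm^2")
print(f"Turing die area is {TURING_TO_VOLTA_AREA:.1%} of Volta's")
```

By this rough measure all three chips land close to 25 million transistors per square millimeter, so Turing's larger transistor budget relative to Pascal appears to come mainly from a larger die rather than a denser one.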
Dedicated ray-tracing processors on Turing, called RT Cores, enable light and sound transmissions in 3D environments to be computed at up to 10 GigaRays (10 billion rays) a second. This translates into a 25x speed advantage for Turing over Pascal for real-time ray tracing, according to Nvidia.
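To put the peak figure in perspective, the sketch below spreads 10 GigaRays (10 billion rays) per second across the pixels of a display. The resolutions and the 60 frames-per-second target are our own assumptions for illustration, not Nvidia figures, and the Pascal rate is simply what the 25x claim implies rather than a number Nvidia quoted.

```python
# Peak ray rate quoted for Turing: 10 GigaRays = 10 billion rays per second.
TURING_RAYS_PER_SEC = 10e9
# Nvidia's claimed real-time ray-tracing speedup over Pascal.
SPEEDUP_OVER_PASCAL = 25


def rays_per_pixel(rays_per_sec, width, height, fps):
    """Average number of rays available per pixel in each frame."""
    return rays_per_sec / (width * height * fps)


# Hypothetical display targets, chosen for illustration only.
BUDGET_4K = rays_per_pixel(TURING_RAYS_PER_SEC, 3840, 2160, 60)
BUDGET_1080P = rays_per_pixel(TURING_RAYS_PER_SEC, 1920, 1080, 60)

# Pascal rate implied by the 25x claim (derived here, not quoted by Nvidia).
IMPLIED_PASCAL_RAYS_PER_SEC = TURING_RAYS_PER_SEC / SPEEDUP_OVER_PASCAL

print(f"4K at 60 fps: {BUDGET_4K:.1f} rays per pixel per frame")
print(f"1080p at 60 fps: {BUDGET_1080P:.1f} rays per pixel per frame")
print(f"Implied Pascal rate: {IMPLIED_PASCAL_RAYS_PER_SEC / 1e9:.1f} GigaRays/s")
```

Under these assumptions the peak rate works out to roughly 20 rays per pixel per frame at 4K and about 80 at 1080p. Sustained rates in real scenes would likely be lower.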

Like Volta, the architecture features Tensor Cores for accelerating deep learning training and inferencing. The upgraded Turing Tensor Cores deliver up to 500 teraops, according to Nvidia, and can be used in AI-enhanced rendering tasks, including deep learning anti-aliasing (DLAA), denoising, resolution scaling and video re-timing.

Nvidia's mission, of course, is to add Turing to a product portfolio that is firing on all cylinders. Late yesterday, the company reported revenue for the second quarter ended July 29 of $3.12 billion, up 40 percent from $2.23 billion a year earlier, and down 3 percent from $3.21 billion in the previous quarter. GAAP earnings per diluted share for the quarter were $1.76, up 91 percent from $0.92 a year ago and down 11 percent from $1.98 in the previous quarter. Non-GAAP earnings per diluted share were $1.94, up 92 percent from $1.01 a year earlier and down 5 percent from $2.05 in the previous quarter.

“Growth across every platform – AI, Gaming, Professional Visualization, self-driving cars – drove another great quarter,” said Huang. “Fueling our growth is the widening gap between demand for computing across every industry and the limits reached by traditional computing. Developers are jumping on the GPU-accelerated computing model that we pioneered for the boost they need.”

Also at SIGGRAPH, Nvidia debuted a refreshed RTX development platform, enhanced with new AI, ray-tracing and simulation SDKs.

The Turing Quadro products feature up to 48GB of GDDR6 memory (compare the Quadro Volta GV100 with 32GB of HBM2). As with the GV100, memory capacity can be doubled by linking two GPUs with NVLink technology.

Dell EMC, Hewlett Packard Enterprise, Lenovo, Fujitsu, Boxx and Supermicro are among the system makers that have announced plans to support the latest line of Quadro processors.

Quadro RTX 8000 (source: Nvidia’s website)

This article originally appeared in sister publication HPCWire.

About the author: Tiffany Trader

With over a decade’s experience covering the HPC space, Tiffany Trader is one of the preeminent voices reporting on advanced scale computing today.
