
NVIDIA Launches TensorRT 8 to Make Conversational AI Smarter from Cloud to Edge

July 20, 2021 -- NVIDIA today launched TensorRT 8, the eighth generation of the company’s AI software, which cuts inference time in half for language queries -- enabling developers to build the world’s best-performing search engines, ad recommendations and chatbots and offer them from the cloud to the edge.

TensorRT 8’s optimizations deliver record-setting speed for language applications, running BERT-Large, one of the world’s most widely used transformer-based models, in 1.2 milliseconds. In the past, companies had to reduce their model size, which resulted in significantly less accurate results. Now, with TensorRT 8, companies can double or triple their model size to achieve dramatic improvements in accuracy.

“AI models are growing exponentially more complex, and worldwide demand is surging for real-time applications that use AI. That makes it imperative for enterprises to deploy state-of-the-art inferencing solutions,” said Greg Estes, vice president of developer programs at NVIDIA. “The latest version of TensorRT introduces new capabilities that enable companies to deliver conversational AI applications to their customers with a level of quality and responsiveness that was never before possible.”

In five years, more than 350,000 developers across 27,500 companies in wide-ranging areas, including healthcare, automotive, finance and retail, have downloaded TensorRT nearly 2.5 million times. TensorRT applications can be deployed in hyperscale data centers, embedded or automotive product platforms.

Latest Inference Innovations

In addition to transformer optimizations, TensorRT 8’s breakthroughs in AI inference are made possible through two other key features.

Sparsity is a new performance technique in NVIDIA Ampere architecture GPUs that increases efficiency, allowing developers to accelerate their neural networks by reducing computational operations.
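
To make the idea of "reducing computational operations" concrete, the following is a minimal pure-Python sketch of one common form of sparsity. It assumes the 2:4 structured pattern that NVIDIA documents for Ampere Tensor Cores (two of every four consecutive weights are zero); the release itself does not name a pattern. The functions prune_2_of_4 and sparse_dot are hypothetical illustrations, not part of the TensorRT API.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in every group of four.

    Assumption: 2:4 structured sparsity; the source text does not
    specify which sparsity pattern TensorRT 8 relies on.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def sparse_dot(pruned, activations):
    """Dot product that skips zeroed weights and counts the multiplies done."""
    total, multiplies = 0.0, 0
    for w, a in zip(pruned, activations):
        if w != 0.0:
            total += w * a
            multiplies += 1
    return total, multiplies


# Illustrative values only, not taken from any real model.
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]
pruned = prune_2_of_4(weights)
total, multiplies = sparse_dot(pruned, [1.0] * len(pruned))
print(pruned, multiplies)
```

In this toy case half of the multiplications are skipped. On real hardware the saving comes from Tensor Cores consuming a compressed weight format rather than from a Python-level branch, and a pruned network is normally fine-tuned afterward to recover accuracy.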

Quantization-aware training enables developers to use trained models to run inference in INT8 precision without losing accuracy. This significantly reduces compute and storage overhead for efficient inference on Tensor Cores.
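
As a rough illustration of what INT8 precision involves, the sketch below shows symmetric per-tensor quantization and the "fake quantize" round trip that quantization-aware training typically applies in the forward pass, so the model learns to tolerate rounding error. The scheme (symmetric, per-tensor, clamped to [-127, 127]) and all function names are assumptions made for this example; the release does not describe TensorRT 8's internals at this level.

```python
def int8_scale(values):
    """Symmetric per-tensor scale: the largest magnitude maps to 127."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round each value to the nearest INT8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [q * scale for q in q_values]


def fake_quantize(values):
    """Quantize then dequantize, exposing the rounding error to training."""
    scale = int8_scale(values)
    return dequantize(quantize(values, scale), scale)


# Illustrative weights only, not taken from any real model.
weights = [0.42, -1.27, 0.003, 0.9]
scale = int8_scale(weights)
q = quantize(weights, scale)
recovered = fake_quantize(weights)
print(q, recovered)
```

Each INT8 code occupies one byte instead of the four used by FP32, which is where the storage saving comes from; the per-value error after the round trip is bounded by half a quantization step.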

Broad Industry Support

Industry leaders have embraced TensorRT for their deep learning inference applications in conversational AI and across a range of other fields.

GE Healthcare, a leading global medical technology, diagnostics and digital solutions innovator, is using TensorRT to help accelerate computer vision applications for ultrasounds, a critical tool for the early detection of diseases. This enables clinicians to deliver the highest quality of care through GE Healthcare’s intelligent healthcare solutions.

Availability

TensorRT 8 is now generally available and free of charge to members of the NVIDIA Developer program. The latest versions of plug-ins, parsers and samples are also available as open source from the TensorRT GitHub repository.

About NVIDIA

NVIDIA’s (NASDAQ: NVDA) invention of the GPU in 1999 sparked the growth of the PC gaming market and has redefined modern computer graphics, high performance computing and artificial intelligence. The company’s pioneering work in accelerated computing and AI is reshaping trillion-dollar industries, such as transportation, healthcare and manufacturing, and fueling the growth of many others. More information at https://nvidianews.nvidia.com/.


Source: NVIDIA

EnterpriseAI