Advanced Computing in the Age of AI | Tuesday, April 16, 2024

Nvidia Releases a Batch of Open Source Tools for AI 

Graphics processors, increasingly used as hardware accelerators for deep learning applications, are also being deployed with the Kubernetes cluster orchestrator as another way to scale training and inference for deep learning models.

The two-front approach includes Nvidia’s (NASDAQ: NVDA) release to developers this week of a Kubernetes on GPU capability aimed at enterprises training models on multi-cloud GPU clusters. Previously, Google (NASDAQ: GOOGL) launched a beta version of GPUs on its Kubernetes Engine aimed at accelerating machine learning and image processing workloads.

Nvidia’s Kubernetes initiative was among a package of open source and product releases announced by the chip maker during this week’s Computer Vision and Pattern Recognition conference. Also unveiled were a new version of TensorRT, its inference optimizer and runtime engine; an open-source PyTorch extension called Apex; and a GPU data augmentation and image loading library. The library, dubbed DALI, is intended to optimize data pipelines associated with deep learning frameworks.

Nvidia’s push into cluster orchestration is designed to make Kubernetes, the de facto industry standard for managing the flow of application containers, more “GPU-aware,” explained Kari Briski, Nvidia’s director of accelerated computing software and AI products.

In particular, the company is aiming its “candidate” Kubernetes on GPUs at developers increasingly focused on AI applications, Briski said. The platform would assist developers and DevOps engineers in orchestrating resources on GPU clusters scattered across multiple clouds while boosting GPU cluster utilization on a service maintained by Nvidia.

The GPU maker’s Kubernetes push dovetails with a similar announcement from Google allowing cloud users to run GPUs on its Kubernetes Engine. The search giant said the beta release of its cloud GPU capability could be used to create node pools equipped with Nvidia’s Tesla V100, P100 and K80 processors. Google said the GPU service would be available in specific regions and zones.
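As a rough illustration of what a "GPU-aware" scheduling request looks like, the Python sketch below builds a Kubernetes pod manifest that asks for GPUs. It is not taken from Nvidia's or Google's announcements. It assumes the `nvidia.com/gpu` resource name exposed by Nvidia's Kubernetes device plugin and the `cloud.google.com/gke-accelerator` node label used by Google Kubernetes Engine; the helper name `gpu_pod_manifest`, the pod name and the container image are hypothetical.

```python
import json


def gpu_pod_manifest(name, image, gpu_count, accelerator=None):
    """Build a Kubernetes Pod manifest (as a dict) that requests GPUs.

    Assumptions, not stated in the article: GPUs are requested through the
    "nvidia.com/gpu" extended resource, and on Google Kubernetes Engine a
    GPU type is selected with the "cloud.google.com/gke-accelerator" label.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")

    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,  # hypothetical training image
                    # Extended resources such as GPUs are set under "limits".
                    "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
                }
            ],
        },
    }

    # Optionally pin the pod to nodes with a specific GPU model.
    if accelerator is not None:
        pod["spec"]["nodeSelector"] = {
            "cloud.google.com/gke-accelerator": accelerator
        }
    return pod


if __name__ == "__main__":
    manifest = gpu_pod_manifest(
        "train-job", "example.com/trainer:latest", 2, "nvidia-tesla-v100"
    )
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and passing it to `kubectl apply -f` would, under those assumptions, ask the scheduler to place the pod on a node with two free V100 GPUs.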

The latest version of Nvidia’s TensorRT inference accelerator targets deep learning developers deploying inference models. Integrated into the TensorFlow machine learning framework, the tool aims to boost application inference performance with the inclusion of new layers and features. Specific inference applications include recommendation systems, neural machine translation, image classification and speech recognition.

GPU horsepower also is being used to accelerate computer vision applications via DALI, which along with a GPU-accelerated library for JPEG decoding addresses performance bottlenecks in current machine vision deep learning applications. The tool is specifically designed to scale training of image classification models such as ResNet-50 in frameworks such as PyTorch and TensorFlow.

That training can be scaled across Amazon Web Services’ (NASDAQ: AMZN) P3 8-GPU instances or Nvidia’s DGX-1 deep learning systems running its Volta GPUs. Meanwhile, the open source library released this week offloads augmentation steps to GPUs, the company said.

Briski noted that the deep learning tools released this week to open source developers are all used internally by Nvidia.

About the author: George Leopold

George Leopold has written about science and technology for more than 30 years, focusing on electronics and aerospace technology. He previously served as executive editor of Electronic Engineering Times. Leopold is the author of "Calculated Risk: The Supersonic Life and Times of Gus Grissom" (Purdue University Press, 2016).