Advanced Computing in the Age of AI | Monday, June 24, 2024

Facebook Open Sources Caffe2; Nvidia, Intel Rush to Optimize 

From its F8 developer conference in San Jose, Calif., today, Facebook announced Caffe2, a new open-source, cross-platform framework for deep learning. Caffe2 is the successor to Caffe, the deep learning framework developed by Berkeley AI Research and community contributors. Caffe2’s GitHub page describes it as “an experimental refactoring of Caffe [that] allows a more flexible way to organize computation.”

The first production-ready release of Caffe2 is, according to Facebook, “a lightweight and modular deep learning framework emphasizing portability while maintaining scalability and performance.” The social media giant says it worked closely with Nvidia, Qualcomm, Intel, Amazon, and Microsoft to optimize Caffe2 for cloud and mobile environments.

Caffe2 will ship with tutorials and examples that demonstrate how developers can scale their deep learning models across multiple GPUs on a single machine or across many machines with one or more GPUs. The framework also adds deep learning smarts to mobile and low-power devices, with support for iPhones, Android devices and Raspberry Pi boards.

On the new Caffe2 website, Facebook reported that its developers and researchers use the framework internally to train large machine learning models and deliver “AI-powered experiences” in the company’s mobile apps. “Now, developers will have access to many of the same tools, allowing them to run large-scale distributed training scenarios and build machine learning applications for mobile,” said the company.

Soon after Facebook announced the new open source framework, Nvidia and Intel published blog posts showing some early performance numbers.

“Thanks to our joint engineering,” wrote Nvidia, “we’ve fine-tuned Caffe2 from the ground up to take full advantage of the NVIDIA GPU deep learning platform. Caffe2 uses the latest NVIDIA Deep Learning SDK libraries — cuDNN, cuBLAS and NCCL — to deliver high-performance, multi-GPU accelerated training and inference. As a result, users can focus on developing AI-powered applications, knowing that Caffe2 delivers the best performance on their NVIDIA GPU systems.”

Nvidia claims near-linear scaling of deep learning training with 57x throughput acceleration on eight networked Facebook Big Basin AI servers (employing a total of 64 Nvidia Tesla P100 GPUs).
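One way to read the 57x figure is as parallel scaling efficiency: measured speedup divided by the number of GPUs. The sketch below assumes (the article does not say so) that the 57x is measured against a single P100 GPU, so ideal linear scaling on 64 GPUs would be 64x. The function name `scaling_efficiency` is illustrative only.

```python
def scaling_efficiency(speedup: float, num_workers: int) -> float:
    """Fraction of ideal linear speedup achieved (1.0 means perfectly linear)."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    return speedup / num_workers


# Figures from the article: 57x throughput on 8 Big Basin servers, 64 P100 GPUs total.
# Assumption: the 57x baseline is a single GPU.
servers = 8
gpus_per_server = 8
total_gpus = servers * gpus_per_server  # 64 GPUs

eff = scaling_efficiency(57, total_gpus)
print(f"{total_gpus} GPUs, efficiency {eff:.1%}")  # prints "64 GPUs, efficiency 89.1%"
```

Under that assumption, the reported result works out to roughly 89% of ideal linear scaling.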

Nvidia also reported that its DGX-1 supercomputer will offer Caffe2 within its software stack.

Over at the Intel blog, Andres Rodriguez and Niveditha Sundaram describe the company’s efforts to boost Caffe2 performance on Intel CPUs. The silicon vendor is collaborating with Facebook to incorporate Intel Math Kernel Library (MKL) functions into Caffe2 to accelerate inference on CPUs.

Intel shares AlexNet inference performance numbers for Caffe2 using the Intel MKL library, with the Eigen BLAS library as a point of comparison, noting that Caffe2 offers competitive performance on CPUs.

Intel also emphasizes the performance gains it expects for deep learning workloads running on its Skylake processors. First introduced in Google’s cloud, the newest Xeon will become generally available later this year. Skylake incorporates 512-bit wide Fused Multiply Add (FMA) instructions as part of the larger 512-bit wide vector engine (Intel AVX-512), which Intel says provides “a significant performance boost over the previous 256-bit wide AVX2 instructions in the Haswell/Broadwell processor for both training and inference workloads.” Intel adds that the 512-bit wide FMAs essentially double the FLOPS that Skylake can deliver and significantly speed up the single-precision matrix arithmetic used in convolutional and recurrent neural networks.
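The “double the FLOPS” claim follows from simple peak-throughput arithmetic. This sketch assumes two FMA units per core on both Haswell/Broadwell (256-bit AVX2) and Skylake (512-bit AVX-512), equal clock speed and core count, and counts each FMA as two floating-point operations (one multiply, one add) per vector lane. If a given part has fewer FMA units or runs AVX-512 code at a lower clock, the real-world gain would be smaller. The helper name `peak_fp32_flops_per_cycle` is hypothetical.

```python
FP32_BITS = 32  # width of one single-precision float


def peak_fp32_flops_per_cycle(vector_bits: int, fma_units: int) -> int:
    """Peak single-precision FLOPs per core per cycle from FMA instructions.

    Each FMA computes a*b + c on every lane, i.e. 2 FLOPs per lane.
    """
    lanes = vector_bits // FP32_BITS  # 8 lanes for 256-bit, 16 for 512-bit
    return fma_units * lanes * 2


# Assumption: two FMA units per core on both generations.
avx2 = peak_fp32_flops_per_cycle(256, fma_units=2)    # Haswell/Broadwell
avx512 = peak_fp32_flops_per_cycle(512, fma_units=2)  # Skylake
print(avx2, avx512, avx512 / avx2)  # prints "32 64 2.0"
```

Doubling the vector width doubles the lanes each FMA processes, which is where the 2x peak figure comes from; sustained throughput on real networks depends on memory bandwidth and clocks as well.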

About the author: Tiffany Trader

With over a decade’s experience covering the HPC space, Tiffany Trader is one of the preeminent voices reporting on advanced scale computing today.