Latest MLPerf Results See Intel, Graphcore, Google Increase Their Presence Behind Nvidia
While Nvidia again dominated the latest round of MLPerf training benchmark results, the range of participants has expanded. Yes, Google’s forthcoming TPU v4 performed well, but competitors also shined. Intel had broad participation, featuring systems with 3rd Gen Xeons and with its Habana Labs Gaudi chips, while Graphcore had submissions based on its IPU chips. There was an entry from China – Peng Cheng Laboratory – using Arm CPUs and sixty-four AI processors (Huawei Ascend 910).
In this latest round (MLPerf Training v1.0), MLPerf received submissions from 13 organizations and released over 650 peer-reviewed results for machine learning systems spanning from edge devices to data center servers. The results were released in concert with ISC21, the ISC (International Supercomputing) High Performance conference, which is traditionally held in Europe as an international event and exhibition for high performance computing (HPC), networking and storage. This year's event, which continues through July 2, is being held virtually due to the COVID-19 pandemic.
“Submissions this round included software and hardware innovations from Dell, Fujitsu, Gigabyte, Google, Graphcore, Habana Labs, Inspur, Intel, Lenovo, Nettrix, NVIDIA, PCL & PKU, and Supermicro,” reported MLCommons, the open engineering consortium that works to accelerate machine learning innovation.
MLCommons is the entity running MLPerf tests, which began in 2018. MLCommons remained somewhat in the background until last December when it reorganized the team and announced that it had launched an industry-academic partnership to also broaden access to critical ML technology for the public good. The non-profit organization, which initially was formed as MLPerf, now boasts a founding board that includes representatives from Alibaba, Facebook AI, Google, Intel, NVIDIA , Professor Vijay Janapa Reddi of Harvard University and a broad range of more than 50 founding members. The founding membership includes over 15 startups and small companies that focus on semiconductors, systems, and software from across the globe, as well as researchers from universities like U.C. Berkeley, Stanford, and the University of Toronto.
The idea for the work is to provide a set of machine learning benchmarks for training and inferencing that can be used to compare diverse systems. Initially, systems using Nvidia GPUs comprised the bulk of participants and to some extent that made MLPerf seem to be largely a Nvidia-driven showcase. Recent broader participation by Google, Graphcore, Intel and its subsidiary Habana is a positive step with the hope that more of the new AI accelerators and associated systems will start participating.
MLCommons reported that compared to the last submission round, the best benchmark results improved by up to 2.1x, showing substantial improvement in hardware, software, and system scale. The latest training round included two new benchmarks to measure performance for speech-to-text and 3D medical imaging:
- Speech-to-Text with RNN-T. RNN-T: Recurrent Neural Network Transducer is an automatic speech recognition (ASR) model that is trained on a subset of LibriSpeech. Given a sequence of speech input, it predicts the corresponding text. RNN-T is MLCommons’ reference model and commonly used in production for speech-to-text systems.
- 3D Medical Imaging with 3D U-Net. The 3D U-Net architecture is trained on the KiTS 19 dataset to find and segment cancerous cells in the kidneys. The model identifies whether each voxel within a CT scan belongs to a healthy tissue or a tumor, and is representative of many medical imaging tasks.
Similar to past MLPerf Training results, the submissions consist of two divisions: closed and open. Closed submissions use the same reference model to ensure a level playing field across systems, while participants in the open division are permitted to submit a variety of models. Submissions are additionally classified by availability within each division, including systems commercially available, in preview, and R&D.
“We’re thrilled to see the continued growth and enthusiasm from the MLPerf community, especially as we’re able to measure significant improvement across the industry with the MLPerf Training benchmark suite,” said Victor Bittorf, co-chair of the MLPerf Training working group.
The bulk of systems entered still relied on a variety of Nvidia GPUs and the company was quick to claim victory in a pre-briefing with media and analysts.
“Only Nvidia, submitted across all eight benchmarks and in the commercially available category,” said Paresh Kharya, senior director of product management, data center computing. “Nvidia’s Selene supercomputer, which is a DGX SuperPod, set all eight performance records. We completed four out of eight tests in less than a minute, the most complex test, which was the MiniGo benchmark took less than 16 minutes. Google submitted TPU v4 as a preview submission, a category that implies that it’s not yet commercially available for customers to use. And unlike the commercially available category, preview does not require software used for submissions to be available either. Comparing Google’s preview submission to Nvidia’s commercially available submission on the relatively harder tasks that took more than a minute, our DGX SuperPod outperformed [Google’s TPU v4] on 3D-Unit and Mask R-CNN.”
To read the rest of this story, please go to our sister website HPCwire.