Advanced Computing in the Age of AI | Monday, June 24, 2024

Grace Hopper Gets Busy with Science 

Nvidia’s new Grace Hopper Superchip (GH200) has landed in nine new systems worldwide. The recently announced chip eliminates the PCI bus from the CPU/GPU communications pathway. 

As announced by Nvidia at ISC 2024, new Grace Hopper-based supercomputers coming online include EXA1-HE from CEA and Eviden in France; Helios at the Academic Computer Centre Cyfronet in Poland; Alps at the Swiss National Supercomputing Centre, built by Hewlett Packard Enterprise (HPE); JUPITER at the Jülich Supercomputing Centre in Germany; DeltaAI at the National Center for Supercomputing Applications at the University of Illinois Urbana-Champaign; and Miyabi at Japan’s Joint Center for Advanced High Performance Computing, established between the Center for Computational Sciences at the University of Tsukuba and the Information Technology Center at the University of Tokyo.

Recently deployed Grace Hopper systems (Source: Nvidia)

CEA, the French Alternative Energies and Atomic Energy Commission, and Eviden, an Atos Group company, in April announced the delivery of the EXA1-HE supercomputer based on Eviden’s BullSequana XH3000 technology. The BullSequana XH3000 architecture offers a new, patented warm-water cooling system, while the EXA1-HE is equipped with 477 compute nodes based on Grace Hopper.

“AI is accelerating research into climate change, speeding drug discovery, and leading to breakthroughs in dozens of other fields,” said Ian Buck, vice president of hyperscale and HPC at Nvidia. “Nvidia Grace Hopper-powered systems are becoming an essential part of HPC for their ability to transform industries while driving better energy efficiency.”

In addition, Isambard-AI and Isambard 3 from the University of Bristol in the U.K. and systems at the Los Alamos National Laboratory and the Texas Advanced Computing Center in the U.S. join a growing wave of Nvidia Arm-based supercomputers using Grace CPU and the Grace Hopper platform.

Eliminating the PCI Middleman

The Grace Hopper design combines an Arm-based Grace CPU with a Hopper GPU. Before Grace Hopper, CPUs (usually x86) were paired with one or more PCI-bus-attached GPUs. These discrete GPUs must communicate over the PCI bus and therefore create two or more distinct memory domains: the CPU domain and the GPU domain. Data transferred between these domains must travel across the PCI bus, which often becomes a bottleneck.

Grace Hopper connects the CPU and GPU with NVLink-C2C, a memory-coherent, high-bandwidth, low-latency interconnect that provides a single shared memory domain. NVLink-C2C is the heart of the Grace Hopper processor and delivers up to 900 GB/s of total bandwidth.
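As a rough back-of-the-envelope illustration (not an Nvidia benchmark), the sketch below compares ideal transfer times for an example 64 GB working set over a conventional PCIe Gen5 x16 link (an assumed ~64 GB/s baseline) versus the 900 GB/s cited for NVLink-C2C. It ignores latency and protocol overhead; the PCIe figure and the working-set size are assumptions for illustration only.

```python
# Idealized transfer-time comparison: PCIe-attached GPU vs. NVLink-C2C.
# Both bandwidth constants are peak figures; real transfers see overhead.

PCIE_GEN5_X16_GBPS = 64    # assumed peak for a PCIe Gen5 x16 link
NVLINK_C2C_GBPS = 900      # total bandwidth cited for NVLink-C2C

def transfer_seconds(gigabytes: float, bandwidth_gbps: float) -> float:
    """Ideal time to move `gigabytes` of data at `bandwidth_gbps` GB/s."""
    return gigabytes / bandwidth_gbps

data_gb = 64  # hypothetical working set shuttled between CPU and GPU
pcie_s = transfer_seconds(data_gb, PCIE_GEN5_X16_GBPS)
nvlink_s = transfer_seconds(data_gb, NVLINK_C2C_GBPS)

print(f"PCIe Gen5 x16: {pcie_s:.2f} s")          # 1.00 s
print(f"NVLink-C2C:    {nvlink_s:.3f} s")        # 0.071 s
print(f"Speedup:       {pcie_s / nvlink_s:.1f}x")  # 14.1x
```

The ratio depends only on the two bandwidth figures, so under these assumptions any transfer is roughly fourteen times faster over NVLink-C2C — before counting the copies that a shared memory domain avoids entirely.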

Figure 2 shows a 3X performance gain for a coupled ocean/atmosphere model on Grace Hopper over the traditional PCI-bus-based CPU/GPU design.

Figure 2: Coupled Ocean Model. Source: Nvidia

Sovereign AI and HPC

The drive to construct new, more efficient AI-based supercomputers is accelerating as countries worldwide recognize sovereign AI’s strategic and cultural importance — investing in domestically owned and hosted data, infrastructure, and workforces to foster innovation.

Bringing together the Arm-based Nvidia Grace CPU and Hopper GPU architectures, the GH200 is a new optimized design for scientific supercomputing centers worldwide. Many centers plan to go from system installation to real science in months instead of years.

As an example, phase one of Isambard-AI consists of an HPE Cray Supercomputing EX2500 with 168 Nvidia GH200 Superchips, making it one of the most efficient supercomputers ever built. When the remaining 5,280 Nvidia Grace Hopper Superchips arrive at the University of Bristol’s National Composites Centre this summer, performance will increase by a factor of thirty-two.
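The quoted thirty-two-fold jump can be sanity-checked with a quick calculation, assuming performance scales roughly linearly with Superchip count (a simplifying assumption; real scaling depends on interconnect and workload):

```python
# Sanity check of the "factor of thirty-two" claim, assuming roughly
# linear scaling of performance with GH200 Superchip count.

phase_one_chips = 168      # phase-one Isambard-AI system
additional_chips = 5280    # Superchips arriving this summer
total_chips = phase_one_chips + additional_chips

scale_factor = total_chips / phase_one_chips
print(f"{total_chips} chips / {phase_one_chips} chips = {scale_factor:.1f}x")
# → 5448 chips / 168 chips = 32.4x
```

Under that linear-scaling assumption, 5,448 total chips over the 168 installed today comes out to about 32.4x, consistent with the stated figure.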

“Isambard-AI positions the U.K. as a global leader in AI and will help foster open science innovation domestically and internationally,” said Prof. Simon McIntosh-Smith, University of Bristol. “Working with Nvidia, we delivered phase one of the project in record time, and when completed this summer, we will see a massive jump in performance to advance data analytics, drug discovery, climate research, and many more areas.”

EnterpriseAI