Advanced Computing in the Age of AI | Wednesday, July 24, 2024

Cray Debuts ClusterStor E1000 for Converged AI-HPC Workloads 

Cray, now owned by HPE, today introduced the ClusterStor E1000 storage platform, which leverages Cray software and mixes hard disk drives (HDD) and flash memory (SSD) to accommodate converged HPC-AI workloads at many scales. Cray described the new system as the final step in re-architecting its product portfolio for the exascale era. Cray also didn’t miss the opportunity to tout the selection of ClusterStor E1000 for use in the first three planned U.S. exascale computers (Aurora, Frontier, and El Capitan, with Cray as the prime systems contractor for the latter two).

For a company whose long-term health was sometimes questioned in recent years, Cray is riding high with prior worries about its standalone viability decisively resolved by the HPE acquisition. In a press pre-briefing, Uli Plechschmidt, director of storage product marketing, said the ClusterStor E1000 completes Cray’s product refresh – joining Cray’s Shasta architecture, Slingshot interconnect technology, and updated Cray software. He also reinforced Cray’s intent to serve a larger market.

“The exascale era is not defined by the size of the systems. It’s defined by the convergence of workloads – not just classic modeling and simulation running on the supercomputers or HPC clusters, but also methods of artificial intelligence like machine learning or big data analytics, all running on one machine. Going forward we’re going to focus on artificial intelligence joining the modeling and simulation because it’s the most disruptive workload from an I/O pattern perspective,” he said.

Cray expects new dense media packaging, a newly-designed memory controller, and new software (ClusterStor Data Services) – all tightly integrated with its system architecture – to help tame the data-flow challenge. Interestingly, the new controller uses AMD’s newest Rome generation CPU.

“To handle the massive growth in data that corporations worldwide are dealing with in their digital transformations, a completely new approach to storage is required,” said Peter Ungaro, SVP & GM, HPC & AI of Cray, a Hewlett Packard Enterprise company. “Cray’s new storage platform is a comprehensive rethinking of what high performance storage means for the Exascale Era. The intelligent software and hardware design of ClusterStor E1000 orchestrates the data flow with the workflow – that’s something no other solution on the market can do.”

The ClusterStor E1000 is the latest addition to the ClusterStor product line, which Cray purchased from Seagate in 2017. Claiming the new system is the fastest in the world on several metrics, Plechschmidt said new dense storage media packaging could deliver up to 1.6 TB/s and 50 million IOPS per solid state drive rack and up to 120 GB/s and 10 PB usable per hard disk drive base rack.

Also noteworthy, NERSC (National Energy Research Scientific Computing Center) will deploy the new ClusterStor E1000 on Perlmutter as its fast all flash storage tier, which will be capable of over four terabytes per second write bandwidth. “This architecture will support our diverse workloads and research disciplines,” said NERSC Director Sudip Dosanjh. “Because this file system will be the first all-NVMe file system deployed at a scale of 30 petabytes usable capacity, extensive quantitative analysis was undertaken by NERSC to determine the optimal architecture to support the workflows our researchers and scientists use across biology, environment, chemistry, nuclear physics, fusion energy, plasma physics and computing research.”

The rise of converged HPC/AI workloads is especially challenging for the storage system. As a general rule, the sequential movement of large files typical of traditional modeling and simulation is reasonably well served by less expensive hard disk drives. Conversely, the I/O for AI (particularly training) often requires moving large numbers of small files, accessed at random, which is best handled by more expensive flash memory.
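
To make the contrast between the two access patterns concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes a simple model in which every file access pays a fixed latency before data streams at the device's bandwidth. The function name and the model itself are illustrative assumptions, not anything Cray described, and any numbers passed in are placeholders rather than measured device figures.

```python
def effective_throughput(file_size_bytes, access_latency_s, bandwidth_bytes_per_s):
    """Approximate delivered bytes/s when each file access pays a fixed latency.

    Simplified model (an assumption for illustration):
        time per file = access latency + file size / streaming bandwidth
    """
    if file_size_bytes <= 0 or bandwidth_bytes_per_s <= 0 or access_latency_s < 0:
        raise ValueError("sizes and bandwidth must be positive, latency non-negative")
    time_per_file_s = access_latency_s + file_size_bytes / bandwidth_bytes_per_s
    return file_size_bytes / time_per_file_s


if __name__ == "__main__":
    # Hypothetical device: 10 ms per-access latency, 100 MB/s streaming rate.
    latency_s = 0.010
    bandwidth = 100_000_000
    for size in (4_096, 1_000_000, 1_000_000_000):
        rate = effective_throughput(size, latency_s, bandwidth)
        print(f"{size:>13} byte files: {rate / 1e6:8.2f} MB/s delivered")
```

Under this model, large files amortize the per-access latency and approach the streaming bandwidth, while small random reads are dominated by latency. That is the regime in which low-latency flash has the advantage.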

“IDC projects that even in the year 2023 there still will be an 8X price difference per gigabyte between hard disk drives and enterprise SSD. [Although] we actually think the SSD price performance improvement will go a little bit faster,” said Plechschmidt. Balancing cost and performance is a moving target, with data management software as important as the hardware media.
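
The cost side of that trade-off can be sketched with simple arithmetic. The Python snippet below takes the 8X per-gigabyte ratio quoted above and computes the blended cost of a tier that mixes flash and disk. The function name, the normalization of HDD cost to 1.0, and the example flash fractions are assumptions made for illustration. They do not represent Cray pricing or a recommended configuration.

```python
def blended_cost_per_gb(flash_fraction, hdd_cost_per_gb=1.0, ssd_price_ratio=8.0):
    """Capacity-weighted cost per GB for a mixed flash/disk system.

    flash_fraction  : share of total capacity on SSD, between 0 and 1
    hdd_cost_per_gb : HDD cost, normalized to 1.0 by default (assumption)
    ssd_price_ratio : SSD cost as a multiple of HDD cost (8X per the IDC
                      projection quoted in the article)
    """
    if not 0.0 <= flash_fraction <= 1.0:
        raise ValueError("flash_fraction must be between 0 and 1")
    ssd_cost_per_gb = hdd_cost_per_gb * ssd_price_ratio
    return (flash_fraction * ssd_cost_per_gb
            + (1.0 - flash_fraction) * hdd_cost_per_gb)


if __name__ == "__main__":
    # Hypothetical mixes, shown relative to an all-HDD system.
    for fraction in (0.0, 0.1, 0.5, 1.0):
        cost = blended_cost_per_gb(fraction)
        print(f"{fraction:4.0%} flash: {cost:.2f}x the all-HDD cost per GB")
```

Because the blend is linear in the flash fraction, each additional slice of capacity moved to flash adds the same relative cost. This is one reason placing only the latency-sensitive data on flash matters as much as the media choice.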

Broadly, ClusterStor E1000 is a “factory engineered solution” in which users mix and match media to accommodate the specific I/O profiles of the workloads in the workflow. It can be configured three ways....

This article originally appeared in sister publication HPCWire.