Advanced Computing in the Age of AI | Tuesday, October 3, 2023

Cray on Flash Tiered Storage for Supercomputing and Big Data 
sponsored content by Cray

There’s been a lot of recent talk about SSDs and flash storage, and rightly so. The move to flash storage is happening now, but a major shift, an inflection point, will occur when the price-performance of in-memory flash storage warrants replacing performance disk tiers, such as scratch file systems or SAN volumes built on fast disks like SAS. At Cray, we think of flash and disk storage systems and software as building blocks, and of the associated system architectures as key to how the entire system, hardware and software, is delivered.

Cray is providing new ways of delivering fast data to applications at high performance. One way to think of flash storage system architectures is as flash tiers, where data and IO need to be dispatched to a compute system and application in the most efficient way possible.

Cray is delivering tiered storage solutions today—including SSDs, flash storage, disk, and tape. If you’re interested, please read the white paper on Cray’s Integrated Tiered Storage solution, written by Michael Feldman of Intersect360 Research.

A common use case for flash storage is burst buffers. But in some cases, the performance of flash storage will address a broader set of use cases, extending to a range of data- and IO-hungry applications and workloads. Today, a key issue with flash storage is both data delivery and data movement across tiers. For applications to use flash storage, the data needs to be made available through IO forwarding or other techniques.

In a talk at the Rice Oil & Gas Workshop, Dirk Smit, VP of Exploration Technology and Chief Scientist Geophysics, Innovation and R&D, at Shell, once used the analogy of data being like raisins in a compute cake. The data and IO simply need to be there for applications, available and accessible when needed, without the latency of retrieval. Today IO is separated from compute by networks such as IP, Fibre Channel, or InfiniBand. Smit questions the viability of current network-attached models for storage IO. Will they be fast enough to handle the upstream exploration workloads of the future?

There are two essential value characteristics of SSDs:

  • Cost per IOPS (how much usable IO is achieved at a given price)
  • Cost per bandwidth (the actual bandwidth used by compute)
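
As a rough illustration of these two metrics, the sketch below computes cost per IOPS and cost per GB/s for two tiers. The `Tier` class, the helper names, and every price and performance figure are hypothetical placeholders chosen for easy arithmetic; they are not Cray figures or measurements.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """A storage tier described by hypothetical, illustrative numbers."""
    name: str
    cost_usd: float       # assumed purchase price of the whole tier
    iops: float           # usable IOPS delivered to applications
    bandwidth_gbs: float  # bandwidth actually consumed by compute, in GB/s


def cost_per_iops(tier: Tier) -> float:
    """Dollars paid for each usable IO operation per second."""
    return tier.cost_usd / tier.iops


def cost_per_gbs(tier: Tier) -> float:
    """Dollars paid for each GB/s of bandwidth used by compute."""
    return tier.cost_usd / tier.bandwidth_gbs


# Placeholder values only; substitute measured numbers for a real comparison.
disk = Tier("disk scratch", cost_usd=1_000_000, iops=200_000, bandwidth_gbs=50)
flash = Tier("flash tier", cost_usd=2_000_000, iops=800_000, bandwidth_gbs=500)

for tier in (disk, flash):
    print(f"{tier.name}: ${cost_per_iops(tier):.2f}/IOPS, "
          f"${cost_per_gbs(tier):,.0f} per GB/s")
```

With these made-up inputs the flash tier costs twice as much overall yet comes out cheaper on both metrics, which is the kind of comparison the two characteristics are meant to support.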

In some cases, flash tiers could provide 10x the bandwidth of traditional disk-based scratch tiers and maybe 2-4x the IOPS of disk. What’s important to remember is that these systems will not be measured purely on the speed of an individual device (hard disk, NAND, or SSD) but as an aggregate entity represented as a storage system, or flash tier.
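
One simple way to reason about a tier as an aggregate rather than as individual devices is sketched below. It assumes a deliberately simplified model in which tier bandwidth is the sum of the device bandwidths, capped by whatever shared fabric or controller sits in front of them. The function name, the model, and the numbers are illustrative assumptions, not a description of any Cray product.

```python
def tier_bandwidth_gbs(device_gbs: float, device_count: int,
                       fabric_limit_gbs: float) -> float:
    """Aggregate tier bandwidth under a simplified model:
    the devices' combined bandwidth, capped by the shared fabric limit."""
    return min(device_gbs * device_count, fabric_limit_gbs)


# Hypothetical example: 100 devices behind a 150 GB/s fabric.
slow_devices = tier_bandwidth_gbs(0.25, 100, 150.0)  # device-bound
fast_devices = tier_bandwidth_gbs(2.0, 100, 150.0)   # fabric-bound

print(slow_devices, fast_devices)
```

In this toy model, making each device eight times faster lifts the tier from device-bound to fabric-bound, so the tier as a whole gains less than the devices do. That is why the system, not the device, is the unit worth measuring.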

The adoption of flash tiers will also be driven by efficiency. For example, in large-scale capability supercomputers, with stringent requirements for continuous availability and performance, increasing machine efficiency can dramatically reduce the total cost of ownership. By reducing the number of physical compute nodes and storage disks required, the ROI can be in the millions, depending on the size of the machine. Increasing machine efficiency by 70-90% could cut the cost of the system by one third, depending on the capabilities and configuration of the system.

One thing is certain: commercial and research customers require storage system architectures that build on leading commercial off-the-shelf technologies using world-renowned best practices and architectures. This is where Cray is having success today, delivering workflow-driven storage solutions that range from high-performance storage using flash and Cray Sonexion to end-to-end tiered storage using Cray Tiered Adaptive Storage.

The future, built of flash tiers and other innovations, depends largely on how the entire system is put together. Or, as Seymour Cray once aptly put it, “The future is seldom the same as the past.”