Advanced Computing in the Age of AI | Monday, July 22, 2024

Lustre Makes the Enterprise Grade with HSM 

The Lustre parallel file system that was spawned and nurtured in supercomputing labs is now ready to serve as a high-speed file system for enterprise applications.

With the launch of Lustre 2.5, the open source file system is getting a set of APIs that let hierarchical storage management (HSM) software plug into it, allowing files to be moved off Lustre to other disk or tape subsystems, and back again, as needed.
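On the client side, the HSM hooks surface through the `lfs` utility. The commands below sketch a typical archive, release, and restore cycle; the mount point, file name, and archive ID are placeholders, and the actual data movement is done by a copytool configured against the backend archive:

```shell
# Check the HSM state of a file on a Lustre mount (path is a placeholder)
lfs hsm_state /mnt/lustre/results.dat

# Copy the file out to the backend archive (archive ID 1 is an assumption)
lfs hsm_archive --archive 1 /mnt/lustre/results.dat

# Release the Lustre copy, leaving a stub; the data stays in the archive
lfs hsm_release /mnt/lustre/results.dat

# Reading the stub triggers a restore; it can also be requested explicitly
lfs hsm_restore /mnt/lustre/results.dat
```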

This is a key requirement of enterprise customers, and as it turns out, it is one that is going to make quite a few supercomputer centers happy, too.

Galen Shipman, chairman of Open Scalable File Systems (OpenSFS), the non-profit located in Beaverton, Oregon that coordinates development efforts across the Lustre community, tells EnterpriseTech that Lustre has already done plenty of growing up. Lustre, Shipman says, should no longer be viewed as a high-speed scratch disk for supercomputing. But he concedes that the HSM features – which will allow enterprises to plug Lustre arrays into their existing disk and tape storage setups and automatically move files among those devices based on policies – are going to be a boon for enterprises.

Right now, if you use Lustre in the enterprise – perhaps in financial services to take in market data feeds and store them for processing, maybe by Hadoop MapReduce – you have to manually move data into and out of Lustre. Keeping track of data sets is a big pain, and enterprises like to automate the offloading of data to slower disk or tape archives. Doing so saves money because slow disk and even slower tape are a lot less expensive than a Lustre file system per unit of capacity.
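The kind of policy automation HSM enables can be sketched in a few lines. This is a toy age-based offload pass, not Lustre's HSM machinery: the directory paths, the 30-day threshold, and the function name are all assumptions for illustration.

```python
import os
import shutil
import time

def offload_cold_files(fast_dir, archive_dir, max_age_days=30):
    """Move files not modified within max_age_days from a fast tier
    to a cheaper archive tier. A toy policy sketch; real HSM software
    tracks state and restores files transparently on access."""
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for name in os.listdir(fast_dir):
        src = os.path.join(fast_dir, name)
        # Only consider plain files whose last modification is older
        # than the cutoff; directories and fresh files stay put.
        if os.path.isfile(src) and os.path.getmtime(src) < cutoff:
            shutil.move(src, os.path.join(archive_dir, name))
            moved.append(name)
    return moved
```

A real HSM layer does far more – it leaves a stub behind so the file still appears in the namespace and pulls the data back on demand – but the policy loop above is the part enterprises want automated.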

"Lustre has come to a level of maturity where it has the reliability characteristics that we need for a long-term storage solution," says Shipman. "We spent a lot of time dealing with Lustre performance and scalability, but over the past four years, we have really focused in on the reliability of the system. And this has opened Lustre up to an entirely new domain of use cases beyond scratch space for HPC environments."

In his day job, Shipman is director of the compute and data environment for science at Oak Ridge National Laboratory. The price/performance benefits that the lab has seen as it has implemented Lustre are exactly what Shipman expects enterprise customers to be attracted to as they wrestle with large datasets.

Down at Oak Ridge, the Spallation Neutron Source is a $1.8 billion facility that is used for a variety of physics, chemistry, materials science, and biology experiments relating to neutron particles. Oak Ridge wants to store the data flowing off this apparatus for a decade or more, and until recently it was using scale-out NAS storage that at best could ingest data at under 1 GB/sec across all of the arrays. With the improvements to Lustre on the reliability front, the Lustre cluster that Oak Ridge now uses is more reliable than those NAS arrays, according to Shipman, costs the same amount of money, has twice the disk capacity, and can move data at 15 GB/sec.

"This is why organizations are looking at Lustre now," Shipman explains. "It has the resiliency and reliability characteristics, and because of the way it has been architected, you get the performance for free. In a traditional storage architecture, you cannot get the full bandwidth out of it – you are essentially using it for capacity. Lustre opens up that capacity for bandwidth as well."

The other thing that is making potential extreme-scale enterprise customers consider Lustre, aside from the new HSM features and the price/performance advantages outlined above, is the growing ecosystem of companies that either offer Lustre distributions with support, as Intel is doing after buying Whamcloud last year, or embed it in their storage products, as Aeon Computing, Cray, DataDirect Networks, EMC, Fujitsu, NetApp, SGI, Terascala, and Xyratex do.

One of the things that makes Lustre particularly useful for enterprises is the Distributed Namespace feature, which is abbreviated DNE for some reason. This feature was added in Lustre 2.4, which came out earlier this year, and will be further improved with Lustre 2.6, expected sometime in the second quarter of next year.

The way Lustre works, you have a cluster of servers with lots of disk drives, and a metadata server acts as a head node to keep track of where all the bits of data are stored on the parallel file system spanning those nodes. The metadata server keeps track of directories, permissions, and other aspects of files. If you have very large files, then this metadata can be fairly modest in size. But if you have a lot of smaller files running on the file system, and they are changing a lot, then the metadata server can choke.

Small files that change rapidly are precisely the data pattern seen all the time in the financial services and life sciences sectors, says Shipman, to name just two examples. With the DNE feature, you can have multiple metadata servers and scale this control portion of the Lustre file system independently of the storage capacity, without having it choke.
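The idea behind spreading metadata load can be illustrated with a toy placement function that hashes a directory path to one of several metadata targets. This is a sketch only: real Lustre DNE places subdirectories on metadata targets through its own mechanisms, and the hashed scheme below is an assumption for illustration.

```python
import hashlib

def mdt_for_path(path, num_mdts):
    """Pick a metadata target (MDT) index for a directory by hashing
    its path. A simplified illustration of spreading metadata load
    across servers; not how Lustre DNE actually assigns directories."""
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_mdts
```

The point of the sketch is the scaling property: because placement depends only on the path, any client computes the same answer without consulting a central server, so adding metadata servers spreads the small-file workload instead of funneling it through one head node.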

Shipman says that data replication is on the Lustre wish list for enterprise customers. This will allow for data to be moved between geographically dispersed datacenters. Shipman is not committing to when data replication will be added to the Lustre roadmap, but some initial funding has come into OpenSFS to scope out that work. With servers getting more powerful and able to push more and more data through faster networks, OpenSFS is also keeping an eye on the balance between the data rates servers can push and those the Lustre file system can ingest.

How far can you push Lustre? Shipman says that the current release can scale to tens of petabytes of capacity in a single namespace, and with distributed namespaces it can easily reach 100 PB or more and handle hundreds of thousands of metadata operations per second. As for bandwidth, Fujitsu with its K supercomputer and Oak Ridge with its Spider II have been able to push above 1 TB/sec of bandwidth into and out of the file system.

"These are levels of performance you cannot achieve with any other file system that I am aware of," says Shipman.

And that is why extreme-scale enterprises are going to take a hard look at it. And with a vendor ecosystem around Lustre, you can, as Shipman puts it, "drop a Lustre file system on a datacenter floor and have it working day one," unlike in years gone by, when you needed a team of experts to set it up and maintain it.