
SGI Snaps Up FileTek For Active Archiving Smarts 

SGI has a long history in dealing with large data sets, and it is one of the major players in active archiving with its venerable Data Migration Facility (DMF) software. And now it has acquired the assets and key personnel at FileTek, another key player in active archiving, to create a more complete tool for managing data. The deal will also help SGI expand further into the telecommunications, banking and financial services, and media and entertainment sectors where FileTek had many of its customers.

Bruce Elder, general manager of storage at SGI, tells EnterpriseTech that about 25 people will be coming on board SGI from FileTek, and that they are the key people who developed and supported the company's TrustedEdge and StorHouse data management products. The financial terms of the acquisition were not disclosed.

SGI and FileTek both have long histories in data management, and Elder says that in recent years they sometimes ended up in the same accounts, pitching their respective tools as complementary to each other. So the acquisition made technical sense.

SGI's DMF was originally created for supercomputing environments more than two decades ago and has expanded out to commercial use over the years. As the name suggests, DMF is used to migrate data through multiple tiers of storage so that frequently accessed data is stored on high-performance disk arrays while warm data is stored on slower arrays and colder data is pushed out to tape or cloud storage. The beauty of DMF is not just that it does hierarchical storage management, but that all of this information looks like it is online in a single, integrated storage pool and is accessible to users and applications alike – albeit with radically different access times for files. Some of SGI's largest customers have close to 2 billion files and over 100 PB of capacity online that is managed by DMF's policy engine.
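To make the idea of policy-driven tiering concrete, here is a minimal Python sketch of how a rule based on access age might assign files to tiers. It is not DMF's policy engine or its configuration syntax; the tier names, the 30-day and 180-day thresholds, and the `FileRecord`, `choose_tier`, and `plan_migrations` names are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical policy: (maximum days since last access, tier name).
# The thresholds and tier names are illustrative assumptions, not DMF settings.
TIER_POLICY = [
    (30, "fast-disk"),   # hot: touched within the last 30 days
    (180, "slow-disk"),  # warm: touched within the last 180 days
]
COLD_TIER = "tape-or-cloud"  # everything older than the last threshold


@dataclass
class FileRecord:
    path: str
    days_since_access: int


def choose_tier(record: FileRecord) -> str:
    """Return the tier a file belongs on, based only on its access age."""
    for max_age_days, tier in TIER_POLICY:
        if record.days_since_access <= max_age_days:
            return tier
    return COLD_TIER


def plan_migrations(records):
    """Group file paths by the tier the policy assigns them to."""
    plan = {}
    for record in records:
        plan.setdefault(choose_tier(record), []).append(record.path)
    return plan


if __name__ == "__main__":
    sample = [
        FileRecord("/data/run-today.h5", 2),
        FileRecord("/data/run-last-quarter.h5", 95),
        FileRecord("/data/run-2009.h5", 1500),
    ]
    for tier, paths in plan_migrations(sample).items():
        print(tier, paths)
```

A real hierarchical storage manager would also weigh file size, ownership, and capacity targets, and would leave a stub behind so the file still appears to be online; this sketch covers only the placement decision.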

"If you try to keep this exploding volume of Web transactions, video, or whatever, and you buy more and more expensive primary spinning disks to hold it, you are going to bankrupt yourself," says Elder with a laugh. "You don't need to do that, because you don't necessarily need all of that data now."

In fact, SGI estimates that anywhere from 70 to 90 percent of the data that companies keep online is not touched very often, and some of it is never touched again once it is created.

DMF was aimed at unstructured data, at first data sets of scientific applications and then later video and audio files, Web data, and other such information. But FileTek's StorHouse was created to do active archiving, as this policy-controlled movement of data across storage devices is called, on structured data such as relational database tables. It could also handle unstructured data and was therefore a competitor to SGI's DMF.

Whether the data is unstructured or structured, the trick is figuring out what is hot and what is not, and scheduling it to be moved to the appropriate tier in the storage hierarchy. That's where FileTek's TrustedEdge comes in, says Elder. This client-based tool runs on a Windows or Linux workstation, reaches out into storage devices – NAS and SAN arrays, storage clusters, tape drives and libraries, anything with a file system – and does analytics on the files stored on them to show what data is where and how frequently it is used. Without TrustedEdge, identifying the hot and not-so-hot data in a network was a manual process with DMF, and obviously, across increasingly large data sets and diverse storage systems, this can be a pain in the neck.
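For a rough sense of what this kind of file analytics involves, the Python sketch below walks a directory tree and splits files into hot and not-hot lists by last-access time. It is not how TrustedEdge works internally, which the article does not describe; the function name `classify_by_access_age` and the 30-day cutoff are assumptions. It also assumes the file system records access times, which is not the case on volumes mounted with options such as `noatime`.

```python
import os
import time


def classify_by_access_age(root, hot_days=30, now=None):
    """Walk `root` and split files into hot and not-hot lists by last access.

    `hot_days` is an illustrative cutoff, not a vendor default. `now` can be
    passed in (seconds since the epoch) to make results reproducible.
    """
    now = time.time() if now is None else now
    cutoff = now - hot_days * 86400  # 86400 seconds per day
    report = {"hot": [], "not_hot": []}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.stat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            bucket = "hot" if atime >= cutoff else "not_hot"
            report[bucket].append(path)
    return report


if __name__ == "__main__":
    result = classify_by_access_age(".")
    print(len(result["hot"]), "hot files;", len(result["not_hot"]), "not-hot files")
```

Doing this by hand across many arrays and billions of files is exactly the manual chore the paragraph above describes, which is why a dedicated analytics tool is attractive.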

SGI says that it will continue to sell and support TrustedEdge and StorHouse separately and will also integrate the technologies with each other and with its InfiniteStorage Gateway appliance. This appliance has 276 TB of its own capacity, and is used as a placeholder as warm and cold data is moved off primary arrays and staged to be pushed out to slower and cheaper storage. This includes tape, low-powered MAID arrays that have disk drives idled when they are not in use (massive array of idle drives is what the acronym stands for), remote storage over the wide area network, or cloud-based storage. The InfiniteStorage Gateway appliance is based on a two-socket Xeon E5 server running Red Hat Enterprise Linux, DMF, and the XFS file system. It will very likely also be running StorHouse at some point in the not-too-distant future.