
EMC DSSD Looks Like A Flash Array, Runs Like Memory 

DSSD, the secretive flash storage startup founded by some prominent ex-Sun Microsystems techies, is letting out some details about its forthcoming products. Or rather, early investor and now parent company EMC is talking a bit about the flash storage to pique the interest of enterprise and possibly supercomputing shops looking for zippy flash storage to boost their workloads.

EMC announced that it was acquiring DSSD for an undisclosed sum back in early May, which was a smart move considering the track record that company chairman Andy Bechtolsheim has with startups.

DSSD was founded by Bill Moore, formerly chief storage engineer at Sun Microsystems, and Jeff Bonwick, another engineer who worked with Moore to create the Zettabyte File System (ZFS) at Sun, which is now controlled by Oracle. Bechtolsheim was one of the original founders of Sun Microsystems back in 1982 and created its first workstation and led its drive into servers. In 1995, he left Sun to found Gigabit Ethernet startup Granite Systems, which he sold to Cisco Systems for $200 million a year later and which was the basis of that company's Catalyst switch line. Five years later, Bechtolsheim got interested in InfiniBand networking and converged systems and founded a company called Kealia that Sun bought in 2004 to become the foundation of its Sun Fire server business (based on Opteron processors) and the "Constellation" clusters aimed at HPC and similar enterprise workloads. In 2008, after a few years back at Sun, Bechtolsheim caught the 10 Gb/sec Ethernet bug and came on as CTO at Arista Networks, where he is today.

Now, with DSSD, Bechtolsheim seems to be interested in flash and PCI-Express links, with the goal of reducing the latency between servers and flash storage while not being confined to the server chassis. DSSD is working on an external flash array that will sit at the top of the rack and act, more or less, like a shared storage area network for multiple nodes, but one that expresses itself as memory, not as spinning rust (either emulated or actual). One twist in the DSSD machine is that, like local, internal flash storage, the DSSD arrays will use direct PCI-Express links between the servers and the flash rather than depending on slower Fibre Channel, InfiniBand, or Ethernet links. Linking storage directly through PCI-Express rather than over the Ethernet or InfiniBand network stack is nothing new – Fusion-io has been doing it since its founding and others have mimicked the approach. But, as Jeremy Burton, head of product operations at EMC, pointed out in a recent interview with Barron's, no one has done this with an external flash array, and certainly not one that can look like main memory or persistent storage, depending on the needs of the application.

"There’s a whole bunch of workloads coming down the line – in-memory, big data, and analytics – and these apps are built differently, they're very compute-intensive and so it's about getting data out of storage and into memory as quickly as possible," Burton explained.

The issue is that you can only get a couple of terabytes of PCI-Express flash memory inside of a single server and you have to manage each device individually and, in the case of Fusion-io, you have to tweak and tune your applications to make use of that flash card. DSSD wants to be able to put hundreds of terabytes of flash into a rack and have it be shared across multiple servers. This is what EMC is calling "ultra hot" data in the storage hierarchy.

[Image: emc-dssd-ultra-hot – EMC's "ultra hot" tier in the storage hierarchy]

The cold data is served up by distributed, shared-nothing disk arrays, which have independent controllers, distributed memory, and non-transactional commits; these are aimed at new kinds of workloads, such as Hadoop analytics, NoSQL data stores, and object storage. In the middle, where the data is warmer, you have loosely coupled scale-out arrays, which have independent controllers and distributed memory but transactional commits, and tightly coupled scale-out arrays, which have multiple controllers linked in a grid with shared memory, transactional commits, and consistent and linear performance as you scale out. Then there are PCI-Express flash units and clustered storage, where the memory is clustered across the elements of the array and you have two controllers for redundancy. (In a server PCI-Express flash card, there is generally only one controller.)

The DSSD device is a little more complex than a bunch of flash with a PCI-Express link – perhaps even a low-latency PCI-Express switch – between the servers and the banks of flash. Chad Sakac, senior vice president for global presales technical resources at EMC, has posted a detailed blog about the range of storage architectures available today, and hints at some of the approaches that the DSSD boxes will take. (EMC was one of the investors in DSSD a year ago when it raised its Series A round, as was SAP because of its interest in flash-based memory for its HANA in-memory database and application development environment.)

[Image: emc-dssd-block – block diagram of the DSSD approach]

The trick with DSSD, aside from having an order of magnitude better packaging of flash memory than current devices and having at least an order of magnitude lower latency, as Sakac teased in his blog, is to present the storage not as a block or file device, but rather as the native API that a particular application expects. So, for instance, whether the DSSD flash is exposed as an HDFS file system, an object store, or even a POSIX-compliant file system, it just looks like what the application expects. The array will also be able to house in-memory databases like SAP HANA and to look like a key/value store such as Pivotal GemFire or Memcached.
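As a rough illustration of that API-presentation idea, the sketch below layers two application-facing facades over one shared backing store. Every class and method name here is invented for the example; these are not DSSD's actual interfaces, which have not been made public.

    # Hedged sketch: one backing store, several application-facing facades.
    # All names are hypothetical illustrations, not DSSD APIs.

    class FlashBackingStore:
        """Stand-in for the shared flash pool; a dict plays the role of the media."""
        def __init__(self):
            self._blocks = {}

        def put(self, key, value):
            self._blocks[key] = bytes(value)

        def get(self, key):
            return self._blocks.get(key)


    class KeyValueFacade:
        """Exposes the pool the way a Memcached-style application expects."""
        def __init__(self, store):
            self._store = store

        def set(self, key, value):
            self._store.put(("kv", key), value)

        def get(self, key):
            return self._store.get(("kv", key))


    class FileFacade:
        """Exposes the same pool as simple whole-file reads and writes."""
        def __init__(self, store):
            self._store = store

        def write_file(self, path, data):
            self._store.put(("file", path), data)

        def read_file(self, path):
            return self._store.get(("file", path))


    # Both facades sit over the same pool, so each application just sees the
    # interface it already speaks.
    pool = FlashBackingStore()
    KeyValueFacade(pool).set("session:42", b"hot data")
    FileFacade(pool).write_file("/analytics/part-0000", b"hdfs-style blob")
    print(KeyValueFacade(pool).get("session:42"))
    print(FileFacade(pool).read_file("/analytics/part-0000"))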

Here is how EMC stacks up DSSD against its hybrid flash/disk arrays and its XtremIO all-flash arrays:

[Image: emc-dssd-compare – DSSD compared with EMC's hybrid flash/disk and XtremIO all-flash arrays]

The idea, goes the speculation on the street, is that DSSD has come up with a means of making its external flash array, linked to multiple servers through PCI-Express, look like main memory as far as the applications are concerned. If it can look like main memory and not hinder the performance of in-memory databases and applications even though its flash is considerably slower than main memory, the economics of in-memory databases will change radically.

This play is something of the inverse of the one that Diablo Technologies and SanDisk are making with their memory channel storage, which puts 200 GB or 400 GB of flash into a DDR3 memory slot and makes it look like an extremely fast local drive.

EMC's DSSD appears to take the opposite approach, making external flash look like main memory. This could be done through PCI-Express links for a certain latency level, and for even lower latency it is conceivable that DSSD could do a direct hookup to a QuickPath Interconnect (QPI) port on a Xeon chip. The trouble there is that Intel generally uses these QPI ports to make the NUMA links between processors so they can share main memory. To get a spare port on a two-socket server, you have to take a more expensive Xeon E5-4600 or Xeon E7-4800 part and use one of its QPI links. (This is what SGI does to link its NUMAlink cluster interconnect to Xeon processors.) This approach might fly with very high-end customers, but it would not be a volume product.

The first DSSD products are expected in 2015.

EnterpriseAI