
Pure Storage Hints At Hyperconverged Future 

All-flash storage array maker Pure Storage has been very clear that it is not interested in going backwards and delivering hybrid arrays that mix disk drives and flash memory, as many incumbent disk array makers are doing. But the company's top brass is now hinting that Pure Storage may go in another direction, offering hyperconverged systems that take on incumbent server makers for running application software.

Speaking to EnterpriseTech back at the VMworld virtualization extravaganza at the end of August, Matt Kixmoeller, vice president of products at the company, waved off any suggestion that Pure Storage would add disk drives to its product line. The company's name is Pure, and the pure in that refers to non-volatile memory, with flash being the medium of choice for the moment but others on the horizon for when flash runs out of gas years hence.

Database acceleration, virtual desktop infrastructure, and virtualized server clouds are the three key workloads that drove sales at Pure Storage in 2013, Kixmoeller explained, representing roughly a third of revenues each. The database part of the business could grow to represent about half of the business in 2014, Kixmoeller estimated, because companies are looking to accelerate specific databases and are particularly interested in avoiding a switch to alternative data stores. If they can throw flash hardware at the problem to solve a throughput or response time problem, many companies will do that rather than move from a relational database to in-memory or NoSQL data stores. But just because Pure Storage is focusing on relational database acceleration – just like all other providers of PCI Express flash cards and flash arrays (both all-flash and hybrid types) – does not mean Pure Storage is ignoring the other opportunities.

Speaking last week as Pure Storage celebrated its fifth birthday, CEO Scott Dietzen threw down the gauntlet to the band of hyperconverged system players, including VMware/EMC, Nutanix, SimpliVity, Scale Computing, Pivot3, Maxta, and others. Dietzen said that for the past five years, Pure Storage has worked to bring all-flash arrays to parity, on both cost per gigabyte and features, with the Tier 1 disk arrays that host the most demanding production data. In-line compression and deduplication deliver the cost parity, while snapshotting, cloning, and other replication features cover what enterprises have come to depend on in their disk arrays. And then Dietzen added this:

"Looking forward, however, the opportunity for Pure will be more about shaping the next-generation of cloud and web-scale data storage than just replacing legacy disk arrays. The products we are crafting will ultimately compete with commodity server hardware and help change the way systems software like databases and file systems are architected. An opening salvo in the ramping debate between shared and hyper-converged storage can be found here, but we will have much more to say on this over the next 12 months."

That would seem to suggest that Pure Storage is looking to create its own variant on the hyperconverged theme. Dietzen has a pretty broad definition of hyperconvergence. It includes not only virtual SAN implementations like those cooked up by Nutanix, SimpliVity, and now VMware, which turn the virtual server cluster into a shared virtual SAN that runs on the same physical servers, but also distributed analytics platforms like Hadoop, which puts data and compute on the same nodes. (Or, to be even more precise, it uses local compute where the data is stored on a disk-heavy server to chew on that data locally and then collate the results of calculations across many disk drive/CPU segments into a final result.)

Most all-flash arrays are based on controllers that are, for all intents and purposes, a normal two-socket Xeon E5 server, so adding a virtual SAN layer to a cluster of FlashArray systems and a hypervisor to allow it to run virtual machines is not that much of a stretch. The top-end FlashArray-450 launched in May of this year has two controller nodes, each with two twelve-core "Ivy Bridge" Xeon E5-2600 v2 processors, and Pure Storage says that it purposefully runs its Purity Operating Environment at half speed on the controllers so that in the event of a controller node failure in its two-node system the performance will not degrade. So there is some spare capacity there in the box to play with from the get-go.
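The half-speed rule is simple failover arithmetic, and a rough sketch makes it concrete. The Python below assumes that "half speed" means holding each controller to about 50 percent utilization and that load spreads evenly across controllers. The function names are hypothetical and nothing here reflects how Purity actually schedules work.

```python
def max_safe_utilization(nodes: int, tolerated_failures: int = 1) -> float:
    """Highest per-node utilization at which the surviving nodes can
    absorb the load of the failed ones (assumes evenly spread load)."""
    if nodes <= tolerated_failures:
        raise ValueError("need more nodes than tolerated failures")
    return (nodes - tolerated_failures) / nodes


def idle_cores(nodes: int, sockets_per_node: int,
               cores_per_socket: int, utilization: float) -> float:
    """Cores' worth of capacity left idle at the given utilization."""
    total_cores = nodes * sockets_per_node * cores_per_socket
    return total_cores * (1 - utilization)


# Two controllers, each with two twelve-core Xeon E5-2600 v2 chips,
# as described for the top-end FlashArray above.
cap = max_safe_utilization(nodes=2)
print(cap)
print(idle_cores(nodes=2, sockets_per_node=2, cores_per_socket=12,
                 utilization=cap))
```

Under those assumptions, a two-controller box has to cap each controller at half load, which leaves the equivalent of one controller's cores idle in normal operation. The same formula shows why wider clusters need proportionally less headroom: with four nodes and one tolerated failure, the cap rises to 75 percent.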

It is possible that Pure Storage will create a hyperconverged system out of a heftier server based on Intel's "Haswell" Xeon E5-2600 v3 processors, which were just launched in early September and which have up to 18 cores on a single chip. But such a two-way cluster would not have very much oomph. And while it is possible that Pure Storage could create its own virtual SAN software, it is more likely that it will partner with existing suppliers such as VMware and SimpliVity, and possibly Nutanix, to create a flash-only variant of their stacks using FlashArrays as the primary storage in the cluster. At the moment, VMware's VSAN, the Nutanix Distributed File System, and SimpliVity's OmniCube are all based on server clusters that have both disk and flash storage to goose the performance of the overall setup. Any one of these, or all of them, could be tweaked to treat Pure Storage arrays as primary storage and offer up many more I/O operations per second for virtualized server workloads running on clusters of machines. And, importantly, rather than having compression, deduplication, snapshotting, and replication features embedded in the software and chewing up processing resources at the hypervisor layer (where VMware VSAN runs) and above (where Nutanix and SimpliVity run), these array functions would be passed through to the Purity Operating Environment and executed on its controllers.

There are many possibilities. And if Pure Storage becomes a systems vendor in its own right, it may have to rethink its name.