Advanced Computing in the Age of AI | Thursday, April 25, 2024

New Flash from IBM Targets Unstructured Data Analytics 

JBOF (“jay-boff”), or Just A Bunch of Flash, may sound like a pejorative acronym. But when industry analyst Randy Kerns, senior strategist at Evaluator Group, used it in reference to new flash storage technology announced today by IBM, he meant it in a favorable way.

The IBM DeepFlash 150 targets large analytics workloads involving high volumes of unstructured data, and it’s offered at under $1 per gigabyte of data (and less after compression or deduplication). According to Kerns, the combination of lower cost and the ability to address a growing market need makes it a potentially important step on the path to broader adoption of flash.

“The fact is that flash is going to be the dominant storage technology going forward,” Kerns told EnterpriseTech, “rapidly replacing spinning disk technology. Up until this point we’ve been mostly looking at, with our customers, putting plans in place to transition all the primary storage requirements to flash. What you have with the DeepFlash 150 is the recognition that there’s value to deploying flash for more than primary storage. Now (organizations) can look at special-use requirements with the value in the reliability, power and the density that you get from flash. So it’s a broadening in the applicability and usage of flash technology.”

The product falls under an emerging industry category called “Big Data Flash,” said IBM’s Bina Hallman, Vice President, Offering Management Executive, Software Defined Storage, in a blog post announcing the product, and it’s aimed at financial services, healthcare, e-commerce, telecom, media, entertainment and cloud services organizations “grappling with constrained IT budgets, massive data sets, and escalating storage performance requirements for their big data workloads.”

DeepFlash 150 delivers storage density of up to 170TB per rack unit, or seven petabytes of all flash in a single industry-standard rack, according to IBM, and it runs workloads on x86-based and IBM Power servers and clusters.
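The two density figures can be cross-checked with simple arithmetic. The sketch below assumes a 42U rack filled entirely with flash and decimal units (1PB = 1,000TB); neither assumption is stated by IBM, and the function name is illustrative only.

```python
# Rough check of the quoted density figures.
# Assumptions (not from IBM): a 42U rack, every rack unit holding flash,
# and decimal units (1 PB = 1,000 TB).
TB_PER_RACK_UNIT = 170       # density quoted by IBM
RACK_UNITS_PER_RACK = 42     # assumed rack height
TB_PER_PB = 1000


def rack_capacity_pb(tb_per_unit=TB_PER_RACK_UNIT, units=RACK_UNITS_PER_RACK):
    """Return the all-flash capacity of one rack in petabytes."""
    return tb_per_unit * units / TB_PER_PB


print(f"{rack_capacity_pb():.2f} PB per rack")
```

Under those assumptions the result lands slightly above seven petabytes, in line with the figure IBM quotes.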

IBM's Alex Chen

Alex Chen, Director, Storage Systems, Offering Executive, File and Object Storage at IBM, told EnterpriseTech that the unstructured data analytics workloads targeted by DeepFlash 150 include massive volumes of data from social network chatter, telemetry and sensor data from the Internet of Things, streaming television episodes, medical images, video files and online shopping transactions.

Chen cited industry studies finding that unstructured data represents 80 percent of all data and that it’s growing at twice the rate of structured data. Analytics workloads involving this type of data, he said, have unique requirements that call for a new class of flash storage that is lower in cost and provides higher density for storage of multiple petabytes of data in a rack. The DeepFlash 150, he said, contrasts with traditional flash storage products that, because they are designed for structured data workloads, provide microsecond latency for rapid data updates.

“You don’t update the data very often (in unstructured data analytics), so you write it once and you read it many times,” Chen said. “You do various forms of analytics on it, it has massive bandwidth requirement. But you don’t re-write your video file, you don’t re-write your medical image, you don’t over-write your sensor data. So it’s read-intensive, and at the time that you need it you want it as fast as possible for analytics, you load the maximum amount of data into memory, if you’re doing HANA or Spark in-memory analytics.”

As for cost, he said, “the data sets are much bigger, petabyte-scale, so you can’t spend $4 or $5 bucks (per gigabyte of data), the economics doesn’t make sense.”
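Chen's point about economics can be made concrete with a back-of-the-envelope calculation. The sketch below compares the per-petabyte cost at the quoted ceiling of $1 per gigabyte with the $4 and $5 figures he mentions. It assumes decimal units (1PB = 1,000,000GB) and ignores compression, deduplication and any costs beyond raw capacity; the function name is illustrative only.

```python
# Back-of-the-envelope cost at petabyte scale.
# Assumptions: decimal units (1 PB = 1,000,000 GB); raw capacity price only,
# with no compression, deduplication, or operating costs included.
GB_PER_PB = 1_000_000


def cost_usd(capacity_pb, usd_per_gb):
    """Return the raw storage cost in dollars for a given capacity and price."""
    return capacity_pb * GB_PER_PB * usd_per_gb


for price in (1, 4, 5):
    print(f"${price}/GB -> ${cost_usd(1, price):,.0f} per petabyte")
```

Since DeepFlash 150 is offered at under $1 per gigabyte, the first line is an upper bound; at $4 to $5 per gigabyte each petabyte costs several million dollars more.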

The core technology within the new offering was developed by SanDisk, the Milpitas, CA-based flash memory storage devices and software company that announced an OEM relationship with IBM last January.

Hallman said the company recommends pairing DeepFlash 150 with Spectrum Scale, IBM’s software-defined high-performance storage technology built on the company’s General Parallel File System (GPFS), which provides the overlying storage services and functionality needed for big data workloads.

“DeepFlash 150 is specifically designed to make an excellent building block for software defined storage (SDS) infrastructures,” Hallman said. “Spectrum Scale…delivers the unified file, object, and analytics support that gives customers the superior resiliency, scalability, and data management.”

Chen said that while several major organizations are currently using DeepFlash 150, Virginia Polytechnic Institute and State University is the only customer that IBM can publicly identify.

Virginia Tech’s Vijay Agarwala, senior director, said, “We currently offer over 4 petabytes of hard disk storage running under IBM's Spectrum Scale filesystem. After more than doubling the number of cores in our compute engines, we needed to add an all-flash tier to our filesystems to provide more IOPS and higher IO density (IOPS/GB of storage). The IBM DeepFlash 150 will meet our ongoing and future needs in a cost-effective way.”

EnterpriseAI