Covering Scientific & Technical AI | Wednesday, October 9, 2024

Pure Storage Makes Splash With ‘Big Data Flash’ 

(Joe Techapanupreeda/Shutterstock)

As data-intensive storage workloads proliferate across a range of industries, storage vendors are attempting to upgrade and scale their platforms to speed the analysis of a torrent of unstructured data.

The latest attempt comes from all-flash storage array specialist Pure Storage (NYSE: PSTG), which last week released an updated version of its FlashBlade solid-state array. The company said 8.8-TB and 52-TB blade capacities are generally available along with accompanying software.

The upgrade comes 10 months after Pure Storage, based in Mountain View, Calif., unveiled its FlashBlade platform. Since the platform's official release last July, the storage vendor claims it has made significant inroads in areas such as real-time and big data analytics, financial analysis and energy exploration. All of those segments are looking for new approaches to handling the proliferation of unstructured data while attempting to connect the dots to make sense of it all.

Those use cases are driving a new set of requirements for storage and other workloads that industry watchers assert have made traditional storage architectures obsolete. The result has been the emergence of "big data flash" platforms that began generating revenues for the first time in 2016.

"Big data flash platforms are optimized to handle very large unstructured data sets with high degrees of concurrency while delivering flash performance and reliability," said Eric Burgener, IDC's research director for storage. The market analyst estimates the emerging all-flash storage market will reach more than $1 billion in revenues by 2020.

Pure Storage, which launched an initial stock offering in October 2015, has since been positioning itself to leverage the transition from analyzing historical data in batch mode to real-time analytics driven by emerging tools such as Apache Spark.

Along with all-flash arrays, the real-time approach requires scalable file and object storage. Hence, Pure Storage CEO Scott Dietzen stressed in a statement that the company has expanded the FlashBlade platform to handle the "rapidly expanding world of unstructured data."

The company also cited a batch of emerging use cases for its all-flash storage architecture, including a big data genomics project at the University of California at Berkeley along with banking applications based on cloud and software-as-a-service approaches.

The UC-Berkeley project includes complex analysis and data-intensive visualizations in three dimensions, the company noted, asserting that Spark queries that previously took 12 hours have been reduced to about 30 minutes.

Those use cases are based on early deployments of the FlashBlade platform designed to "harden" the platform across varied workloads, the company noted. Pure Storage also said it gained experience in running large-scale Apache Spark clusters for tasks such as machine learning and SQL query processing.

The storage vendor also said it has identified similarities among different workloads. "We’ve observed that analyzing genomes for clinical diagnostics is, from a workload perspective, very similar to the way geophysicists use clusters of computers to perform geophysical mapping in oil and gas," the company noted in a blog post.

"Simultaneously, we realized how similar this flow is to the way a data scientist uses Apache Spark for business analytics or to create scenarios for machine learning."

About the author: George Leopold

George Leopold has written about science and technology for more than 30 years, focusing on electronics and aerospace technology. He previously served as executive editor of Electronic Engineering Times. Leopold is the author of "Calculated Risk: The Supersonic Life and Times of Gus Grissom" (Purdue University Press, 2016).

AIwire