Advanced Computing in the Age of AI | Friday, April 19, 2024

AI Strategies: Mitigating Data Gravity with Hybrid Cloud and Object Storage 

We live in a data-driven world. Successful, leading companies have mastered and operationalized the process of extracting insight and intelligence from all the data collected continuously. The use of data has brought on a sea change in business models, with AI being the primary technique used to distill all this data into actionable insight.

Machine learning and deep learning (ML/DL) depend on training and inference, both of which require fast execution with large data sets flowing smoothly through the pipeline. These algorithms perform better and become more accurate as the training data sets grow.

According to Gartner[1], "The success of ML and AI initiatives relies on orchestrating effective data pipelines that provision the high quality of data in the right formats in a timely manner during the different stages of the AI pipeline." To support the data-intensive needs of AI, companies need reliable storage solutions optimized across all stages of the data pipeline, from ingestion to training and inferencing.

A recent IDC[2] survey identified the key AI deployment challenges as dealing with massive data volumes and the associated quality and data management issues. Maintaining high data quality across distributed data sets is essential to avoid building biased or inaccurate models, and it is no trivial task.

With AI implementations churning through increasingly massive data sets, the sheer volume of data creates its own dynamic and challenges. It becomes impractical and/or prohibitively expensive to shift data back and forth, to and from where the AI pipeline is implemented. Rather, the data stays in a central location, and the relevant AI pipelines, meaning the application stacks, are brought closer as needed. This is referred to as data gravity.
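A rough calculation shows why moving data becomes impractical at this scale. The sketch below estimates how long a bulk transfer takes over a dedicated network link. The function name, the link speeds, and the 70 percent efficiency factor are illustrative assumptions, not figures from the reports cited here.

```python
PB = 1e15  # one decimal petabyte, in bytes


def transfer_days(dataset_bytes: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate the days needed to move a data set over a network link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually achieved after protocol overhead and contention.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    # Convert gigabits per second to usable bytes per second.
    usable_bytes_per_sec = link_gbps * 1e9 / 8 * efficiency
    seconds = dataset_bytes / usable_bytes_per_sec
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    for gbps in (1, 10, 100):
        print(f"1 PB over {gbps} Gbps: {transfer_days(PB, gbps):.1f} days")
```

Under these assumptions, a single petabyte needs roughly 13 days on a 10 Gbps link, before any egress charges are counted. Repeating that for every training run is what makes bringing the pipeline to the data the more attractive option.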

Hybrid Cloud

Both on-premises infrastructure and public clouds are utilized to support AI initiatives. At one end of the spectrum are cloud-native companies that were born in the cloud; at the other are organizations that have invested in on-premises infrastructure and tend to run AI pipeline tasks close to where the data is generated, either in the data center or at edge locations. Data gravity has a significant impact on where AI stages are carried out.

While cloud service providers (CSPs) cater to AI workloads with elastic compute and related services, data gravity is the driving factor for on-premises implementations, making hybrid cloud the best of both worlds. This is backed by findings from IDC that public clouds lead in deployment of AI models and workloads, closely followed by on-premises private cloud deployments. A hybrid architecture allows the use of public clouds for their AI know-how and elastic capabilities, while enabling local data storage with seamless accessibility across the boundary.

AI and ML/DL train on different data types, which requires varying performance capabilities. As a result, systems must include the right mix of storage technologies. A hybrid architecture meets the simultaneous needs for scale and performance.

Object Storage

Object storage is the technology of choice for AI because of: (a) seamless access between private and public cloud storage with the AWS S3 API, (b) native metadata tagging capability, and (c) limitless scale.

Object storage was popularized by CSPs out of necessity, with AWS Simple Storage Service (S3), launched in 2006, as the first widely adopted cloud implementation. The AWS S3 API has since become the de facto standard. Most object storage systems are therefore compatible with the AWS S3 API, which makes object storage the right springboard to and from public clouds and, hence, the foundation of hybrid AI deployments. Metadata tagging is likewise built into object storage, which makes it a natural match for the data staging and indexing workflows routinely used in AI. AI's massive data sets find a natural home in the cloud-scale capacity object storage is known for.
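The role of metadata tagging in data staging can be pictured with a toy model. The sketch below is an in-memory stand-in written for illustration only; ToyObjectStore, its methods, and the tag names are hypothetical and are not the S3 API or any vendor's product. It shows the basic idea: each object carries key-value tags alongside its data, and a staging step selects objects by tag rather than by directory layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredObject:
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical in-memory bucket: flat keys, each with data plus tags."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        # Tags are stored with the object itself, not in a separate file.
        self._objects[key] = StoredObject(data, dict(metadata))

    def get(self, key: str) -> bytes:
        return self._objects[key].data

    def find(self, **criteria: str) -> List[str]:
        """Return the sorted keys whose tags match every criterion."""
        return sorted(
            key
            for key, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


store = ToyObjectStore()
store.put("images/cat-001.jpg", b"...", label="cat", split="train")
store.put("images/cat-002.jpg", b"...", label="cat", split="test")
store.put("images/dog-001.jpg", b"...", label="dog", split="train")

# Stage only the training split for the next pipeline step.
training_keys = store.find(split="train")
```

In a real deployment the tags would be attached through the S3 API at upload time, and searching on them typically relies on vendor features or an external index rather than a scan like the one in find.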

AI data sets commonly reach multi-petabyte scale, with performance demands that could overwhelm the whole infrastructure. As a result, AI is not suited to run on legacy infrastructures that struggle to meet its needs for scale, elasticity, compute power, performance and data management.

When dealing with such large-scale training and test data sets, resolving storage bottlenecks (latency and/or throughput issues) and capacity limitations is a key success factor. AI/ML/DL workloads require a storage architecture that can keep data flowing through the pipeline, with both stellar raw I/O performance and the ability to scale capacity.

Such a solution can be implemented using a classic two-tier architecture, with one tier dedicated to high-performance flash while a second tier provides scalable object storage. This is typically implemented as two separate clusters of storage servers to deliver the data to fuel and accelerate an AI rocket.
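How the two tiers cooperate depends on the product. One common pattern, assumed here for illustration rather than taken from any specific implementation, is to treat the flash tier as a read cache in front of the object tier and to evict the least recently used entries when it fills. The class and attribute names below are hypothetical, and plain dictionaries stand in for the two storage clusters.

```python
from collections import OrderedDict
from typing import Dict


class TwoTierStore:
    """Sketch of a small, fast flash tier in front of a large capacity tier."""

    def __init__(self, capacity_tier: Dict[str, bytes], flash_slots: int) -> None:
        if flash_slots < 1:
            raise ValueError("flash_slots must be at least 1")
        self.capacity_tier = capacity_tier  # stands in for the object storage cluster
        self.flash_slots = flash_slots      # how many objects fit in the flash tier
        self.flash_tier: "OrderedDict[str, bytes]" = OrderedDict()
        self.flash_hits = 0
        self.capacity_reads = 0

    def read(self, key: str) -> bytes:
        if key in self.flash_tier:
            # Fast path: serve from flash and mark the key as recently used.
            self.flash_tier.move_to_end(key)
            self.flash_hits += 1
            return self.flash_tier[key]
        # Slow path: fetch from the capacity tier (raises KeyError if absent).
        data = self.capacity_tier[key]
        self.capacity_reads += 1
        self.flash_tier[key] = data
        if len(self.flash_tier) > self.flash_slots:
            # Evict the least recently used object from flash.
            self.flash_tier.popitem(last=False)
        return data


objects = {"shard-a": b"1", "shard-b": b"2", "shard-c": b"3"}
tiers = TwoTierStore(objects, flash_slots=2)
for name in ("shard-a", "shard-b", "shard-a", "shard-c"):
    tiers.read(name)
```

After these four reads, the repeated request for shard-a was served from flash, and shard-b was evicted to make room for shard-c. Training jobs that revisit the same shards across epochs benefit most from this kind of arrangement, while the capacity tier holds the full data set.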

Greg DiFraia is field CTO of Scality.

[1] Gartner Analyst Report #G00351887, August 2018
[2] IDC Technology Spotlight #US43977818, June 2018

EnterpriseAI