Advanced Computing in the Age of AI | Thursday, March 28, 2024

Make Way for Gen S: Generation Scale-Out 

Gen X, Y, and Z get all the attention, but in IT, the generation we need to talk about is Gen S: Generation Scale-Out.

Times have changed. The enterprise technology stack of 10 years ago, when VMs and block-based Storage Area Networks (SANs) ran SQL databases and bolted-on data services like backups, is hardly recognizable today. Back then, nothing was abstracted from its hardware, and a petabyte counted as a massive installation.

As the industry embraces AI/ML, sharded NoSQL databases (where data is partitioned and distributed across individual servers), transactional applications, and other high-performance computing needs, infrastructures must be able to support these workloads at any scale. They must also support any storage or server resource, any platform, and any hardware or container. That flexibility will allow a Gen S developer, site reliability engineer, or data scientist to spin up a thousand servers today and a fraction of that number tomorrow, or to burst an application to the public cloud for processing and then quickly bring the data back down to earth.
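To make the idea of sharding concrete, here is a minimal Python sketch of hash-based partitioning, one common way to spread records across servers. The key format, the four hypothetical server names, and the use of SHA-256 are illustrative assumptions, not how any particular NoSQL database assigns data.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then take the modulus.
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical servers: each shard index maps to one server.
servers = [f"server-{i}" for i in range(4)]

# Hypothetical record keys, partitioned across the servers by hash.
keys = [f"user:{n}" for n in range(1000)]
placement = Counter(servers[shard_for(k, len(servers))] for k in keys)
print(placement)  # roughly even counts per server
```

Because the hash is deterministic, any client can compute where a key lives without asking a central coordinator. Plain modulo placement like this reshuffles most keys when the server count changes, which is why real systems typically layer replication and a rebalancing scheme such as consistent hashing on top.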

In this sense, Gen S is not an age group that one belongs to, like Generations X, Y, and Z. Instead, it is a state of mind. I am no spring chicken myself, but I fully embrace the scale-out mentality.

Google led the way in building the scale-out approach to software. In the early 2000s, it published a series of papers describing radically new systems, including the Google File System, MapReduce, and the distributed lock service Chubby, and those papers inspired a long list of technologies that followed.

Gen S applications are scale-out: sharded NoSQL databases or key-value stores; Hadoop (a direct descendant of MapReduce) and Spark for analytics; and TensorFlow and PyTorch for machine learning. Gen S IT teams (developers, site reliability engineers, data scientists) expect the infrastructure to match. Capacity and performance need to scale elastically, without disruption to users or production.

One challenge is that Gen S apps need to be deployable in any environment, whether that's on-premises, on bare metal, or in the public cloud. This "compute anywhere" paradigm is integral to scale-out because it's part of what enables quick, automated provisioning of new resources.

Scale-out also requires time scaling: resources need to be available on demand, 24/7, without downtime or maintenance windows. And scale-out requires simplicity so applications can be deployed quickly on any platform without specialist knowledge. This is particularly important when application owners want to run their own systems – it shouldn’t be any more difficult than running a database on Linux.

Traditional storage architectures aren't capable of supporting Gen S goals. Traditional IT infrastructures (VMs, block storage platforms, and dated protocols such as NFS and SMB) lack the flexibility, performance, scalability, and availability needed today, and they are far too expensive, especially at large scale. All that investment in GPUs is wasted if the storage can't keep up with the compute-intensive stages of the workload.

Storage technologies designed decades ago for single storage servers are a major bottleneck in enterprise storage today. They also have serious problems unrelated to scale-out, such as missing checksums and no end-to-end encryption.
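As an illustration of what an end-to-end checksum buys, the sketch below stores a CRC32 alongside each block when it is written and verifies it on every read, so silent corruption is reported instead of being returned as good data. The in-memory dict standing in for a storage server and the hypothetical `write_block`/`read_block` helpers are assumptions for this example; production systems usually use stronger checksums and verify them at several points along the data path.

```python
import zlib


class ChecksumError(Exception):
    """Raised when stored data no longer matches its recorded checksum."""


def write_block(store: dict, block_id: int, data: bytes) -> None:
    # Compute the checksum where the data originates and keep it with the block.
    store[block_id] = (data, zlib.crc32(data))


def read_block(store: dict, block_id: int) -> bytes:
    data, expected = store[block_id]
    # Recompute on read; a mismatch means the bytes changed after the write.
    if zlib.crc32(data) != expected:
        raise ChecksumError(f"checksum mismatch on block {block_id}")
    return data


store = {}
write_block(store, 0, b"genome chunk 0001")
print(read_block(store, 0))  # b'genome chunk 0001'

# Simulate silent corruption on the medium: one byte changes, the checksum does not.
data, crc = store[0]
store[0] = (b"genome chunk 0002", crc)
try:
    read_block(store, 0)
except ChecksumError as err:
    print(err)  # checksum mismatch on block 0
```

Without the stored checksum, the second read would quietly return the corrupted bytes to the application.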

To meet the needs of these compute-intensive Gen S applications and the expectations of their users, storage must be Gen S too. Gen S storage scales linearly, so it can grow with user demand from a small number of nodes to thousands. It doesn't suffer from the limitations of dated protocols and platforms. It can run on-premises, on bare metal, and in public clouds, via software that works on almost any x86 server and commodity hardware, including any combination of NVMe drives, SSDs, HDDs, and cloud resources as needed. That means scratch, home, archive, and all other workloads can run from the same storage system, spanning NVMe and HDD.

This Generation Scale-Out approach delivers the performance and capabilities that modern applications need: high throughput and low latency for AI/ML, and both large-block sequential and small-block random I/O for mixed general workloads. There are no bottlenecks or single points of failure, which is essential to the scale-out approach.

Gen S storage typically offers a single namespace for universal access to data regardless of protocols and clients, and supports S3, Linux, Hadoop, OpenStack, Windows, and NFS (yes, even NFS if needed). This particularly benefits the ingest and preprocessing stages of AI/ML, modeling, and analytics workloads that depend on large data sets, as well as life sciences applications, where massive data loads from imaging or DNA sequencing are typical. Even industries such as financial services and media and entertainment are using Gen S applications and storage to support more efficient workflows.

It's time for enterprises to embrace Generation Scale-Out in infrastructure, applications, and especially in storage.

About the Author

Björn Kolbeck is co-founder and CEO of Quobyte, a developer of data center storage system software. At Google he worked as tech lead for the hotel finder project (2011–2013) and was the lead developer for the open-source file system XtreemFS (2006–2011). Björn’s PhD thesis dealt with fault-tolerant replication.

 

EnterpriseAI