Advanced Computing in the Age of AI | Saturday, July 13, 2024

How NoSQL Maximizes Hardware and Enables Scale Like We’ve Never Seen 

We live in a time when a developer can build an app in six hours, release it to the world, do little to no marketing and still get 2.3 million downloads. We also live in a world where a network-centric app like Slack can go from zero to over 1.1 million daily users and $300 million in revenue in less than three years. Unprecedented scale is the norm thanks to equally unprecedented gains in hardware. And yet we still have much to do to maximize it all.

In September, The Wall Street Journal reported that millions of servers sit idle in data centers around the world. In one particularly troubling instance, consultant Paul Nally visited a facility that had over 1,000 machines powered on yet doing nothing. Hardware is wasted on preparing for worst-case scenarios that never materialize and growth opportunities that may never be.

At the same time, as a global citizenry we are producing more information than ever. According to Cisco's Visual Networking Index (VNI), the number of networked devices will balloon from 14 billion last year to 24 billion by 2019. Internet-based traffic is expected to more than double over that period, from 2 exabytes per day to 5.5 exabytes per day.

Ravi Mayuram of Couchbase

If data is getting more plentiful and hardware is cheaper and more powerful, why aren't we doing more to put the information we have to better use? The answer is that we've lacked the right tools for processing information. NoSQL databases fill the gap by handling and instantly processing information that's been out of reach for 20th-century relational data management systems.

Modern Applications and Serving at Scale

Slack, the increasingly popular cloud-based team collaboration tool, is no ordinary software, but its story is becoming more familiar. Successful apps must scale faster than ever as users rush to adopt the new thing. Every login, every click, every interaction with other users, every new document or message created is data to be stored and processed. The sheer magnitude of it would be mind-boggling if we weren't seeing the same patterns over and over and over again. Scale is the defining and differentiating criterion of modern software serving users.

Volume isn't the only factor that determines scale in the modern era. Velocity is also a factor, especially when you consider the various social networks we frequent. Each minute, Facebook tallies over 4 million "likes", Twitter sends over 347,000 tweets, and users upload over 300 hours of new video to YouTube.

Or consider Netflix, whose global network now serves a bit more than 77,000 hours of video every minute and accounts for over one-third of North America's downstream Internet traffic during peak hours. True scalability means handling high volume at high speed without compromising the quality of the data transmitted.

These sorts of connected applications share two main characteristics. First, they need predictable performance. Second, they need to be able to scale and change quickly. Think of how many times you have witnessed applications gain traction in days or even hours. These are the apps that deliver so well it’s as if the developers read your thoughts, delivering what you asked for before you asked for it. NoSQL systems make this possible by handling unstructured data as deftly as a relational system handles columns and rows.

What this means is that you can change your schema by simply changing your application. No need to tune your database, adjust for downtime or coordinate between your development and operations teams. Instead, development, testing and production flow as a single, agile process that makes it easier to get to market faster. But again, the key is to do this while preserving scale: the millionth viewer of Netflix’s “Making a Murderer” must have the same experience as the first to log in. Modern architectures cleverly unite lower-cost hardware in elastic pools in order to achieve this, and recent advances in the underlying technology are aiding in the effort.
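To make the schema point concrete, here is a minimal sketch in Python. It is not Couchbase's API or any particular product's; the `store` dictionary, the `put`, `get` and `team_count` helpers, and the `user::N` key format are all hypothetical stand-ins for a document database. The idea it illustrates is that a new field appears when the application starts writing it, and older documents keep working because the reading code supplies a default.

```python
import json

# Hypothetical in-memory stand-in for a document store: keys map to JSON text.
store = {}

def put(key, doc):
    """Serialize a document to JSON and store it under key."""
    store[key] = json.dumps(doc)

def get(key):
    """Load a stored document back into a dict."""
    return json.loads(store[key])

# Version 1 of the application writes users with two fields.
put("user::1", {"name": "Ada", "email": "ada@example.com"})

# Version 2 starts writing an extra field. No table is altered and
# the existing document is left untouched.
put("user::2", {"name": "Lin", "email": "lin@example.com", "teams": ["ops", "web"]})

def team_count(key):
    """Read code tolerates older documents by supplying a default."""
    return len(get(key).get("teams", []))

print(team_count("user::1"))  # older document, no "teams" field
print(team_count("user::2"))  # newer document
```

The trade-off is that the application, rather than the database, becomes responsible for handling documents of different shapes.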

3 Hardware Trends Taking Over the Data Center

When most of us in the tech industry hear the term "data center," we think of expansive, refrigerated rooms populated by massive machines humming with such force that you'd expect they were powering the very concrete buildings they were housed in. No longer. Today, a data center can be a converted Manhattan apartment that hosts a handful of machines that compress the functionality of a server and storage subsystem into one box while connecting to hyperfast networks. There are three reasons we're seeing more of this:

  1. Cheap memory. A few years ago the most powerful servers on the market had just 8 GB of RAM to spare. Today, a server can scale from 256 GB to over 1 TB for $50,000 or less. This is a dramatic change from when relational systems were introduced. At the time, disk space was more expensive than today’s memory costs. Due to this legacy even modern relational systems are disk-optimized, whereas NoSQL databases are designed from the ground up to process data in-memory over clusters of networked machines. Workloads get completed much faster as a result.
  2. Gigabit networks. Network speed is also improving at a rapid pace with data center backbones now operating at 10 gigabits per second or faster. Many are already moving to 40 Gbps or 56 Gbps with 100 Gbps in sight. At that speed, latency is a non-issue. The network can become connective tissue for clustered machines to handle large-scale processing jobs in a NoSQL database. Relational systems lack this flexibility, having been designed at a time when your only choice for scaling up was with bigger, more expensive machines.
  3. Faster, more cost-effective data storage. While memory is fast becoming a smart first option for processing large and diverse data sets in NoSQL, persistent storage also matters. Flash is displacing spinning disk in this area primarily because of speed. Storage engines optimized for solid-state drives can perform orders of magnitude faster than their older counterparts and are therefore better equipped to handle complex queries covering differing styles of data.
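The idea of pooling the memory of several networked machines can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: the `Cluster` class, its methods and the `session::N` keys are hypothetical, each "node" is just an in-memory dictionary, and keys are routed with a simple hash-modulo scheme rather than whatever a real NoSQL database does internally.

```python
import hashlib

class Cluster:
    """Toy cluster: each node is an in-memory dict, keys are routed by hash."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def _index(self, key):
        # Stable hash so the same key always routes to the same node.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._index(key)][key] = value

    def get(self, key):
        return self.nodes[self._index(key)].get(key)

    def add_node(self):
        """Scale out by one node and move each key to its new home."""
        items = [(k, v) for node in self.nodes for k, v in node.items()]
        self.nodes = [{} for _ in range(len(self.nodes) + 1)]
        for k, v in items:
            self.put(k, v)

cluster = Cluster(3)
for i in range(1000):
    cluster.put(f"session::{i}", {"clicks": i})

cluster.add_node()  # add capacity with another commodity machine
print([len(node) for node in cluster.nodes])  # keys per node after rebalancing
print(cluster.get("session::42"))
```

Hash-modulo routing reshuffles most keys whenever a node is added, which is why production systems generally rely on schemes that move less data, such as consistent hashing or a fixed number of partitions mapped to nodes.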

A New Style of Database for a New Kind of Hardware

One of my favorite quotes is from the late Jim Gray, a database pioneer who foresaw many of these trends before the rest of us took advantage of them. "Tape is dead, disk is tape, flash is disk, RAM locality is king," Gray said in 2006, before NoSQL databases became the must-have technology they are today.

He saw a confluence of forces: more devices leading to more connections leading to more data to be handled rapidly. Nothing handles data faster than RAM; it's not only on the front lines capturing data as it comes in but also closest to the CPU. Gray rightly concluded that processing in memory would have powerful consequences for those able to pull it off.

Today, that's a growing number. Amazon uses NoSQL to power its web hosting operations. Facebook uses NoSQL to handle hundreds of petabytes of data. And at Couchbase, our NoSQL database serves over 1 billion people around the globe each month, whether it’s helping with a travel reservation, serving an ad or completing an e-commerce transaction.

None of this is data that fits neatly into columns and rows, but it is crucial information churned out by affordable machines with big memory operating on fast networks attached to increasingly flash-driven storage systems.

By themselves, these machines aren't much. Pull them together and add a NoSQL database, and they're the platform that makes it possible to develop and release a hit app in six hours and transform Slack into a $300 million business. Or, in simpler terms: they're the infrastructure that modern software desperately needs.

Ravi Mayuram, Senior Vice President of Products and Engineering at Couchbase, is responsible for the company’s NoSQL offerings.