Advanced Computing in the Age of AI | Monday, June 24, 2024

New Algorithms For The Modern Datacenter 

Quite a few years ago, as a distinguished engineer, I was giving a talk to my network-security peers about algorithms and new trends in technology. I presented a number of slides comparing the current brute force mechanisms to new algorithmic thinking like MapReduce.

It was a tough crowd.

While people showed genuine interest in, and excitement about, the new ideas, they couldn’t see the bridge between what they were doing—and the algorithms they were already using—and how to incorporate these new thoughts into their products and designs. They were stuck thinking about things in the same ways they were already thinking about things.

This bridge is often what drives progress. But in the face of massive sea changes in the industry, incremental innovation can be a trap.

Right now, in the security industry, we’re facing a gap that cannot be traversed through incremental innovation. A step-by-step approach is incapable of addressing the fact that datacenters are growing larger and the workloads inside them are changing faster than ever before. Just as Google revolutionized the parallel processing of large data sets with MapReduce, we in the security space need similar clean-page thinking on systems and algorithms to resolve the widening gap between the existing static security modes and the needs of the dynamic datacenter.

Among the key things that made Google great were revolutionary algorithms. What Google started is still fueling a whole new industry of big data technologies like Hadoop, CouchDB, Redis, and Storm, and growing companies like Hortonworks, Cloudera and MongoDB.

If security is going to keep up with big changes, it needs its own revolution.

Supporting security policy in the dynamic datacenter requires an arsenal of models and algorithms with the following characteristics:

  • Multi-dimensional, not hierarchical: Hierarchical models are a remnant from a time when system limitations required them to enable scaling. This is to say: hierarchical models were necessary when physical constraints needed to be factored in. Remember when Yahoo provided a directory service organized just like the Yellow Pages, which was structured to make it easy for you to find the letter "T," and then the word “taxi”? Google perfected the natural-language search, completely removing the need for the legacy directory-based approach. In the same way, security policy specified in the language of the network should be swept away and replaced by algorithms based on logical sets of workloads that can be specified as coordinates in a multi-dimensional space.


  • Set theory, not table scans: Many of today’s algorithms rely on comparing values across multiple columns to find a match. The algorithms can be optimized with binary searches, TCAMs, hash tables,and caches, but the fundamental remains: You need to look up something in a table from start to finish to find the best match. The ability to define entities with different logical attributes in a multi-dimensional space, as well as group them into logical sets (projections into lower-dimensional spaces), creates a solid foundation. Then, expressing relationships between these objects and sets allows you to compute policy in a completely different way, using set theory (intersection, unions, and subsets) and some graph primitives. The result is a more expressive and powerful policy model, where rules can often be described in orders of magnitude with few expressions. This makes it easier to verify the rules to reflect the correct policy intent.


  • Scale horizontally, not with brute force: Security systems often rely on decisions being made in a central place. There is frequently a lot of coupling between these decisions and it is assumed they are made with a global view of the data. Let’s go back to the MapReduce example I started with: To successfully scale extremely large datacenter deployments, compute elements work individually to calculate and assemble policy perspectives—just like the map—and reduce steps. This allows for scale out of the security policy calculations. In addition, the focus on both calculating and distributing incremental changes, rather than approaching with brute force, allows large sets to be distributed. A much smaller, and therefore more efficient, change set can then be applied on top.


  • Declarative, not imperative—Developer Philip Roberts defines the two primary types of programming as:


  1. Imperative: You tell the system how to do something and, as a result, what you want to happen will happen.
  2. Declarative: You tell the system what you would like to happen, or what the ideal state is, and the computer figures out how to do it.

Implementing a system that takes a declarative policy description requires that you are able to collect the current state, determine the ideal state, compute and distribute the instructions that are necessary to arrive at the ideal state, then lather, rinse and repeat.

Revolutionary algorithms transformed search and big data. They appeared because there was need, a problem none of the existing tools in the toolbox could solve. As we bring security forward to meet new and evolving demands, we need innovative algorithms to modernize with a much more dynamic and distributed datacenter.

-- PJ Kirner is chief technology officer and founder of Illumio, a datacenter security vendor based in Sunnyvale, Calif.