Advanced Computing in the Age of AI | Saturday, September 23, 2023

eBay’s Liquid Cooled Billion-transactions-a-day Data Center 

Water transports heat 25 times more efficiently than air, but water and electronics don’t mix – hence the prevalence of air cooling in the data center. Harnessing water’s cooling capability without destroying servers or shutting down systems has bedeviled data center service providers, IT managers and technology vendors for decades.

Now, in a proof-of-concept project six years in the making, eBay, Dell and Intel say they’ve made major strides in channeling the potential of liquid cooling – enabling greater processing power at a fraction of normal power consumption and within a smaller footprint – that could have implications for the hyperscale and web services market.

eBay, of course, defines “at scale” in ecommerce. The world’s largest online marketplace handles more than 1 billion transactions per day and has nearly 95 million active users globally.

Key to the project is the anti-leakage provisions engineered into the liquid cooling capabilities of Triton (the messenger of the sea in Greek mythology), Dell’s rack-scale infrastructure for hyperscale implementations, combined with a customized 200W Intel Xeon processor E5 v4, which provides significant performance increases over the highest performing Intel Xeon processor on the market today – and generates a lot of heat. The result: according to Dell, Triton’s ability to sub-cool the processor and operate at higher frequencies means it can deliver for similar costs nearly 60 percent greater performance than Intel’s Xeon E5-2680 v4. Compared with average air-cooled data centers, Triton uses 97 percent less cooling power and has a power usage effectiveness (PUE) of 1.02 to 1.03.
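For readers unfamiliar with the metric, PUE is conventionally defined as total facility power divided by the power delivered to IT equipment, so a PUE of 1.02 means about 2 percent overhead on top of the IT load. The short Python sketch below works through that arithmetic for the quoted range. The 1,000 kW IT load and the function names are illustrative assumptions, not figures or tooling from Dell or eBay.

```python
def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power implied by an IT load and a PUE value.

    Uses the standard definition: PUE = total facility power / IT power.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * pue


def overhead_kw(it_load_kw: float, pue: float) -> float:
    """Non-IT power (cooling, power distribution, etc.) implied by a PUE."""
    return facility_power_kw(it_load_kw, pue) - it_load_kw


if __name__ == "__main__":
    it_load = 1000.0  # hypothetical IT load in kW, chosen only for illustration
    for pue in (1.02, 1.03):  # the PUE range quoted for Triton
        print(
            f"PUE {pue:.2f}: total {facility_power_kw(it_load, pue):.0f} kW, "
            f"overhead {overhead_kw(it_load, pue):.0f} kW"
        )
```

On that hypothetical 1,000 kW load, the quoted PUE range corresponds to roughly 20 to 30 kW of total overhead. Note that PUE counts all non-IT power, not just cooling, so it is a related but separate measure from the 97 percent cooling-power reduction Dell cites.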

Jyeh Gan, director, Datacenter Scalable Solutions at Dell, told EnterpriseTech a shift is happening in the server market, with increased demand for optimized at-scale data center strategies: “A lot of hyperscale practices are starting to shift toward more enterprise-type customers, they want to move to web-scale type environments.”

This part of the server market is growing three times faster than the traditional enterprise x86 market, he said, and helps explain Dell’s announcement last December of an entity dedicated to the hyperscale and sub-hyperscale market: the Extreme Scalable Infrastructure (ESI) organization, successor to the Data Center Solutions group formed eight years ago. “We felt we needed an organization really focused on it,” Gan said.

He noted that while Dell has built more than 250 MDCs (modular data centers) since 2006, “in the last year we’ve engaged with more than 300 customers.”

For eBay, Dell has built a new datacenter cooling solution to improve performance during peak times while reducing cooling and power costs, heavy contributors to TCO.

“We worked closely with Dell to develop a customized server solution,” said eBay’s Nick Whyte, vice president, Fellow Search Technology, “which utilizes an innovative approach of liquid cooling 200W CPUs to deliver large performance and efficiency gains. By collaborating with Dell and Intel our search servers achieved an increase of 70 percent in throughput (QPS – queries per second) with the new Intel Xeon processor E5-2679 v4 versus the previous generation Intel Xeon processor E5-2680 v3 in the ‘Triton’ proof of concept.”

With Triton, Dell claims to be the first major vendor to safely bring facility water directly into each server sled to cool the CPU, which Gan said delivers enhanced cooling efficiencies along with the lowest water consumption of any liquid cooled solution.



Austin Shelnutt, principal thermal engineer at Dell, told EnterpriseTech that bringing water into each server is referred to as a “direct contact model,” where the water is in as close proximity as possible without the components actually being immersed. “The same water that pushes from the facility into the servers also passes through a liquid-to-air heat exchanger in the back of each chassis, allowing all the airborne heat in the server to be dissipated back into the liquid loop.

“This is a big advantage because we don’t have to have any secondary rear door heat exchanger on the rack to absorb that heat,” Shelnutt said. “We also don’t have to apply what can be expensive cold plates to each of the individual components in the server. Instead we can keep our cold plates running in just what’s called ‘the highest offenders,’ in terms of heat production: the CPUs and the voltage regulators.”

That warm water is then pushed out of the facility where it can either be dissipated by a facility cooling tower or repurposed to heat buildings or melt snow in parking areas.

He said Triton has been designed to alleviate the biggest concern IT managers have regarding liquid cooled servers: leaks.

“We have a very elaborate leak mitigation system within the rack,” he said, “that starts with every blade or server. We have leak detection and leak containment, and the ability to turn off water within the individual blades within the chassis itself, and the rack itself, depending on where a leak detection occurs, to isolate the splash zone.”

Gan said Triton is “rack scalable,” meaning the liquid cooling system is not hard-fixtured into the rack; it scales with the individual chassis, “so our deployment flexibility is extremely high. We can heterogeneously mix liquid and air cooled servers within the same rack, and in fact within the same chassis, without having to go to great lengths to separate those devices or make very standard, rigid configurations.”

While air-cooled systems will remain predominant across the broad market, Gan said, the hyperscale and sub-hyperscale sector is increasingly embracing water.

“We’re not saying air cooling is going away by any means,” he said, “we’re just saying there’s going to be a definite need for things beyond air-cooled.

“There is still a hesitation to adopt liquid within data centers, to bring water in such close proximity with servers,” he said, “but the benefits are very well known, from a cooling capacity and cooling efficiency perspective. What we’re starting to see is that the performance benefits now are quite tangible to customers. It’s far easier to have the conversation about mitigating the risk and accepting the risk when you’re able to quantify what the benefits are.”