
Liquid Cooling for High Utilization Data Centers 
sponsored content by Asetek

Asetek’s RackCDU Extension

As the lines blur between traditional HPC and Extreme Scale Commercial Computing, proven solutions for HPC can provide competitive advantage in the commercial segment.

Done correctly, liquid cooling enables CapEx avoidance both by mitigating data center physical expansion through increased rack density and by reducing infrastructure investments such as chiller, HVAC and cooling plant build-outs.

With cooling often consuming one third of data center energy, the OpEx benefits can be compelling: reduced overall data center cooling and server power consumption, and even the possibility of energy recovery.
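
As a rough illustration of that arithmetic, the back-of-envelope estimate below uses purely hypothetical inputs (assumed IT load, electricity price and overhead split, not Asetek figures):

    # Back-of-envelope cooling OpEx estimate. All inputs are illustrative
    # assumptions, not vendor data.
    IT_LOAD_KW = 500          # assumed average IT load
    COOLING_FRACTION = 1 / 3  # cooling often consumes ~1/3 of total energy
    HOURS_PER_YEAR = 8760
    USD_PER_KWH = 0.10        # assumed electricity price

    # If cooling is one third of total draw and the remainder is IT load
    # (ignoring other overheads), then total = IT / (1 - 1/3).
    total_kw = IT_LOAD_KW / (1 - COOLING_FRACTION)
    cooling_kw = total_kw * COOLING_FRACTION

    annual_cost = cooling_kw * HOURS_PER_YEAR * USD_PER_KWH
    print(f"Cooling draw: {cooling_kw:.0f} kW")             # 250 kW
    print(f"Annual cooling cost: ${annual_cost:,.0f}")      # $219,000
    # At the 60% cooling-energy reduction cited later in this article:
    print(f"Potential savings: ${0.6 * annual_cost:,.0f}/yr")  # $131,400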

Different Approaches to Liquid Cooling

Most data centers today bring liquid into the data center via Computer Room Air Handler (CRAH) or Computer Room Air Conditioning (CRAC) units that cool the data center air. CRAH units are fed chilled water to cool the air, while CRAC units receive refrigerant as a liquid.

For the CRAC and CRAH units to produce air cold enough to cool servers, the facilities liquid coming into the data center must be refrigerated to a temperature below that of ambient (outdoor) air. That is expensive. Additionally, the cold air the CRAC and CRAH units produce must be moved to the racks, consuming more energy, and then pushed through the servers by server fans.

Rear door, in-row and over-row liquid coolers reduce the cost of moving air by placing the air-cooling unit closer to the servers. Rear door coolers replace the rear doors of server racks with a liquid-cooled heat exchanger that transfers server heat into liquid as hot air exits the servers. The servers themselves are still air-cooled, and facilities liquid must be supplied at the same temperatures needed for CRAH units (entering at below 65°F, with liquid exiting at below 80°F). Expensive chillers are still required, and server fans still consume the same amount of energy.

Immersion cooling solutions, on the other hand, remove server heat by placing servers in tanks of dielectric fluid or by filling custom servers with dielectric fluid. Challenges with this approach include server maintenance, modification of servers with non-standard parts, large quantities of oil-based coolant in the data center, and poor space utilization because the server “racks” lie horizontally.

The “Direct Touch” approach to liquid cooling replaces the air heat sinks in servers with “heat risers” that transfer heat to the skin of the server chassis, where cold plates between servers transfer the heat to refrigerant so it can be removed from the building. While this eliminates fans in the server and the need to move air around the data center for server cooling, cooling infrastructure is still needed to chill the refrigerant to 61°F, and the cold plates between servers reduce the capacity of a typical 42U rack to around 35 RUs.

Asetek’s RackCDU™ Direct-to-Chip™ (D2C) hot water cooling brings cooling liquid directly to the high heat flux components within servers, such as CPUs, GPUs and memory. CPUs run quite hot (153°F to 185°F), with memory and GPUs hotter still. Water’s cooling efficiency (roughly 4,000 times that of air) allows D2C to cool with hot water and permits the use of dry coolers rather than expensive chillers for the water returning from the servers. This also reduces the power required for server fans.
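
The underlying physics is the standard sensible-heat relation Q = ṁ · c_p · ΔT. The sketch below (rack load and temperature rise are assumed for illustration) shows the modest water flow needed to absorb a full rack’s heat; the textbook volumetric heat-capacity ratio it computes comes out around 3,500x, the same order of magnitude as the article’s water-versus-air figure:

    # Sensible-heat relation Q = m_dot * c_p * dT. Numeric inputs are
    # illustrative assumptions, not Asetek specifications.
    CP_WATER = 4186   # J/(kg*K), specific heat of water
    CP_AIR = 1005     # J/(kg*K), specific heat of air
    RHO_WATER = 997   # kg/m^3
    RHO_AIR = 1.2     # kg/m^3

    def flow_needed_lpm(heat_w: float, delta_t_k: float) -> float:
        """Water flow (L/min) to absorb heat_w watts with a delta_t_k rise."""
        mass_flow_kg_s = heat_w / (CP_WATER * delta_t_k)
        return mass_flow_kg_s / RHO_WATER * 1000 * 60

    # Example: a 30 kW rack with the water warming 10 K across the loop
    print(f"{flow_needed_lpm(30_000, 10):.0f} L/min")   # ~43 L/min

    # Volumetric heat-capacity ratio behind the water-vs-air comparison
    ratio = (CP_WATER * RHO_WATER) / (CP_AIR * RHO_AIR)
    print(f"water vs air: ~{ratio:,.0f}x")              # ~3,500x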

RackCDU Extension Installed on 96 node rack and server node with CPU coolers

The RackCDU D2C solution is an extension to a standard rack (RackCDU) combined with direct-to-chip server coolers (D2C) in each server. Because RackCDU has quick connects for each server, facilities teams can remove/replace servers as they do today.

Pumps replace fan energy in the data center and the servers, and hot water eliminates the need to chill the coolant. D2C liquid cooling dramatically reduces chiller use, CRAH fan energy and server fan energy. It also delivers IT equipment cooling energy savings greater than 60% and rack density increases of 2.5x-5x versus air-cooled data centers.
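
To make the density figure concrete, a quick calculation (server count and air-cooled density are assumed purely for illustration):

    # Rack-count impact of 2.5x-5x density gains. Inputs are hypothetical.
    import math

    servers = 2000
    per_rack_air = 20  # assumed air-cooled density (servers per rack)

    racks_air = math.ceil(servers / per_rack_air)
    for factor in (2.5, 5.0):
        racks_d2c = math.ceil(servers / (per_rack_air * factor))
        print(f"{factor}x density: {racks_air} racks -> {racks_d2c} racks")
    # 2.5x density: 100 racks -> 40 racks
    # 5.0x density: 100 racks -> 20 racks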

RackCDU D2C uses a distributed pumping model: the cold plate/pump unit replaces the air heat sink on each CPU or GPU in the server. Each pump/cold plate has sufficient pumping power to cool the whole server, providing redundancy. Unlike centralized pumping systems, which require high pressures, the distributed system operates at very low pressure.
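
A simple way to see the redundancy benefit: if each of n pump/cold-plate units can cool the whole server on its own, cooling is lost only when all of them fail. The failure probability below is an assumed value for illustration, not Asetek reliability data:

    # P(cooling loss) = p^n for n independently failing pumps, each of
    # which can cool the whole server alone. p is an assumed value.
    def cooling_loss_probability(p_single: float, n_pumps: int) -> float:
        return p_single ** n_pumps

    p = 0.01  # assumed annual failure probability of one pump
    for n in (1, 2, 4):
        print(f"{n} pump(s): P(loss) = {cooling_loss_probability(p, n):.0e}")
    # 1 pump(s): P(loss) = 1e-02
    # 2 pump(s): P(loss) = 1e-04
    # 4 pump(s): P(loss) = 1e-08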

In addition, RackCDU includes a software suite that provides monitoring and alerts for temperatures, flow rates, pressures and leak detection, and can report into data center management software suites.
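
The article does not describe the software’s interface; as a hypothetical sketch of the kind of threshold-and-alert checks such a suite performs (field names, limits and structure are invented for illustration, not Asetek’s actual API):

    # Hypothetical threshold-and-alert logic. Field names, limits and
    # structure are invented for illustration; not Asetek's actual API.
    from dataclasses import dataclass

    @dataclass
    class RackCduReading:
        supply_temp_f: float   # facility water entering the rack
        return_temp_f: float   # water leaving the rack
        flow_lpm: float        # loop flow rate
        pressure_psi: float    # loop pressure (low in this design)
        leak_detected: bool

    LIMITS = {"return_temp_f": 115.0, "flow_lpm_min": 10.0,
              "pressure_psi_max": 15.0}

    def check_reading(r: RackCduReading) -> list[str]:
        alerts = []
        if r.leak_detected:
            alerts.append("LEAK detected in rack loop")
        if r.return_temp_f > LIMITS["return_temp_f"]:
            alerts.append(f"hot return water: {r.return_temp_f}F")
        if r.flow_lpm < LIMITS["flow_lpm_min"]:
            alerts.append(f"low flow: {r.flow_lpm} L/min")
        if r.pressure_psi > LIMITS["pressure_psi_max"]:
            alerts.append(f"overpressure: {r.pressure_psi} psi")
        return alerts  # a real suite would forward these to DCIM software

    print(check_reading(RackCduReading(105.0, 118.0, 22.0, 8.0, False)))
    # ['hot return water: 118.0F']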

Direct-to-Chip Liquid Cooling Momentum

Direct-to-Chip hot water liquid cooling is showing significant momentum in usage models important to both HPC and commercial data centers:

  • Mississippi State University (MSU) installed a Cray 300LC supercomputing cluster that incorporates Asetek’s D2C. Key in the purchase decision was that MSU was able to increase computing capacity without buying new chillers and related equipment, benefiting CapEx.
  • Lawrence Berkeley National Laboratory (LBNL) has found that Asetek’s direct cooling technology not only showed cooling energy savings of over 50%, but also savings of 21% of total data center energy, benefiting OpEx.
    Cray’s 300LC Liquid Cooled Cluster Supercomputer

  • Highly virtualized applications are being implemented with Asetek’s D2C at the U.S. Army’s Speakman Center data center. The goals of this installation include 60% cooling energy savings, 2.5x consolidation within existing infrastructure and 40% waste-heat recovery.

  • Asetek’s RackCDU D2C is being installed in the financial industry for quantitative investing and trading, where D2C enables cooling at sustained 100% cluster utilization with high rack densities.
  • At the University of Tromsø (UiT) in Norway, the Stallo HPC cluster is targeting 70% IT energy re-use through district heating.

 

Proven Technology Supporting Business Needs

For Commercial Computing, leveraging the lessons of liquid cooling in HPC must be done in a manner that accounts for the factors vital to the commercial data center and its uptime demands. Serviceability, monitoring and redundancy are as important as energy savings or improved OpEx/CapEx.

In both HPC and commercial high utilization environments, liquid cooling done correctly is based on addressing both the business and operational requirements of the data center.

 

For more information on Asetek and RackCDU D2C:  http://asetek.com/data-center/data-center-coolers.aspx
