Advanced Computing in the Age of AI | Saturday, April 20, 2024

3M and Allied Control Cool Clusters with Novec Bubble Bath 

One of the more interesting demonstrations at the SC13 supercomputing conference last week in Denver was a Xeon E5 server dunked in a tank of Novec fire suppressant fluid. The server was not on fire, but the Novec fluid, which does not react with electronics, was used to passively remove heat from the server.

3M, which makes several flavors of Novec fluid, partnered with Allied Control, a consultancy based in Hong Kong, to do the demo. But cooling datacenter gear with Novec is not a science project. The two companies have put Novec-based passive immersive cooling systems into production. Allied has, in fact, deployed two dense-pack clusters that are cooled by Novec baths in two datacenters in Hong Kong.

As you might expect, both real estate and electricity are very costly in Hong Kong. The shift from air cooling to passive immersive cooling based on Novec fluid allows the operators of these datacenters to radically increase the efficiency of the cooling of their machines and that lets them shrink their rack footprints as well. In theory, thanks to the efficiency of the immersive cooling, customers can also think about overclocking components to get more work out of their systems or they can put a lot more gear in the same physical footprint, also allowing them to do more work in a given space.

"It costs less to build this Novec-cooled system than it would take for a larger conventional datacenter to house the same equipment," Kar-Wing Lau, vice president of operations at Allied Control, explained to EnterpriseTech at SC13 last week. That is not to say the Novec passive immersive cooling racks are cheaper than standard air-cooled equipment racks. They cost more, and it takes some engineering to tweak the systems to go into the modified racks. But the real estate, power, and cooling costs are so high in a city like Hong Kong that it is a wash.

What applies to Hong Kong applies to any datacenter in any major city anywhere in the world where power and real estate are expensive and where reducing latency, which means keeping systems physically close to users, is paramount.

An Engineered Fluid

Novec is a family of dielectric fluids created by 3M that has a number of different uses. The fluid is a hydrofluorocarbon, or HFC, that is used to clean electronic equipment after it is manufactured. It is also commonly used as a fire suppressant in datacenters and as a heat transfer mechanism for electronics in military applications and now datacenter equipment.

[Image: 3M and Allied Control passive Novec cooling]

The important thing here is that Novec does not damage electronic equipment, so you can dunk processors, motherboards, wires, and other components of a system right into the Novec. Kim Griger, marketing manager of datacenter products at 3M, told EnterpriseTech that Novec has an atmospheric life of only five days once it evaporates, unlike chlorofluorocarbons (CFCs), perfluorocarbons (PFCs), or perfluoropolyethers (PFPEs), which can remain active for years to decades. Novec is not poisonous. You can drink it, but it won't do nice things to your digestive tract. If you dip your hands in it (as we did at SC13) it evaporates very quickly, like alcohol. Novec has a density between 1.5 and 1.6 times that of water, so it is not a light fluid, but Lau said that it is not so heavy that a datacenter floor would need to be modified to support it.

The version of Novec that is used to cool datacenter gear is called Novec 649 and it boils at 49 degrees Celsius (120 degrees Fahrenheit). The Novec 7000 series, which is used in military and aviation equipment for cooling, has boiling points that range between 34 and 128 degrees Celsius, and these can also be used for immersive cooling in the datacenter depending on your needs.

The two-phase cooling process with Novec is very simple. You dunk the electronics into a Novec bath, and the heat from the processors, main memory, chipsets, and peripheral cards causes the Novec to boil. The vaporized Novec rises out of the bath to the top of the sealed container where a liquid condenser coil chills it. The Novec transfers its heat to this condenser, turns into rain, and falls back into the tank. There is no need for circulator pumps to keep the Novec moving; natural convection currents between the hotter and colder portions of the Novec fluid set up the flow to move colder liquid toward hotter components. Here is a four-socket Xeon server bubbling away in the Novec bath in the demo at SC13:

[Image: four-socket Xeon server in the Novec bath at the SC13 demo]

The first immersive cooled cluster that Allied Control built in Hong Kong was for a financial services company that was using field programmable gate arrays (FPGAs) to run the SHA-256 hashing algorithm as part of its digital signatures for financial transactions. This financial services company (which does not want to be named) originally looked at using air cooling for its 6,048 Spartan-6 LX150 FPGAs from Xilinx, a design that included big wonking fans on each system board and would have taken up to ten 42U racks to house. Here is the difference in packaging that the immersion cooling allowed for this cluster, which is called Immersion-1:

[Image: Allied Control Immersion-1 cluster, air-cooled versus immersion-cooled packaging]

Because Novec is 4,000 times better at heat transfer than forced air, you can get rid of the heat sinks, fans, cold plates, and other components otherwise needed to cool the system. (Another side benefit of the Novec dunking approach is that you don't have to redesign your cooling system to accommodate new system boards.) The total heat dissipation from the Immersion-1 FPGA cluster was 70 kilowatts, and the racks had a power usage effectiveness of 1.02, which was a lot better than the 1.95 PUE from the air-cooled setup shown above.
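To put those PUE figures in perspective, here is a rough back-of-the-envelope sketch. It uses the standard definition of PUE (total facility power divided by IT power) and assumes, purely for illustration, that the same 70 kilowatt IT load applies in both cases; the article does not give the IT load of the air-cooled design, and the function names are hypothetical.

```python
# Sketch: overhead power implied by a PUE figure.
# PUE = total facility power / IT equipment power.
# Assumption (not from the article): the same 70 kW IT load is used for
# both the immersion-cooled and the air-cooled case.

def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power for a given IT load and PUE."""
    return it_load_kw * pue

def overhead_kw(it_load_kw: float, pue: float) -> float:
    """Power spent on everything except the IT gear (cooling, losses)."""
    return facility_power_kw(it_load_kw, pue) - it_load_kw

IT_LOAD_KW = 70.0      # Immersion-1 heat dissipation quoted in the article
PUE_IMMERSION = 1.02   # quoted for the Novec-cooled racks
PUE_AIR = 1.95         # quoted for the air-cooled setup

immersion_overhead = overhead_kw(IT_LOAD_KW, PUE_IMMERSION)
air_overhead = overhead_kw(IT_LOAD_KW, PUE_AIR)

print(f"Immersion overhead:  {immersion_overhead:.1f} kW")
print(f"Air-cooled overhead: {air_overhead:.1f} kW")
print(f"Overhead avoided:    {air_overhead - immersion_overhead:.1f} kW")
```

Under that assumption, the overhead drops from about 66.5 kilowatts to about 1.4 kilowatts for the same IT load.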

By the way, Allied Control estimates that the FPGAs had the equivalent performance on the SHA-256 hashing algorithm of 8,500 two-socket Xeon servers, which would have consumed 200 racks of space and burned something on the order of 6.4 megawatts. A Xeon cluster of that size could not fit into something the size of a shipping container in an office tower in downtown Hong Kong, but the FPGA cluster could. So now you know why this financial services company went with FPGAs.
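The arithmetic behind that comparison can be sketched as follows. The inputs are the figures quoted above; the one assumption is that the 70 kilowatts of heat dissipation quoted for Immersion-1 is treated as the FPGA cluster's power draw.

```python
# Sketch: Allied Control's Xeon-equivalent estimate versus the FPGA cluster.
# Assumption: the 70 kW heat dissipation of Immersion-1 stands in for its
# power draw.

XEON_SERVERS = 8_500       # two-socket servers in the estimate
XEON_RACKS = 200
XEON_POWER_KW = 6_400.0    # 6.4 megawatts
FPGA_COUNT = 6_048
FPGA_POWER_KW = 70.0

servers_per_rack = XEON_SERVERS / XEON_RACKS
watts_per_server = XEON_POWER_KW * 1000 / XEON_SERVERS
power_ratio = XEON_POWER_KW / FPGA_POWER_KW
servers_per_fpga = XEON_SERVERS / FPGA_COUNT

print(f"Servers per rack:        {servers_per_rack:.1f}")
print(f"Watts per Xeon server:   {watts_per_server:.0f}")
print(f"Xeon power / FPGA power: {power_ratio:.0f}x")
print(f"Xeon servers per FPGA:   {servers_per_fpga:.2f}")
```

On these numbers the Xeon estimate works out to about 42 servers per rack at roughly 750 watts each, and around 90 times the power of the FPGA cluster.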

The second system that Allied Control has worked with 3M to build in Hong Kong is used for the same purpose, SHA-256 hashing, but in this case the financial services customer has a custom ASIC to run the hashing algorithm. These ASICs have a thermal design point of 80 watts, about the low end of a standard Xeon part. The drawers in the racks can hold 92 nodes of these ASICs and you can get three drawers in a rack for a total of 276 nodes. Each drawer can handle up to 25 kilowatts of power draw and cooling and up to 225 kilowatts per rack. (Those figures are not typos.) The Immersion-2 system has 20 racks, for a total of 5,520 ASICs and 500 kilowatts of power draw, all of which fit into a single containerized datacenter.
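The node counts for Immersion-2 can be checked with a few lines. This is only a sketch using the figures quoted above, and it assumes one ASIC per node, which is what the 5,520 total implies.

```python
# Sketch: Immersion-2 node and power arithmetic.
# Assumption: one ASIC per node, as implied by the 5,520 total.

NODES_PER_DRAWER = 92
DRAWERS_PER_RACK = 3
RACKS = 20
ASIC_TDP_WATTS = 80
SYSTEM_POWER_KW = 500.0    # total power draw quoted for the system

nodes_per_rack = NODES_PER_DRAWER * DRAWERS_PER_RACK
total_asics = nodes_per_rack * RACKS
drawer_tdp_kw = NODES_PER_DRAWER * ASIC_TDP_WATTS / 1000
asic_tdp_total_kw = total_asics * ASIC_TDP_WATTS / 1000
avg_rack_power_kw = SYSTEM_POWER_KW / RACKS

print(f"Nodes per rack:          {nodes_per_rack}")
print(f"Total ASICs:             {total_asics}")
print(f"ASIC TDP per drawer:     {drawer_tdp_kw:.2f} kW")
print(f"ASIC TDP, whole system:  {asic_tdp_total_kw:.1f} kW")
print(f"Average power per rack:  {avg_rack_power_kw:.1f} kW")
```

The ASICs alone account for about 442 kilowatts at their thermal design point, which is consistent with the 500 kilowatt system total, and a full drawer of ASICs sits well under the 25 kilowatt drawer limit.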

Just for fun at the SC13 event, Allied Control cooked up a conceptual design that would put eight interleaved Xeon E5 system boards with InfiniBand links into one of its drawers. This drawer would be 3U wide and a third of a rack tall, and would include a power distribution backplane and a PCI-Express backplane. Each Xeon E5 node would have eight Xeon Phi coprocessor cards, for a total of 64 cards, cramming 64 teraflops into the drawer with something on the order of 25 kilowatts of power draw. You could get somewhere between four and six of these in a rack, depending on the size of the rack (42U to 47U). This is pretty impressive compute density.
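As a rough illustration of what that density would mean per rack, the sketch below scales the quoted drawer figures by the four to six drawers mentioned. It assumes each drawer delivers the full 64 teraflops and draws the full 25 kilowatts; the variable names are ours, not Allied Control's.

```python
# Sketch: per-rack density of the conceptual Xeon Phi drawer.
# Assumption: every drawer delivers the full quoted 64 teraflops and draws
# the full quoted 25 kW.

BOARDS_PER_DRAWER = 8
PHI_PER_BOARD = 8
DRAWER_TERAFLOPS = 64.0
DRAWER_POWER_KW = 25.0

cards_per_drawer = BOARDS_PER_DRAWER * PHI_PER_BOARD
teraflops_per_card = DRAWER_TERAFLOPS / cards_per_drawer

# Map drawers-per-rack to (teraflops per rack, kilowatts per rack).
rack_options = {
    drawers: (drawers * DRAWER_TERAFLOPS, drawers * DRAWER_POWER_KW)
    for drawers in (4, 5, 6)
}

print(f"Cards per drawer: {cards_per_drawer}")
print(f"Teraflops per card: {teraflops_per_card:.1f}")
for drawers, (tflops, kw) in rack_options.items():
    print(f"{drawers} drawers: {tflops:.0f} teraflops, {kw:.0f} kW per rack")
```

That puts a rack somewhere between 256 and 384 teraflops, at 100 to 150 kilowatts of power draw.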
