Advanced Computing in the Age of AI | Wednesday, April 24, 2024

Liqid’s Off-the-Shelf AI Supercomputer Takes on DGX-2 

Liqid LQD8360

This article has been updated, with information added on April 24 to the original version.

Composable infrastructure specialist Liqid has taken on Nvidia’s DGX-2, the 2 petaFLOPS superstar of AI computing. The upstart says it has built a similar GPU-based supercomputer from off-the-shelf technology that costs roughly half as much as the DGX-2 and delivers more than 20 percent higher performance on the ResNet-50 image recognition benchmark.

Liqid’s new system, the LQD8360, uses the company’s PCIe composable fabric, Dell Technologies PowerEdge R640 servers and up to 20 Nvidia Quadro RTX 8000 GPUs. The GPUs sit in a separate physical enclosure, an expansion chassis known as a JBOG (Just a Bunch of GPUs). Liqid told us its Command Center software, paired with an intelligent, low-latency PCIe-based fabric, enables GPUs to be dynamically assigned to the Dell PowerEdge R640 nodes at the bare-metal level.

The result: on a TensorFlow ResNet-50 benchmark, the LQD8360 achieved training throughput of more than 15,000 images per second, compared with 12,000 images per second for the DGX-2, according to Liqid.

“It is one of the world's fastest single computers out there,” Liqid CEO and Co-founder Sumit Puri told us. “And we didn't do this by building a bunch of exotic hardware. We did this by taking standard, off-the-shelf hardware and composing the configuration that would yield the world's maximum performance. And now … we are going to SKU this up at Dell, and customers will be able to purchase this directly from Dell.”

Puri declined to give specific dollar figures. He said the LQD8360 will be priced at roughly half the DGX-2’s list price of $399,000.
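Taken together, Liqid’s two claims imply a price-performance gap larger than the throughput gap alone. The following minimal Python sketch works through that arithmetic. It assumes the LQD8360 costs exactly half the DGX-2 list price, although Puri said only “roughly half.” It also takes Liqid’s 15,000 images/second figure at face value. These are vendor numbers, not independent benchmark results.

```python
# Vendor-reported figures quoted in the article (Liqid's claims, not independent measurements).
DGX2_LIST_PRICE = 399_000               # US dollars, DGX-2 list price
LQD8360_PRICE = DGX2_LIST_PRICE / 2     # assumption: "roughly half" taken as exactly half
DGX2_IMAGES_PER_SEC = 12_000            # ResNet-50 training throughput, per Liqid
LQD8360_IMAGES_PER_SEC = 15_000         # "more than 15,000"; treated here as exactly 15,000


def images_per_sec_per_kilodollar(throughput: float, price: float) -> float:
    """Training throughput divided by system price in thousands of dollars."""
    return throughput / (price / 1_000)


# Raw throughput ratio between the two systems.
speedup = LQD8360_IMAGES_PER_SEC / DGX2_IMAGES_PER_SEC

# Ratio of throughput-per-dollar between the two systems.
ppd_ratio = images_per_sec_per_kilodollar(
    LQD8360_IMAGES_PER_SEC, LQD8360_PRICE
) / images_per_sec_per_kilodollar(DGX2_IMAGES_PER_SEC, DGX2_LIST_PRICE)

print(f"Throughput advantage: {speedup:.2f}x")
print(f"Price-performance advantage: {ppd_ratio:.2f}x")
```

Under those assumptions, the LQD8360 would deliver 1.25 times the throughput at half the cost. That works out to about 2.5 times the ResNet-50 training throughput per dollar. The ratio says nothing about workloads other than ResNet-50.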

To be sure, Puri does not claim the title of "DGX-2 Killer" for his system. In fact, he said Nvidia is a technology partner of Liqid’s and helped tune the LQD8360’s performance. Instead, he emphasized that the LQD8360 is specifically suited to “visually intensive” workloads. Examples include real-time analysis of surveillance video, facial recognition, license plate identification and smart city traffic monitoring. That focus explains the choice of the ResNet-50 benchmark, which measures images processed per second for both machine learning training and inference.

A key difference between the two systems is their GPUs. The LQD8360 uses the less expensive Nvidia Quadro RTX 8000, while the DGX-2 features Nvidia’s higher-priced V100 Tensor Core GPU.

"The DGX-2 is based around the V100 platform, the V100 is a specific type of GPU that is pushing into certain data center-centric workloads," Puri said. "Nvidia will most likely never build the DGX-2 based upon the RTX 8000 GPU because it's not their highest end flagship product. What we're finding is by taking that RTX 8000 and deploying it onto our fabric and configuring it the appropriate way, we're able to get to levels of performance where it may be, in some cases, competing against the DGX-2 solution. In other cases, (the LQD8360) is actually more for rendering jobs where DGX-2 might be more for machine learning… There are certain workloads where a customer may want to use a V100 because for their given workload it will outperform. We're not saying we outperform (DGX-2) in every situation. But under certain workloads, we do pretty darn good."

Sumit Puri of Liqid

Karl Freund, senior analyst, HPC and machine learning, of industry watcher Moor Insights & Strategy, said Liqid may have more success selling the LQD8360 for rendering workloads rather than AI. “The Quadro RTX is a) connected over slower (2X) PCIe vs NVLink, b) does not have HBM Memory, and c) does not have tensor cores,” he told us in an email. “But they were able to pack 20 of these into a server and that is impressive.”

He also noted that while ResNet-50 works with small images, “a lot of AI is moving to solve much larger problems, for which NVLink will provide much better scalability.” On the price-performance difference between the two systems, Freund said this “is more important, imho, for rendering, which also does not need to scale out of the box in a latency-sensitive manner.”

In building the LQD8360, Liqid partnered with telecom provider Orange Silicon Valley, a subsidiary of multinational telecommunications operator Orange S.A. (formerly France Télécom), and Dell Technologies.

With bare-metal composability and an optimized Dell BIOS, the LQD8360 allows up to 20 RTX 8000 GPUs to be assigned to PowerEdge R640 nodes on the fabric without a physical chassis redesign. According to Liqid, that makes it the industry’s highest-capacity expansion chassis (JBOG). When configured with 20 GPUs (48GB of memory each), the system delivers 960GB of VRAM. It also supports Nvidia GPUDirect peer-to-peer, which enables high-speed direct memory access (DMA) transfers between the memories of any two GPUs on the fabric. In addition, Liqid Command Center is designed to minimize idle compute resources by reallocating GPUs to different nodes as workloads complete.

“Liqid’s composable solutions reduce the cost of deployment by optimizing the ratio of GPUs to CPUs and dynamically changing such ratios as needed, significantly improving the total cost of ownership for high-density computing environments,” the company said. “The composable model enables GPUs to be incorporated into compute nodes on the fly to take maximum advantage of these powerful compute accelerators through software-defined technology.”
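The following toy Python sketch illustrates the reallocation model Liqid describes. Everything in it is hypothetical. `GpuPool`, `compose`, `release` and the host names are illustrative and are not Liqid Command Center’s actual API. The sketch only tracks which fabric-attached GPUs are bound to which host and does nothing at the PCIe level.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GpuPool:
    """Hypothetical bookkeeping for a JBOG whose GPUs can be bound to any host on the fabric."""

    # GPU IDs not currently bound to any host (20 slots, matching the LQD8360's maximum).
    free: set[int] = field(default_factory=lambda: set(range(20)))
    # Host name -> GPU IDs currently composed into that host.
    assigned: dict[str, set[int]] = field(default_factory=dict)

    def compose(self, host: str, count: int) -> set[int]:
        """Bind `count` idle GPUs to `host`, in addition to any it already holds."""
        if count > len(self.free):
            raise RuntimeError(f"{count} GPUs requested, only {len(self.free)} idle")
        gpus = set(sorted(self.free)[:count])
        self.free -= gpus
        self.assigned.setdefault(host, set()).update(gpus)
        return gpus

    def release(self, host: str) -> None:
        """Return all of `host`'s GPUs to the idle pool, e.g. when its job finishes."""
        self.free |= self.assigned.pop(host, set())


# Hypothetical host names standing in for two PowerEdge R640 nodes.
pool = GpuPool()
pool.compose("r640-a", 16)  # large training job takes 16 GPUs
pool.compose("r640-b", 4)   # smaller rendering job takes the remaining 4
pool.release("r640-a")      # training finishes; its 16 GPUs return to the pool
pool.compose("r640-b", 8)   # rendering job grows without any hardware change
print(len(pool.free), len(pool.assigned["r640-b"]))  # 8 12
```

In this model, idle GPUs sit in a shared pool rather than fixed inside each server. That is what lets the GPU-to-CPU ratio change from one workload to the next, which is the basis of the cost argument Liqid makes above.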

The system came out of work Orange Silicon Valley did in collaboration with Liqid.

Source: Liqid

“They brought in some of our equipment and started testing it for specific use cases that they're looking at for their end customers … around AI and GPUs…, doing things like putting intelligent GPU out in the cloud and at the edge,” Puri said.

The LQD8360 began as “a very small sandbox” that grew larger, he said. “And what they said is they’d like a very large sandbox so that they can tune some of their AI algorithms to see the maximum performance they could get. And we told them, ‘Hey, if you're looking for a sandbox that can deliver that, we'll compose one for you.’”

Liqid and Dell worked on tweaking the PowerEdge BIOS to support multiple GPUs.

“Think about it – in a 1-U pizza box Dell never had a reason to support 20 GPUs in that BIOS, because you couldn't put more than one, right? So we worked with Dell and we got the BIOS to be able to recognize dozens of GPUs. And then we went back to Orange … to work with their AI engineers to tune things like CUDA, like Tensor, to tune applications, like ResNet to see how much performance we can get out of it.”

The tweaking has continued, and Puri told us that as of this week, Orange squeezed another 5 percent of performance out of the system.