Advanced Computing in the Age of AI | Tuesday, July 23, 2024

RDMA Fabrics – Unlocking Enterprise Data Analytics 

Ultra-low latency requirements in the High Performance Computing (HPC) industry continue to drive interconnect considerations for the world’s most powerful supercomputers. In the enterprise world, it’s also apparent that the speed paradigm in networking and application performance is shifting rapidly. Improvements in the CPU’s computational horsepower have reduced analysis runtimes, thus broadening the scope of analyses. In parallel, the growth spurred by web text, social media, and the demand for real-time analytics has sent companies scrambling to find efficient solutions that can process big data without straining their budgets.

With scale-up options constrained by budgets, the inflow of multi-dimensional, multi-source data in next-generation applications is quickly pushing mid-range servers to their performance limits. This gives rise to the need for cost-effective shared-memory communication between servers. The key challenge is avoiding the excessive latency, CPU overhead, and network bandwidth drain caused by the massive data swaps and copies that multi-node operations require.
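The cost of those data copies can be illustrated with a small sketch. This is purely illustrative: real RDMA performs zero-copy transfers in the NIC against registered memory, not in userspace Python. Here, slicing a `bytearray` into `bytes` materializes a full copy of the payload, while a `memoryview` hands out a reference to the same underlying buffer.

```python
# Illustrative sketch: copy-based vs. zero-copy access to a shared buffer.
# Real RDMA lets the NIC read/write registered memory directly; here we
# mimic the difference with bytes (copies) vs. memoryview (no copy).

buf = bytearray(64 * 1024 * 1024)  # 64 MiB buffer standing in for registered memory

# Copy-based access: this materializes a brand-new 32 MiB object,
# burning CPU cycles and memory bandwidth on the duplicate.
copied = bytes(buf[: 32 * 1024 * 1024])

# Zero-copy access: a memoryview references the same underlying memory.
view = memoryview(buf)[: 32 * 1024 * 1024]

print(len(copied) == len(view))  # → True  (same logical payload)
print(view.obj is buf)           # → True  (view still points at the original buffer)
```

Writes through the view are visible in the original buffer, which is the property RDMA exploits to let remote peers operate on application memory without intermediate copies.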

Increased adoption of flash-based storage, machine learning and big-data analytics suggests that the mainstream enterprise is embracing this influx of data, with the end goal of developing a predictive and prescriptive usage model to assist in efficient decision making[i]. To that point, the Accenture Institute for High Performance recently completed a study finding that 40 percent of surveyed companies have implemented machine learning, with two out of five companies already utilizing this intelligence to assist sales and marketing decisions[ii]. The rise in adoption of Hadoop and Spark also highlights this trend and underscores the importance commodity clusters will play in the future.

The financial industry tells a similar story, where high frequency trading accounts for over 50 percent of global trades[iii]. The promise of fully automated, high-speed trade execution has encouraged financial firms, both big and small, to continue investing in colocation data center services offered by various exchanges, where latencies in the range of nanoseconds are at play. Today’s markets are complex, and it’s hardly a surprise that Wall Street is taking a hard look at shared memory systems, because data movement, in the form of swaps and copies, can introduce latencies[iv] that can potentially cost millions.

These trends will accelerate the adoption of technologies, including RDMA network protocols such as InfiniBand (IB) and RDMA over Converged Ethernet (RoCE), in enterprise deployments. As support for technologies such as Non-Volatile Memory Express (NVMe) and NVMe over Fabrics (NVMe-oF) matures, the gap between Online Transaction Processing (OLTP) and traditional data warehousing is bound to shrink. In fact, several software and platform suppliers, such as IBM and Microsoft, are introducing newer in-memory databases for real-time analytics that are already pushing infrastructure to its limits.

In such applications, RDMA networking, in the form of RoCE-enabled Ethernet NICs with advanced offloads, will not only minimize latency but also free up CPUs to increase overall processing power. Investments are expected to continue in the financial markets too, as trading firms with limited budgets are willing to trade some raw speed for lower jitter[v]. In this case, RoCE-enabled NICs are again a perfect fit for the infrastructure, as they can minimize jitter through built-in congestion management mechanisms such as Priority Flow Control (PFC) and Explicit Congestion Notification (ECN).
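The intuition behind ECN can be sketched in a few lines. This is a toy model with made-up threshold and rate numbers, not the actual congestion control implemented in RoCE NICs and switches (schemes such as DCQCN are considerably more sophisticated): a switch marks packets once its queue depth crosses a threshold, and the sender backs off when it sees marks, keeping queues shallow and jitter low instead of letting them fill and overflow.

```python
# Toy model of ECN-style congestion signaling. All numbers are
# illustrative; real RoCE congestion control runs in hardware.

ECN_THRESHOLD = 20  # queue depth (packets) above which the switch marks

def should_mark(queue_depth: int) -> bool:
    """Switch side: mark the packet (Congestion Experienced) if the queue is deep."""
    return queue_depth > ECN_THRESHOLD

def next_rate(current_rate: float, saw_mark: bool) -> float:
    """Sender side: multiplicative decrease on a mark, gentle additive increase otherwise."""
    return current_rate / 2 if saw_mark else current_rate + 1.0

rate = 100.0
for depth in (5, 25, 25, 10):      # observed queue depths over time
    rate = next_rate(rate, should_mark(depth))
print(rate)  # → 26.25 (backed off twice during congestion, then recovered slightly)
```

Because the sender slows down before the queue overflows, packets are neither dropped nor held in long, variable-length queues, which is exactly the behavior latency- and jitter-sensitive trading workloads want.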

As the examples above illustrate, the need for shared memory systems in the enterprise is gaining momentum, so it’s hardly a surprise to find both mature and new solutions that already use RDMA-enabled networking. In fact, some of those solutions have existed for years, such as Oracle’s Exadata, a high performance database appliance introduced back in 2008 that uses InfiniBand as its interconnect. Market acceptance of Exadata was extremely positive: its fast adoption not only propelled it to become Oracle’s most successful product but also encouraged other leading database providers, among them IBM DB2 pureScale and Teradata, to add RDMA support to their scale-out database solutions.

RDMA-enabled networks have also become popular in new flash-based storage systems. In such deployments, RDMA technology preserves the faster speeds, lower latency, and lower jitter that SSD technology delivers. For example, Microsoft introduced the SMB Direct feature in Windows Server 2012, adding support for SMB 3.0 over RDMA. This feature serves as the backbone for its newer storage solution, Storage Spaces Direct, which relies directly on RDMA networking to cut the cost per gigabyte of storage in half.
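For administrators evaluating such a deployment, a quick sanity check (a config fragment, assuming Windows Server 2012 or later with RDMA-capable NICs, run from an elevated PowerShell prompt) is to confirm that the adapters and the SMB client actually see RDMA before attributing performance to SMB Direct:

```shell
# List network adapters and whether RDMA is enabled on each
Get-NetAdapterRdma

# Confirm the SMB client sees RDMA-capable interfaces
Get-SmbClientNetworkInterface | Where-Object RdmaCapable

# Once traffic is flowing, verify SMB Multichannel actually chose RDMA paths
Get-SmbMultichannelConnection
```

If no interface reports as RDMA-capable, SMB transparently falls back to TCP, so the share still works but without the latency and CPU-offload benefits described above.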

The value that RDMA brings to new technology trends continues to grow, as is evident in hyperconverged systems and NVMe-oF systems adopting RoCE as a key interconnect.

Motti Beck is senior director, enterprise development, at Mellanox Technologies. Carl Tung is senior product line manager at Broadcom.