
Mellanox Weaves Tightly Into OpenStack, Adds Long Haul Switches 

Mellanox is making headway selling its networking hardware and software in both public and private clouds, as EnterpriseTech has previously reported. But the company has to make it easier to use its InfiniBand and Ethernet products with popular cloud software. To that end, Mellanox has worked with key parts of the OpenStack project to get its drivers and plug-ins mainstreamed so everything works together right out of the box.

The company is also showing off some of the performance benefits that come from using Remote Direct Memory Access for compute and storage on OpenStack-based clouds. And in a separate development, Mellanox is rolling out two new models of its MetroX TX Series long-haul switches for linking datacenters at metropolitan distances.

Eli Karpilovski, senior manager for cloud market development, tells EnterpriseTech that Mellanox has dozens of proofs of concept running now on OpenStack-based clouds, and that while these enterprise customers are concerned about performance, scalability, and features, what they are most concerned about is knowing that someone is supporting the entire stack.

So Mellanox has become a certified partner for the Neutron virtual networking service and the Cinder block storage service that are part of the project and that are integrated in the latest "Havana" release of OpenStack. (You can see our coverage of the Havana release here.) Mellanox is also the first maker of server network adapter cards certified to work with Red Hat Enterprise Linux OpenStack Platform, which is Red Hat's commercial distribution mixing Linux and OpenStack. Mellanox has not revealed the other certifications it will go after, but PistonCloud and Mirantis offer popular OpenStack distributions as well, and like Red Hat, Canonical is bundling OpenStack with its Ubuntu Server variant of the Linux operating system. All of these are logical targets for certification at some point.
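
In practice, the integration shows up as ordinary OpenStack configuration rather than a separate software stack. The snippet below is a minimal sketch, assuming the ML2 plugin and the Mellanox mechanism driver (commonly named mlnx); section names, option names, and the VLAN ranges shown are assumptions that vary by OpenStack release and distribution, so treat it as illustrative only.

    # /etc/neutron/plugins/ml2/ml2_conf.ini -- illustrative sketch, not a tested config
    [ml2]
    type_drivers = flat,vlan
    tenant_network_types = vlan
    mechanism_drivers = openvswitch,mlnx

    [ml2_type_vlan]
    # VLAN range carried on the physical fabric (network name and range assumed)
    network_vlan_ranges = physnet1:100:200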

[Image: Mellanox OpenStack integration diagram]

Those building clouds can choose either InfiniBand or Ethernet for their adapters and switches, and Karpilovski says that some deploy InfiniBand at the core of the cloud, linking servers to each other and to storage, while putting Ethernet at the edge linking out to the Internet. Others want Ethernet top to bottom and all the way out.

They are all looking for performance increases, whether they are opting for 10 Gb/sec and 40 Gb/sec Ethernet or 40 Gb/sec and 56 Gb/sec InfiniBand adapters and switches. The ConnectX-3 Pro adapter cards, announced earlier this year, are able to offload the processing of VXLAN and NVGRE overlay maps from the hypervisor on the server to the card. This mapping of physical to virtual addresses on the network is not a big deal when you have four or five virtual machines on a server, says Karpilovski, but it is a big deal when you have 20 or 30 VMs on a system.
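
For admins who want to verify that the offload is actually engaged, the adapter's feature flags are visible from the host. This is a generic Linux illustration rather than a Mellanox-specific procedure; it assumes an interface named eth2, and the exact feature names vary by kernel and driver version.

    # List the tunnel segmentation offload features the NIC advertises (interface name assumed)
    ethtool -k eth2 | grep tnl
    #   tx-udp_tnl-segmentation: on   <- VXLAN-encapsulated traffic is segmented in hardware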

"In the virtualized environments, we are here to accelerate hypervisor performance," says Karpilovski. "We are offloading some of the tasks that the hypervisors are struggling with at large scale."

Recent benchmark tests show that these offload features of the ConnectX-3 Pro adapters can double the throughput of a 10 Gb/sec Ethernet adapter card and cut the CPU overhead by around 40 percent when running anywhere from one to sixteen VMs on the server.

These ConnectX-3 Pro adapters also sport an embedded switch and support Single Root I/O Virtualization (SR-IOV), the latter of which allows a PCI-Express adapter to virtualize access to its underlying hardware and provide isolation and quality of service for the virtual machines running on the server. In another benchmark test, the round trip time (RTT) latency for a converged adapter supporting both storage and server traffic was improved by a factor of 900X, dropping to 11.9 microseconds, through the combination of this embedded virtual switch and support for SR-IOV.
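
For reference, SR-IOV itself is switched on at the host level, outside of OpenStack. The commands below are a generic Linux sketch, assuming an interface named eth2, a virtual function count of eight, and a driver that exposes the kernel's standard sriov_numvfs interface; the virtual functions it creates are what get handed to the virtual machines.

    # Carve the adapter's physical function into 8 virtual functions (count and name assumed)
    echo 8 > /sys/class/net/eth2/device/sriov_numvfs
    # Each virtual function appears as its own PCI device that can be passed to a guest
    lspci | grep -i mellanox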

Because storage is so important to cloud operators – particularly as they shift from local storage in their compute nodes to external object and block storage services running over the network and shift to solid state storage inside the server nodes to speed up compute and improve the performance of virtualization – Mellanox also ran a benchmark on OpenStack's Cinder block storage to show the effect of using Remote Direct Memory Access between compute servers and Cinder storage servers.

[Image: Cinder block storage performance with iSER (RDMA) versus iSCSI over TCP/IP]

In this case, Mellanox is comparing Cinder storage linked using the iSCSI protocol over TCP/IP against the same storage linked with the iSCSI Extensions for RDMA protocol, or iSER for short, which uses RDMA to bypass the networking stack on both the compute server and the storage server. In those tests, as you can see, the performance of the Cinder block storage was boosted by more than a factor of 4X.
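
On the OpenStack side, moving a Cinder LVM backend from plain iSCSI to iSER is largely a configuration change. The fragment below is a sketch only, assuming a driver that accepts an iscsi_protocol option and a backend section named lvm-iser; older releases used a dedicated iSER driver class instead, so the exact option depends on the Cinder version in use.

    # /etc/cinder/cinder.conf -- illustrative sketch, option names vary by release
    [lvm-iser]
    volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
    volume_group = cinder-volumes
    # Ship block traffic over iSER (iSCSI over RDMA) instead of iSCSI over TCP/IP
    iscsi_protocol = iser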

Karpilovski says that Mellanox is working to accelerate the Ceph object/block storage hybrid, which is also popularly used with OpenStack, so it runs out of the box. Ceph integration and certification will be done before the end of the year.

[Image: Mellanox MetroX TX6280 long-haul switch]

In an unrelated announcement, but one that is equally useful for large enterprises with multiple datacenters, Mellanox has announced two new MetroX long-haul switches that support either InfiniBand or Ethernet protocols.

The TX6240 comes in a 2U chassis and offers two long-haul ports that can run at 56 Gb/sec with InfiniBand and either 10 Gb/sec or 40 Gb/sec with Ethernet; it has two downlinks that run at either 56 Gb/sec with InfiniBand or 40 Gb/sec with Ethernet. This long-haul switch can link two datacenters that are as much as 40 kilometers apart, and delivers a maximum of 80 Gb/sec of throughput. The latency between the inbound and outbound ports is 700 nanoseconds with the TX6240, and then you have to add 5 nanoseconds for every meter you traverse over dark fiber in the metropolitan area you are spanning. That works out to 200 microseconds at 40 kilometers. The TX6280 is basically the same iron, but it has only one long-haul uplink and one downlink, and it can push and pull data over an 80 kilometer distance.
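
To put those numbers in perspective, the fiber itself dominates: at 5 nanoseconds per meter, 40 kilometers of dark fiber adds 40,000 m × 5 ns/m = 200,000 ns, or 200 microseconds, each way, and the TX6280's 80 kilometer reach doubles that to roughly 400 microseconds. The 700 nanosecond port-to-port latency of the switch is a rounding error by comparison.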

Pricing for the new MetroX switches was not divulged. Presumably there is a slight premium for the longer distance, but not too much because the stretchier switch has half as many ports. Karpilovski was mum on this speculation when EnterpriseTech brought it up.
