Advanced Computing in the Age of AI | Friday, April 19, 2024

Cisco Expands MDS SAN Switches For Hyperscale 

Hyperscale datacenters have their clustered object storage and supercomputing centers have their GPFS and Lustre parallel file systems, but at large enterprises, the storage area network still reigns as the place to put the most critical data. To that end, Cisco Systems is expanding its MDS 9000 family of SAN switches, goosing them to span what Cisco is calling cloud-scale deployments.

The MDS 9000 SAN switch product line is unique among those created by Cisco in that it is not sold directly by the company or pushed through its channel partners, but sold on an OEM basis to the major storage array makers. This includes EMC, Hitachi, Hewlett-Packard, IBM, and NetApp. The SAN switches are also featured in the converged systems based on Cisco's UCS blade servers, including vBlock from VCE, VSPEX from EMC, UCP from Hitachi, FlexPod from NetApp, and SmartStack from Nimble Storage.

Cisco's main competitors in SAN switches are Brocade and QLogic, and like these two, Cisco makes its own ASICs to do the switching. Cisco's chips support both straight-up Fibre Channel and the Fibre Channel over Ethernet (FCoE) protocol. The new MDS 9148S SAN switch is based on the "Viper" ASIC, while the line cards in the new modular MDS 9700s are based on the "F Class" ASICs, which take their name from the fact that they are Fibre Channel chips and F is the designation for fighter jets in the US Air Force. Brocade has two ASICs of its own – Gen 5, which supports 16 Gb/sec links, and Gen 6, which it is developing in conjunction with host bus adapter maker Emulex to push Fibre Channel up to 32 Gb/sec. QLogic has the 2500 Series at the low end and the 8300 Series at the high end.

[Figure: cisco-lan-san]

To give you a sense of how pervasive SANs are, Cisco has over 20,000 customers using its MDS switches and over 125,000 of the machines are in the field, according to Prashant Jain, product manager for the Data Center Business Unit at Cisco. Last year, Cisco launched the MDS 9710 multi-layer director SAN switch and the MDS 9250i multi-services switch for SANs, and this year it is rounding out the product line with a smaller modular director switch, a new line card for the directors, and a smaller fabric switch. These products sit alongside the Nexus family of converged LAN/SAN switches, which support both Ethernet and FCoE traffic. Interestingly, with the latest line card, Cisco is supporting FCoE on the MDS 9706 and 9710 modular SAN switches, which allows for the overlay of SAN traffic onto clouds.

The MDS 9148S is a 1U SAN switch that has from one to four modules in it and is intended to be used as a top-of-rack switch for SAN connectivity. The machine can support Fibre Channel ports running at 2 Gb/sec, 4 Gb/sec, 8 Gb/sec, and 16 Gb/sec speeds at line rate. Ports are added in blocks of a dozen, up to a maximum of 48 ports. The 9148S can support connections to up to 32 different virtual SANs (VSANs), and that term is different from what VMware is calling a virtual SAN. In this case, it means consolidating multiple physical SANs onto a single network but maintaining the logical boundaries between them so they all look like separate SANs to applications. The switch also supports Inter-VSAN routing, or IVR, which allows for resources to be shared across the virtual SANs.
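The VSAN idea described above can be sketched as a toy model: ports on one physical switch are tagged with a VSAN ID, traffic stays inside a VSAN by default, and an explicit Inter-VSAN routing rule opens a path between two specific endpoints. This is only an illustration of the concept, not how NX-OS implements it; the `Fabric` class, its method names, and the port names are all hypothetical, and only the 32-VSAN limit comes from the article.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set

MAX_VSANS = 32  # per-switch VSAN limit quoted in the article for the MDS 9148S


@dataclass
class Fabric:
    """Toy model of one physical switch carved into several virtual SANs."""

    port_vsan: Dict[str, int] = field(default_factory=dict)         # port name -> VSAN ID
    ivr_pairs: Set[FrozenSet[str]] = field(default_factory=set)     # pairs allowed across VSANs

    def assign(self, port: str, vsan: int) -> None:
        """Place a port in a VSAN, refusing to exceed the VSAN limit."""
        others = {v for p, v in self.port_vsan.items() if p != port}
        if len(others | {vsan}) > MAX_VSANS:
            raise ValueError(f"more than {MAX_VSANS} VSANs requested")
        self.port_vsan[port] = vsan

    def allow_ivr(self, a: str, b: str) -> None:
        """Record an Inter-VSAN routing rule between two endpoints."""
        self.ivr_pairs.add(frozenset((a, b)))

    def can_talk(self, a: str, b: str) -> bool:
        """Same VSAN: always reachable. Different VSANs: only with an IVR rule."""
        if self.port_vsan[a] == self.port_vsan[b]:
            return True
        return frozenset((a, b)) in self.ivr_pairs


fabric = Fabric()
fabric.assign("host-a", 10)
fabric.assign("array-a", 10)
fabric.assign("tape-b", 20)
fabric.allow_ivr("host-a", "tape-b")

print(fabric.can_talk("host-a", "array-a"))  # same VSAN
print(fabric.can_talk("array-a", "tape-b"))  # different VSANs, no IVR rule
print(fabric.can_talk("host-a", "tape-b"))   # different VSANs, IVR rule present
```

The point of the model is that isolation is the default and sharing is the exception, which is what lets several formerly separate physical SANs coexist on one network.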

Here is how Jain says the MDS 9148S will stack up to the competition from Brocade:

[Figure: cisco-mds-9148s-vs-brocade]

The Brocade 6505 comes in configurations with 12 or 24 ports at speeds up to 16 Gb/sec, and the Brocade 6510 comes with from 24 to 48 ports. Both are based on Brocade's Gen 5 ASIC.

Jain says that one of the key features of the MDS 9148S switch is that it has provisioning features that assign IP addresses and gateways to the switches, eliminating the need to configure each switch manually. This is particularly important in scale-out scenarios where a pair of these devices will be used to provide access to SANs for each rack of machines.

The MDS 9706 is a cut-down version of the multi-layer director switch that Cisco put out last year, and as the name suggests, the chassis has room for six line cards and it can scale up to a total of 192 ports running at either 16 Gb/sec Fibre Channel or 10 Gb/sec FCoE. The added memory and processing inside the chassis, as well as tweaks to NX-OS, give the MDS 9706 fifteen times the performance of the existing MDS 9506 director switch, says Jain, and at 1.5 Tb/sec per slot, that is also three times the performance of any other compact Fibre Channel director switch on the market. The MDS 9706 comes in a 9U chassis and delivers 12 Tb/sec of aggregate bandwidth across its slots. It has redundant supervisors and independently scalable fabric modules.

In addition to the new chassis, Cisco is rolling out a new line card, which packs 48 ports running at 10 Gb/sec and supporting the FCoE protocol, and Jain says this is the highest density FCoE module that slides into a Fibre Channel director switch. One of the deployment scenarios that Cisco is pitching is having separate LAN and SAN core switches, preserving their unique networks, but overlaying them with the FCoE module to allow for a multi-hop linkage between servers using Nexus switching at the rack and core with SANs that have MDS 9700 at the core. Like this:

[Figure: cisco-mds-multihop]

One early adopter of the multi-hop FCoE capability is the Defense, Space, and Security Division of Boeing, which uses this capability on its private cloud.

Cisco is also showing off some new capabilities that are inherent in the switches and NX-OS, with a focus on scaling up SANs so they are more appropriate for cloud-scale environments. Jain says that Cisco understands that the ever-expanding number of virtual machines on server hosts means that its SAN switches need to be able to support a larger number of fabric logins, zones, and domains. And so that is what it has done for both the MDS 9700 and Nexus 7700 series of switches, as below:

[Figure: cisco-mds-nexus-scale]

Some of this is made possible through extra hardware in the switch. So, for instance, the new MDS 9700s have five times the compute capacity (through a combination of higher core counts and higher clock speeds) as well as having 8 GB of memory – four times the prior MDS 9500 series – to run software functions. Cisco does not like to be precise about what is inside the guts of its switches, for competitive reasons.

The new MDS switches also come with hardware-based Fibre Channel congestion detection and recovery. On the MDS 9148S, 9700, and 9250i, this feature can detect slowdowns as short as 1 millisecond and can take action within nanoseconds to start or stop a process to try to rectify whatever issue is causing a packet drain on the storage network. The cause could be a speed mismatch between the server and the switch, a misbehaving host bus adapter, application or operating system performance issues on the host, or a virtual machine exit that gets clunky. The prior-generation MDS 9500 and 9148 had this congestion detection and recovery functionality built into software, with a detection granularity of 100 milliseconds and a recovery action latency of 100 milliseconds. Implementing it in hardware has improved the detection granularity by two orders of magnitude, and the recovery latency by considerably more.
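The detection figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the two granularities from the article (100 milliseconds in software, 1 millisecond in hardware); the `is_visible` helper rests on a simplifying assumption of ours, namely that a stall is reliably caught only if it lasts at least one detection interval. The recovery figure is left out because the article says only "within nanoseconds" and gives no number.

```python
import math

# Figures quoted in the article, expressed in microseconds to keep the math exact.
SOFTWARE_GRANULARITY_US = 100_000  # prior MDS 9500 / 9148: 100 milliseconds
HARDWARE_GRANULARITY_US = 1_000    # MDS 9148S / 9700 / 9250i: 1 millisecond


def orders_of_magnitude(old_us: int, new_us: int) -> float:
    """Base-10 logarithm of the improvement ratio between two intervals."""
    return math.log10(old_us / new_us)


def is_visible(stall_us: int, granularity_us: int) -> bool:
    """Assumption: a stall is reliably seen only if it spans one detection interval."""
    return stall_us >= granularity_us


print(orders_of_magnitude(SOFTWARE_GRANULARITY_US, HARDWARE_GRANULARITY_US))
print(is_visible(5_000, SOFTWARE_GRANULARITY_US))  # 5 ms stall, 100 ms granularity
print(is_visible(5_000, HARDWARE_GRANULARITY_US))  # 5 ms stall, 1 ms granularity
```

Under that assumption, a hypothetical 5 millisecond stall would slip past the software implementation but be caught by the hardware one, which is the practical meaning of the finer granularity.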

In addition, Cisco is showing off an Ethernet fabric architecture that has an FCoE overlay, which runs 10 Gb/sec links back to the servers from leaf switches (Nexus 5000 and 6000 switches) and builds a spine out of 40 Gb/sec Ethernet switches (Nexus 5000, 6000, or 7000 series). This FCoE overlay capability does exchange-based load balancing in the fabric and it is set up in such a way that the A and B redundant links to the SAN are not interrupted when one of the spine switches fails (which would be a very bad thing). The FCoE overlay uses virtual LANs (VLANs) to maintain logical separation and redundancy for the SAN-A and SAN-B links, and it is able to dynamically discover leaf nodes and establish port relationships, thus reducing the possibility of human error during setup. This FCoE overlay function is available now in Nexus 5000 and 6000 switches and will be ready in the third quarter for the Nexus 7000 switches.
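Exchange-based load balancing generally means that the path is chosen per Fibre Channel exchange rather than per frame, so every frame in one exchange crosses the same spine and arrives in order. A minimal sketch of that idea follows. The article does not describe how Cisco's switches actually pick a path, so the CRC32 hash, the `pick_spine` function, and the spine names are stand-ins chosen for illustration only.

```python
import zlib
from typing import Sequence


def pick_spine(src_id: int, dst_id: int, exchange_id: int, spines: Sequence[str]) -> str:
    """Map one exchange to one spine so all of its frames follow the same path."""
    if not spines:
        raise ValueError("no spine switches available")
    # Fibre Channel port IDs are 24 bits wide; the originator exchange ID is 16 bits.
    key = (
        src_id.to_bytes(3, "big")
        + dst_id.to_bytes(3, "big")
        + exchange_id.to_bytes(2, "big")
    )
    # CRC32 is an illustrative hash, not the one used in any real switch.
    return spines[zlib.crc32(key) % len(spines)]


spines = ["spine-1", "spine-2", "spine-3", "spine-4"]  # hypothetical names
for exchange_id in range(4):
    print(exchange_id, pick_spine(0x010203, 0x0A0B0C, exchange_id, spines))

# If a spine fails, hash over the survivors so traffic keeps flowing.
survivors = [s for s in spines if s != "spine-2"]
print(pick_spine(0x010203, 0x0A0B0C, 7, survivors))
```

Hashing on the exchange spreads a single host-to-array conversation across several spines while keeping each exchange on one path, and recomputing over the surviving spines is one simple way to model a fabric that stays up when a spine switch fails.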

EnterpriseAI