Advanced Computing in the Age of AI | Wednesday, December 6, 2023

Broadcom Fights Off Ethernet Rivals With Tomahawk Chips 

Chip maker Broadcom is the dominant supplier of chips for Ethernet switches with its Trident and Dune families of ASICs. But it faces increasing competition from established players, including merchant chip makers Intel and Marvell as well as switch makers that still do their own ASICs, such as Cisco Systems and Hewlett-Packard. And then there are upstarts like XPliant, which just came out of stealth with its CNX family of Ethernet ASICs aimed at the emerging 25 Gb/sec and 100 Gb/sec markets in massively scaled datacenters.

Broadcom is aware of all of these threats, as well as the desire by hyperscale datacenter operators to get faster networking coming out of servers. It was, in fact, one of the five founders of the 25G Ethernet standard, which the IEEE has finally gotten behind now that Broadcom, Google, Microsoft, Mellanox Technologies, and Arista Networks have demonstrated that they were perfectly willing to go it alone and create their own standard if it did not.


Rochan Sankar, director of product marketing at the company, says that the ever-increasing core counts on servers as well as storage arrays based on servers are choking 10 Gb/sec ports coming out of servers. Moreover, with the network representing somewhere between 10 and 15 percent of the total cost of a large datacenter, enterprises, hyperscale operators, and cloud providers are all looking for a better mix of price and performance out of their networks. The back-of-the-envelope math that the 25G Ethernet adherents are doing is that a 25 Gb/sec port will have 2.5 times the bandwidth of a 10 Gb/sec port, cost about 1.5 times as much, and burn about half the power and therefore allow for much higher port density.
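The back-of-the-envelope math can be worked through in a few lines. The sketch below is only an illustration: the 2.5x, 1.5x, and one-half ratios come from the figures above, while the baseline cost and power units are arbitrary, and it assumes that "half the power" refers to power per port, which the 25G Ethernet backers' math does not spell out.

```python
# Back-of-the-envelope comparison of a 10 Gb/sec and a 25 Gb/sec port.
# The ratios come from the article; the baseline cost and power values are
# arbitrary units chosen only for illustration.

def per_gbit(value, gbits):
    """Return a per-Gb/sec figure for a port-level value."""
    return value / gbits

base_gbits, base_cost, base_power = 10.0, 1.0, 1.0

new_gbits = base_gbits * 2.5   # 2.5 times the bandwidth
new_cost = base_cost * 1.5     # about 1.5 times the cost
new_power = base_power * 0.5   # assumption: half the power per port

cost_ratio = per_gbit(new_cost, new_gbits) / per_gbit(base_cost, base_gbits)
power_ratio = per_gbit(new_power, new_gbits) / per_gbit(base_power, base_gbits)

print(f"cost per Gb/sec: {cost_ratio:.2f}x")
print(f"power per Gb/sec: {power_ratio:.2f}x")
```

Under those assumptions, each Gb/sec of bandwidth on a 25 Gb/sec port costs about 0.6 times what it does on a 10 Gb/sec port and burns about 0.2 times the power.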

There are other issues that large-scale datacenters are coping with. One is what Sankar calls "radix limited networks," which is just a fancy way of saying that the networks can't link as many devices as companies would like. Customers are also looking for network-wide analytics to better understand the behavior of their nets so congestion doesn't kill applications. You can have all the compute and storage in the world, but if you don't have enough bandwidth for modern, highly distributed workloads, then the net chokes the apps. The idea is that you might only be spending 10 or 15 percent of the datacenter cost on networking, but it may have a disproportionate effect on the overall performance of the applications hosted in that datacenter.


To address these issues, Broadcom has created a new Ethernet ASIC, nicknamed "Tomahawk" and to be sold under the StrataXGS brand. (Broadcom likes to use the names of nuclear missiles in the United States stockpile for its key ASICs.) The Trident chip had a maximum of 128 SerDes circuits running at 10 GHz, aligning it to 10 Gb/sec and 40 Gb/sec switches, while the Tomahawk uses a new SerDes that runs at 25 GHz and is aligned to 25 Gb/sec, 50 Gb/sec, and 100 Gb/sec traffic. This new SerDes, which is code-named "Long Reach," is optimized for bandwidth, high radix scalability, and low latency. Sankar tells EnterpriseTech that the Tomahawk chip can support a 400 nanosecond latency on a port-to-port hop and has 680 Mb for packet buffering. The Tomahawk chip, which is called the BCM56960, is a single device with over 7 billion transistors, topping the 5.57 billion transistors that the new 18-core Xeon E5-2600 v3 processor from Intel has etched on its die. (We have come a long way from a decade ago, when it took nine chips for Broadcom to deliver an eight-port 10 Gb/sec switch.) The Tomahawk ASIC can drive 3.2 Tb/sec at full duplex, which is 2.5 times as much oomph as the Trident II ASIC that Broadcom launched in August 2012. (Trident II can push 1.28 Tb/sec.)


The important thing about switches based on the Tomahawk ASIC is that they will slip right into the existing cabling and deliver lots of benefits for links to compute and storage as well as across fabrics. The "God box" version of a Tomahawk switch is a 100 Gb/sec device that can have up to 32 ports, according to Sankar. Such a device will have 15 times the backbone capacity of a current three-tier leaf/spine network. The question, of course, is what such a high-capacity fabric will cost and when it will be available. Other expected configurations are switches with 64 ports running at 40 Gb/sec or 50 Gb/sec or with 128 ports running at 25 Gb/sec. The chip supports RoCE and RoCE v2, the Remote Direct Memory Access (RDMA) technology that was lifted from InfiniBand and woven into Ethernet. The ASIC also supports the usual suspects in terms of overlays and tunnels, including the VXLAN, NVGRE, MPLS, and SPB protocols.
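The port counts quoted for these configurations follow from dividing the ASIC's 3.2 Tb/sec of aggregate bandwidth by the port speed. The sketch below is a rough check rather than a description of how Broadcom carves up the chip: the helper name max_ports is hypothetical, and it ignores how SerDes lanes are grouped into ports.

```python
# Port counts that fit in a switch ASIC's aggregate bandwidth.
# 3.2 Tb/sec is the Tomahawk figure quoted in the article; the helper name
# max_ports is hypothetical, and this ignores how SerDes lanes are grouped.

TOMAHAWK_GBITS = 3200  # 3.2 Tb/sec expressed in Gb/sec

def max_ports(aggregate_gbits, port_gbits):
    """Return how many ports of a given speed the aggregate bandwidth covers."""
    return aggregate_gbits // port_gbits

for speed in (100, 50, 25):
    print(f"{speed} Gb/sec: up to {max_ports(TOMAHAWK_GBITS, speed)} ports")
```

This yields 32, 64, and 128 ports, matching the configurations Sankar describes. The same division would allow 80 ports at 40 Gb/sec, so the 64-port figure quoted for that speed suggests the limit there comes from something other than raw bandwidth, presumably the way lanes are grouped into ports.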

The Tomahawk ASIC is just back from the fab and in the lab, says Sankar, and the top hyperscale and cloud customers as well as the big OEMs who buy Broadcom silicon are testing it now. The ramp for Tomahawk is next year, and Sankar cannot be more specific than that. This is roughly on the same scale as what XPliant is delivering, and other network ASIC makers will be coming out of the woodwork soon to support the 25G Ethernet effort as well as to make a new line of 50 Gb/sec and 100 Gb/sec chips.

In addition to the new hardware, Broadcom is rolling out some new features to make its switches more malleable, just as XPliant was talking about doing with its ASIC launch last week.


The software stack that comes with the Tomahawk chips includes Broadview instrumentation, which does packet tracing and has visibility into the flow of packets around the fabric. The software has various congestion analytics routines built in, including hashing and load balancing monitors, elephant flow identification (meaning when big wonking hunks of data rip over the fabric, disrupting things like a big rock thrown into a pond), buffer states, timing of packets, and headroom in network bandwidth across the fabric. If Broadview monitors and analyzes what is going on in the network, the FlexGS engine in the Broadcom software stack is what takes this data and optimizes traffic flows across the dataplane, automating traffic shaping to get the best performance out of that network.
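Broadcom has not said how Broadview picks out elephant flows, but the general idea can be sketched simply: add up the bytes seen per flow and flag any flow that crosses a threshold. Everything in the example below is an assumption made for illustration, including the function name find_elephant_flows, the use of a source and destination pair as the flow key, and the sample addresses and threshold.

```python
from collections import defaultdict

def find_elephant_flows(packets, byte_threshold):
    """Return flow keys whose total bytes meet or exceed byte_threshold.

    packets is an iterable of (flow_key, size_in_bytes) pairs.
    """
    totals = defaultdict(int)
    for flow_key, size in packets:
        totals[flow_key] += size
    return {flow: total for flow, total in totals.items()
            if total >= byte_threshold}

# Hypothetical sample: flow keys are (source, destination) pairs.
sample = [
    (("10.0.0.1", "10.0.0.9"), 1500),
    (("10.0.0.2", "10.0.0.9"), 9000),
    (("10.0.0.1", "10.0.0.9"), 1500),
    (("10.0.0.2", "10.0.0.9"), 9000),
]
elephants = find_elephant_flows(sample, byte_threshold=10_000)
print(elephants)
```

In a real switch this accounting would happen in hardware counters over a time window rather than in a Python dictionary, but the thresholding logic is the same in spirit.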

This reconfigurability and automation are arguably more important than faster ASICs. In fact, this is precisely the kind of network software that the Googles and Facebooks of the world have been creating for their own whitebox switches for some time now because it has not existed in a commercial product. This software stack will have hooks into various SDN stacks, into cloud controllers like OpenStack, into automation tools like Chef and Puppet, and into the network operating systems that run on Broadcom silicon. What remains unknown is just how similar Tomahawk is to Trident and how much work it will be to move network operating systems over to the newest ASIC. If that port goes quickly, we could see 25 Gb/sec and 50 Gb/sec switches before too long. And then the network interface adapter makers had better have their 25 Gb/sec adapter cards ready.