Advanced Computing in the Age of AI | Monday, June 24, 2024

ThunderX ARM Has Datacenter Muscle 

The third 64-bit ARM processor aimed at datacenter workloads has just entered the field now that Cavium Networks, a maker of multicore chips for networking and other embedded workloads, has begun sampling its ThunderX chips to early customers making servers, networking, and storage equipment.

Cavium, which has expertise with the MIPS RISC architecture and which has created very sophisticated system-on-chip devices that it sells under the Octeon brand, launched into the ARM race back in June when it revealed that it was a full ARMv8 architecture licensee from ARM Holdings and was fast at work on the ThunderX line. This is not Cavium's first time around the block creating a many-core processor with NUMA clustering and lots of different accelerators and peripheral controllers all on the same device. The Octeon III 7XXX processors that it already sells have 48 cores on a die and scale across two sockets – the same feeds and speeds as the initial ThunderX processors. The process of creating the ThunderX line was a bit more complex than doing a global replace of "MIPS core" with "ARM core" in the Verilog hardware description language. But it was not like starting from scratch, either, and there are many architectural similarities, right down to the L1 and L2 caches and the four DDR3/DDR4 memory controllers on the die and myriad accelerators. The ThunderX chips will support up to 512 GB of main memory per socket, which is 33 percent more memory than Intel is supporting with its "Haswell" Xeon E5-2600 v3 server chips, but they only have 16 MB of L2 cache compared to the very deep and fat L2 and L3 caches of the Xeons.

The Octeon III CN7XXX chips are etched by GlobalFoundries using its 28 nanometer process, which is well established and used for a variety of CPUs on the market. They include up to 48 MIPS cores running at up to 2.5 GHz. The first generation of ThunderX ARM chips, technically known as the CN88XX-X family, will use the same 28 nanometer wafer-baking process from GlobalFoundries, and it will also feature a maximum of 48 cores on the die and run at a target speed of 2.5 GHz.


Gopal Hegde, vice president and general manager of the Server Processor Group at Cavium, tells EnterpriseTech that this high-end ThunderX chip, which will have from 24 to 48 cores, is aimed right at the volume, two-socket portion of the server market where Intel's Xeon E3 and Xeon E5 processors absolutely dominate the datacenter. Cavium is not revealing how many transistors are in this top-end ThunderX, but confirms that the chip will come in a square package that measures 52.5 millimeters on a side.

Cavium also plans to launch a lower-end CN87XX-X variant of ThunderX with 8 to 16 cores on a die; this chip will be etched in the same 28 nanometer process from GlobalFoundries and will cover the part of the market where the low end of the Xeon E3 and the high end of Intel's Atom server processors play. This chip will also compete with other 64-bit ARM server chips from the likes of AMD, with its "Seattle" Opteron A Series, and Applied Micro, with its "Storm" and "Shadowcat" X-Gene 1 and X-Gene 2 chips.

Looking ahead, Hegde said that Cavium was looking at implementing the ThunderX-2 processors in a 14 nanometer FinFET process, which would move to a 3D transistor design and radically shrink the transistor size at the same time. This shrink should allow for Cavium to put a lot more cores and other components on the die while at the same time ramping up the clock speed on the cores as well (if this is valuable to enough customers). Most likely, Cavium will play around with the cores and clocks to get a broader set of core counts and frequencies, addressing more of the server, storage, and switching markets.

The basic feeds and speeds of the two initial ThunderX processors were covered in our story back in June, but there are some new details that have come out this week. What we now know is that the high-end ThunderX chip will implement a full-on Layer 2/3 switch fabric using the networking chips from XPliant, which Cavium acquired in July of this year and which launched its initial chips in September. The top-end XPliant ASIC has 3.2 Tb/sec of aggregate switching bandwidth and can support 32 ports running at 100 Gb/sec, 64 ports running at 40 Gb/sec or 50 Gb/sec, or 128 ports running at 10 Gb/sec or 25 Gb/sec. The exact configuration of the networking on the ThunderX processors has not been revealed, except to say that they will each have multiple ports running at 10 Gb/sec or 40 Gb/sec speeds in the machines aimed at server workloads. This embedded Ethernet switch will obviate the need for a top-of-rack switch in server clusters.
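The quoted port counts line up with the 3.2 Tb/sec aggregate figure: the fastest option in each port-count group fills the full bandwidth, while the slower options leave headroom. A quick back-of-the-envelope check in Python, using only the numbers given above (it says nothing about how the switch will actually be carved up on ThunderX, which Cavium has not disclosed):

```python
# Port configurations quoted for the top-end XPliant ASIC: (ports, speed in Gb/sec).
CONFIGS = [(32, 100), (64, 40), (64, 50), (128, 10), (128, 25)]
AGGREGATE_GBPS = 3200  # 3.2 Tb/sec expressed in Gb/sec


def port_bandwidth_gbps(ports, speed_gbps):
    """Total bandwidth consumed by a given port configuration, in Gb/sec."""
    return ports * speed_gbps


for ports, speed in CONFIGS:
    total = port_bandwidth_gbps(ports, speed)
    share = total / AGGREGATE_GBPS
    print(f"{ports:>3} x {speed:>3} Gb/sec = {total:>4} Gb/sec ({share:.0%} of 3.2 Tb/sec)")
```

Running it shows 32 x 100, 64 x 50, and 128 x 25 each use the full 3,200 Gb/sec, while 64 x 40 uses 80 percent and 128 x 10 uses 40 percent.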

Versions of the ThunderX chip aimed at network function virtualization (NFV) workloads as well as telecommunication, media, and gaming systems will have more I/O in general and more security accelerators and will also have 100 Gb/sec interfaces. Hegde is not ready to provide specifics, but based on early tests, he says that this variant of the ThunderX aimed at networks will be able to run NFV software better than a Haswell class Xeon from Intel. There is another ThunderX variant aimed at storage arrays, which has a little less compute and more peripheral ports and I/O, and yet another version aimed at hyperscale web servers that is different from the compute-heavy variant. All four of these chips are sampling now.

"We are broadly engaged with these products," says Hegde, "and are seeing interest from all sorts of different customers, including some of the guys who you would not necessarily think would jump on this."

Hegde says that early adopters will probably have products based on the ThunderX chips in the second half of 2015, and that even more will follow in the fourth quarter of 2015 and the first quarter of 2016. That seems like a long time to wait, considering how long customers have waited so far for a volume, server-class, 64-bit ARM chip. At the moment, the ThunderX chips will run development releases of Linux from Red Hat and SUSE Linux as well as the production-grade release of Canonical Ubuntu Server, and there is an outside chance that they might even run the future Windows Server 10 operating system – but don't expect Cavium to comment on that. It is far more likely that Microsoft will use ARM chips on its own internal Azure cloud for its own workloads before it offers a commercialized variant to third parties. If there is an advantage, you can bet Microsoft will want to leverage that advantage for itself first and not try to confuse the X86-based Windows market with another option. Microsoft has been mum on this Windows Server on ARM talk, except to say it would not happen on Windows Server 2012 or Windows Server 2012 R2.

The real question that everyone wants to answer, and no one has to anyone's satisfaction, is whether an ARM SoC aimed at servers, storage, or networking can present a sustainable advantage over an X86 chip from Intel and sometimes AMD. The first ARM server chips launched three years ago to much fanfare from Calxeda, but these 32-bit processors did not sell well and the company could not afford to make the leap to 64 bits. AMD has been quiet as a mouse about the Seattle chips, and Applied Micro is shipping X-Gene 1 to select customers but has not landed any big deals it can talk about among commercial customers.

"We will certainly have TCO benefits," Hegde tells EnterpriseTech. "Technologies go through this hype phase, and then they enter the real world. For ARM servers, a lot of things are actually lining up."

One of the challenges for the early ARM server chips is that they were relegated to the microserver segment and not available in workhorse two-socket servers that are the most common engine in the datacenter. Cavium is starting out with two-socket machines and working its way down. The ThunderX chips will be available in a single-socket ATX form factor and a half-width sled that puts four server nodes in a 2U chassis, as is common in many datacenters today. Equally importantly, the baseboard management controller in Cavium's systems is external to the ThunderX chip and has the same interfaces as on an X86 machine, and the systems use AMI Aptio BIOS chips to connect peripherals to the chip. The way the system boots and the way the system is managed is identical to the way it is done on X86 systems. The software ecosystem is developing fast – about as fast as you could expect without a lot of 64-bit hardware – and will mature through 2015 as more iron becomes available.


Hegde also said that the ThunderX chips would have thermal advantages, and speaking very generally said that an eight-core ThunderX would consume about 20 watts and a 48-core part would consume about 95 watts. Those figures include the processing and networking, and the chipset for NUMA clustering is also built onto the die. The ThunderX, Hegde said, would have a total thermal footprint about 50 percent lower than a Xeon chip of equivalent performance once you add in the NUMA and peripheral chipset and a LAN on motherboard for the Xeon. The lower power consumption means a server needs fewer fans, and that will lower the power consumption even further, he added.
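Normalizing Hegde's rough figures per core shows the larger part drawing less power per core. A minimal sketch in Python, using only the two approximate wattages quoted; it assumes the two figures are directly comparable (for instance, similar clock speeds and workloads), which Cavium has not specified:

```python
# Approximate power figures Hegde gave, in watts, keyed by core count.
# Assumption: both figures were measured under comparable conditions.
POWER_WATTS = {8: 20, 48: 95}


def watts_per_core(cores, watts):
    """Average power per core for a chip drawing `watts` across `cores` cores."""
    return watts / cores


for cores, watts in sorted(POWER_WATTS.items()):
    print(f"{cores:>2} cores at ~{watts} W -> {watts_per_core(cores, watts):.2f} W per core")
```

This works out to 2.50 W per core for the eight-core part and about 1.98 W per core for the 48-core part.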

The follow-on question to all of this, of course, is what will the Intel Xeon E3 and E5 chips of early 2016 look like as Intel pushes down the Moore's Law curve to the "Broadwell" Xeons and looks ahead to the "Skylake" Xeons. Those ThunderX advantages cited above could very easily be gone by the time that Cavium's partners get gear in production. That is what makes this a race.