AMD Details “Seattle” ARM Server Chip
The annual Hot Chips conference is underway this week in Silicon Valley, and there is not much on the enterprise front on day one. Japanese supercomputer makers NEC and Fujitsu are both showing off their next-generation SX-ACE and Sparc64-XIfx processors aimed at, respectively, massively parallel vector and scalar supercomputing workloads, and DE Shaw is talking about its Anton 2 next-generation specialized ASIC for molecular dynamics simulations. None of these are likely to be commercialized for anything but strictly technical workloads any time soon, but that is not the case with AMD's "Seattle" Opteron ARM processor, which was also unveiled today.
AMD has given out some of the feeds and speeds of the Opteron A1100 processor, its first ARM part and one that is based on the Cortex-A57 core from ARM Holdings to speed up the time to market. The 64-bit chip started sampling back in March and is slated to start shipping in volume during the fourth quarter of this year. As EnterpriseTech revealed back in May, AMD has become a full ARMv8 licensee and will next year create its own modified, low-power variant of the Cortex-A57 core to be used in a follow-on generation of ARM server chips. These next-generation ARM chips, expected in 2015, are being developed under Project SkyBridge, which will create identical pinouts for ARM and X86 variants of the Opteron chips so companies can use them interchangeably on system boards. Further down the road in 2016, AMD will create its own ARM core, code-named "K12." Significantly, Jim Keller is leading the SkyBridge effort, which increases the odds for its success. Keller, who is chief cores architect at AMD, worked on Digital Equipment Corp's Alpha RISC chips and AMD's original Opteron processors, and this year he came back to AMD after a few years designing chips at Apple. Keller said that having to design ARM cores allows AMD to leverage its X86 experience, but also provides a feedback loop that will allow it to make X86 chips better, too.
The presentations at Hot Chips tend to be a little closer to current timeframes, so the company was not talking up the long-term roadmap. AMD chip designer Sean White did, however, reveal some details about the upcoming Seattle processor.
Given the more ambitious roadmap that AMD has, it is reasonable to ask why AMD started with the relatively simple Seattle design, which takes the A57 cores as-is from ARM Holdings and wraps networking, peripheral controllers, and accelerators around those cores. The first answer to that question is that you have to start somewhere. And the second answer, as White put it in his presentation, is that a significant number of workloads in the datacenter have low instructions per clock and high cache miss rates, and in these cases, having smaller cores and caches means they can deliver equivalent performance to chips with larger cores and caches provided they avoid the cache misses. An ARM chip can, in theory, do this on a smaller chip that burns less electricity than an Opteron, Xeon, Itanium, Power, or Sparc alternative.
The Seattle chips are being manufactured by GlobalFoundries, which is an amalgam of a number of different chip makers (including AMD's fabs in Germany and the United States) put together by the government of Abu Dhabi. The chip is made using the 28 nanometer processes from GlobalFoundries, which is the popular node at this point from both TSMC and GlobalFoundries for server processors. That is set to change soon as they both try to get down to 20 nanometers in 2015.
The ARM cores are paired up with a shared 1 MB L2 cache for every pair, with an 8 MB L3 cache shared across all eight cores on the die. Each core in a pair has 48 KB of L1 instruction cache and 32 KB of L1 data cache. The eight-core Seattle chip has two memory channels, which support up to two memory sticks per channel running at a top speed of 1.87 GHz. The memory controllers support either DDR3 or DDR4 memory, which respectively run at a top speed of 2.1 GHz or 3.2 GHz by the specs. Those top speeds are probably overkill in terms of performance and heat generation for the Seattle chip, which is clearly aimed at microserver uses initially, where heat dissipation is as big an issue as performance. Main memory tops out at 128 GB across four memory slots using 32 GB sticks. This memory is not cheap, but the added capacity will probably allow for some interesting uses where applications are memory bound or I/O bound and not compute bound. (Memcached caches come immediately to mind.) The memory controller runs at one-quarter of the speed of the DRAM in the system, so 400 MHz for 1.6 GHz DDR3 memory, for instance, and the channels coming off the processor complex are interleaved. Main memory has single-bit error correction and double-bit error detection with scrubbing, and single-bit errors are corrected in-line before the data is fed out of main memory to the requesting cache.
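The memory arithmetic above can be checked with a short Python sketch. The channel, slot, and stick figures and the one-quarter controller ratio come from the description above. The function names are illustrative only, and the peak bandwidth estimate rests on two assumptions that AMD did not state in this form: a standard 64-bit (8-byte) data path per channel, and reading the quoted 1.87 GHz as roughly 1,866 million transfers per second.

```python
# Figures taken from the article's description of the Seattle memory subsystem.
CHANNELS = 2            # memory channels on the chip
DIMMS_PER_CHANNEL = 2   # memory sticks supported per channel
MAX_DIMM_GB = 32        # largest stick mentioned


def max_capacity_gb(channels=CHANNELS,
                    dimms_per_channel=DIMMS_PER_CHANNEL,
                    dimm_gb=MAX_DIMM_GB):
    """Maximum main memory: channels x sticks per channel x stick size."""
    return channels * dimms_per_channel * dimm_gb


def controller_clock_mhz(dram_speed_mhz):
    """The memory controller runs at one-quarter of the DRAM speed."""
    return dram_speed_mhz / 4


def peak_bandwidth_gb_per_s(transfers_per_s_millions,
                            channels=CHANNELS,
                            bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/sec.

    ASSUMPTION: each channel moves 8 bytes per transfer (a 64-bit data
    path); the article does not give the channel width.
    """
    return channels * transfers_per_s_millions * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    print("Max capacity (GB):", max_capacity_gb())
    print("Controller clock for 1.6 GHz DDR3 (MHz):", controller_clock_mhz(1600))
    # ASSUMPTION: 1.87 GHz is treated as about 1,866 million transfers/sec.
    print("Estimated peak bandwidth (GB/sec):", peak_bandwidth_gb_per_s(1866))
```

Under those assumptions the two channels together would peak at a little under 30 GB/sec, a theoretical ceiling rather than a measured number.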
The six System Memory Management Unit (SMMU) controllers on the Seattle chip sit on the ring interconnect that links the cores and caches, and they have all the hooks to support virtualization by the KVM and Xen hypervisors for compute, memory, and I/O. There is also a System Control Processor, based on a 32-bit Cortex-A5 core, which is used to control the power, configure the server, control the boot process, and act as a service processor for various system management functions. It has its own RAM, ROM, and Ethernet port. The cryptographic processor on the Seattle chip hangs off the System Control Processor and has a hardware-based random number generator, Zlib compression and decompression, RSA, ECC, and AES encryption and decryption, and the SHA secure hashing algorithm all embedded on it. The Cortex-A57 cores can access this cryptographic unit.
This being a system-on-chip design, the Seattle die has lots of goodies added in. For connectivity to the outside world, the chip has two 10 Gb/sec (10GBASE-KR) Ethernet ports coming right off the die as well as a single 1 Gb/sec (RGMII) port for system management. This integrated networking will save power and cost in the systems that are based on the Seattle processor. (It is not clear if AMD will support RDMA over Converged Ethernet, or RoCE, on its 10 Gb/sec ports, but Applied Micro is going to do that with its next-generation X-Gene-2 ARM server chips.) The chip has eight SATA3 ports running at 6 Gb/sec and an integrated PCI-Express 3.0 controller with a total of eight lanes. You can carve it up into two x4s, one x4 and two x2s, or one x8, depending on what the system needs.
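The lane carving described above amounts to a small lookup, sketched here in Python. The three splits are the ones named in the text. The helper names are hypothetical, and treating a split as order-insensitive (so x2, x4, x2 counts the same as x4, x2, x2) is an assumption, not something AMD has specified.

```python
# Lane splits named in the article for the eight-lane PCI-Express 3.0 controller.
TOTAL_LANES = 8
SUPPORTED_SPLITS = {(4, 4), (4, 2, 2), (8,)}


def normalize(widths):
    """Sort link widths widest-first so that ordering does not matter.

    ASSUMPTION: the order in which links are listed is irrelevant.
    """
    return tuple(sorted(widths, reverse=True))


def is_supported_split(widths):
    """Return True if the link widths match one of the splits in the article."""
    key = normalize(widths)
    return sum(key) == TOTAL_LANES and key in SUPPORTED_SPLITS


if __name__ == "__main__":
    for candidate in ([8], [4, 4], [2, 4, 2], [2, 2, 2, 2]):
        print(candidate, is_supported_split(candidate))
```

A four-way x2 split also adds up to eight lanes but is rejected here, because the text lists only the three configurations above.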
The Seattle reference board is based on a MicroATX form factor, and has four memory slots and one PCI-Express x8 slot. All eight SATA3 ports are available (which means you can put one disk per core, as Hadoop and other analytics workloads prefer) plus the two 10 Gb/sec Ethernet ports for clustering. AMD's reference system, shown above, comes in a 2U rack-mounted chassis with room for eight 3.5-inch SATA drives. It is not clear when AMD will put the Seattle chip into its SeaMicro microservers, but this would be a logical thing to do. AMD may wait until its next-gen ARM64v8 parts, which have not yet been given a code-name, to do that, however. We'll see.