HP Offers Exclusive Peek Inside Impending Moonshot Servers
Hewlett-Packard has staked a lot on its hyperscale Moonshot platform and has said from the beginning that it wanted to have a mix of processors and coprocessors so these machines could be aimed at a wide variety of workloads.
The company is getting ready to launch a bunch of new server nodes for Moonshot in a few weeks. Ed Turkel, marketing manager for both the Hyperscale Business Unit and for the company's HPC efforts in general, gave EnterpriseTech a sneak peek at several of them at the SC13 supercomputing conference last week in Denver.
At the moment, HP has one server cartridge available in the Moonshot system, and it is based on the dual-core "Centerton" Atom S1200 launched by Intel this time last year. The server node was called the ProLiant Moonshot Server and was not given a proper numerical designation as far as we know.
The "Gemini" Moonshot 1500 enclosure can have as many as 45 server nodes – what HP calls cartridges – in its 4.3U high enclosure. The server cartridges snap in from the top, as do two Ethernet switch modules for linking server nodes in the enclosure to the outside world. The backplane in the Moonshot chassis allows for the server cartridges to be linked to each other in a 2D torus topology without the need for an internal switch. This backplane is also used to link server cartridges to storage cartridges, which come with disk or flash drives; it has 7.2 Tb/sec of bandwidth, which is plenty for internode connections as well as for links to storage. This 2D torus is used to link three nodes in a north-south configuration (like an n-tier application) or fifteen nodes in an east-west configuration (like a more traditional parallel cluster or cloud).
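To make the torus idea concrete, here is a minimal Python sketch of how neighbors wrap around in such a topology. It assumes the 45 cartridge slots form a grid of 3 rows (north-south) by 15 columns (east-west), which matches the three-node and fifteen-node groupings described above. HP has not published the slot numbering or wiring used here, so the layout and the function name torus_neighbors are illustrative assumptions only.

```python
ROWS, COLS = 3, 15  # assumed layout: 3 north-south x 15 east-west = 45 slots


def torus_neighbors(slot: int) -> dict:
    """Return the four wraparound neighbors of a cartridge slot (0..44).

    Slots are numbered row by row; this numbering is hypothetical.
    """
    if not 0 <= slot < ROWS * COLS:
        raise ValueError(f"slot must be in 0..{ROWS * COLS - 1}")
    row, col = divmod(slot, COLS)
    return {
        # The modulo makes each ring close on itself, which is what turns
        # a plain grid into a torus.
        "north": ((row - 1) % ROWS) * COLS + col,
        "south": ((row + 1) % ROWS) * COLS + col,
        "west": row * COLS + (col - 1) % COLS,
        "east": row * COLS + (col + 1) % COLS,
    }


if __name__ == "__main__":
    print(torus_neighbors(0))
    # {'north': 30, 'south': 15, 'west': 14, 'east': 1}
```

In this sketch every slot has exactly four neighbors, including the ones on the edge of the grid, so no cartridge needs a switch to reach the adjacent nodes.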
Turkel tells EnterpriseTech that a Moonshot server cartridge using the eight-core "Avoton" Atom C2000, which Intel debuted in September of this year, is in early release with selected customers. This node is called the m300 and it will be launched formally on December 9. Dynamic Web serving – meaning an interpreted programming language running on a Web application server hitting a relational database – is expected to be the initial workload for the m300. The Atom C2000 is significant in that it has an on-chip Ethernet controller, but exactly how this will be used in the Moonshot chassis has not yet been explained. (Some details have to be saved for launch day, after all.)
The current Avoton server cartridge has one processor on the board with a disk drive, but HP and Intel are working on a version of the card with four Avoton processors as well.
The next new server node coming to the Moonshot chassis will be called the m700, and it is based on AMD's "Kyoto" Opteron X2150, which came out in May. The Kyoto chip is what AMD calls an Accelerated Processing Unit, or APU, which means it puts a CPU and a GPU on the same die and links them through a high-speed bus. In this case, the chip has four "Jaguar" cores on the CPU side with 2 MB of L2 cache and 32 GB of main memory, plus a Radeon HD 8000 graphics chip with 128 cores. Interestingly, those GPU cores run at 600 MHz and provide 154 gigaflops of single-precision math; they only deliver 9.3 gigaflops at double precision, which is not great at all. But that single-precision math, coupled with OpenCL offloading, could be very attractive for certain kinds of number-crunching workloads.
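The single-precision figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes each GPU core retires one fused multiply-add per clock, counted as two floating point operations; that per-cycle rate is an assumption on our part, not something HP or AMD stated here, and the function name peak_gflops is purely illustrative.

```python
def peak_gflops(cores: int, clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak in gigaflops: cores x clock x flops per cycle.

    flops_per_cycle=2 assumes one fused multiply-add per core per clock.
    """
    return cores * clock_mhz * flops_per_cycle / 1000.0


if __name__ == "__main__":
    single = peak_gflops(cores=128, clock_mhz=600)
    print(f"single precision: {single:.1f} gigaflops")  # 153.6, quoted as 154
    # The 9.3 gigaflops double-precision figure is taken from the article.
    print(f"single/double ratio: {single / 9.3:.1f}x")  # about 16.5x
```

Under that assumption, 128 cores at 600 MHz come out to 153.6 gigaflops, which rounds to the 154 gigaflops cited, and the double-precision rate is roughly one-sixteenth of that.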
Turkel says that HP and AMD are initially thinking the m700 node will be used for hosting physical desktops in the datacenter, with one chip per physical desktop. The m700 card has four Kyoto chips on a cartridge, so you can host up to 180 physical desktops from a single chassis. As for the m700 being used for hybrid CPU-GPU calculations, Turkel said that HP "sees this as a potential" and that HP was at the beginning stages with the Moonshot machine.
"We have target workloads for each one of these," says Turkel. "But as I like to keep pointing out, customers will do things that we did not necessarily anticipate. We are positioning this one for hosted desktops, but customers will take it in whatever direction they will."
The m800 server cartridge, which is based on the KeyStone-II hybrid ARM-DSP chip designed by Texas Instruments, is probably going to see some computational work in the enterprise, too. The KeyStone-II chips use the Cortex-A15 processor design from ARM Holdings, with two or four cores that support the 40-bit Large Physical Address Extensions for the 32-bit processor. The cores run at 1.4 GHz. An on-chip TeraNet coherency network links the ARM cores to as many as eight TMS320C66x digital signal processors on the same die. The DSPs run at 1.2 GHz and provide about 1 teraflops at single precision and 384 gigaflops at double precision, which is pretty respectable. The KeyStone-II also has security and packet processing accelerators and an integrated Ethernet switch.
The m800 will have four of these KeyStone-II chips on a cartridge, so that works out to a maximum of 180 teraflops per chassis and 1.8 petaflops per rack. (The rack is a little taller than the standard 42U size.) That is a fairly large amount of computational capability in a rack, of course, and there are plenty of use cases.
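The density math behind those figures is easy to reproduce. The sketch below uses the numbers quoted in this article (45 cartridges per chassis, four chips per cartridge, about 1 teraflops per chip, 4.3U per chassis). The count of ten chassis per rack is not stated outright; it is inferred from dividing 1.8 petaflops by 180 teraflops, and the function name rack_density is just for illustration.

```python
CARTRIDGES_PER_CHASSIS = 45
CHIPS_PER_CARTRIDGE = 4
TFLOPS_PER_CHIP = 1.0    # the article's approximate single-precision figure
CHASSIS_HEIGHT_U = 4.3
CHASSIS_PER_RACK = 10    # inferred from 1.8 petaflops / 180 teraflops


def rack_density() -> dict:
    """Compute per-chassis and per-rack totals from the quoted figures."""
    chips_per_chassis = CARTRIDGES_PER_CHASSIS * CHIPS_PER_CARTRIDGE
    tflops_per_chassis = chips_per_chassis * TFLOPS_PER_CHIP
    return {
        "chips_per_chassis": chips_per_chassis,
        "chips_per_rack": chips_per_chassis * CHASSIS_PER_RACK,
        "tflops_per_chassis": tflops_per_chassis,
        "pflops_per_rack": tflops_per_chassis * CHASSIS_PER_RACK / 1000.0,
        "rack_height_u": CHASSIS_HEIGHT_U * CHASSIS_PER_RACK,
    }


if __name__ == "__main__":
    for name, value in rack_density().items():
        print(f"{name}: {value}")
```

The same 45 by 4 count gives the 180 chips per chassis behind the hosted desktop figure for the m700, and ten chassis at 4.3U each need about 43U, which is why the rack ends up slightly taller than 42U.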
Server cartridges are also in the works using Calxeda's "Midway" ECX-2000 processor and Applied Micro Circuits' X-Gene. Both of these are multi-core ARM processors.
"HPC has a long history of applications using digital signal processors," says Turkel. "The first one that we talk about for the m800 is telecom, specifically for voice over IP, which is kind of a no-brainer for DSPs. We have got oil and gas companies who are investigating using the m800 cartridge as a possible target for seismic analysis. It is very much investigatory. And there are other obvious folks who do a lot of digital signal processing, some of whom we can't talk about."
Turkel says that one of the areas where Moonshot is seeing interest is in embarrassingly parallel analytics applications. Financial services firms are very interested in the Avoton cartridges in particular for Monte Carlo simulations and other kinds of analytics. The reason is simple: that eight-core chip has about the same performance as a "Westmere" class Xeon chip for these kinds of jobs.
"If I can get to 1,800 processors in a rack, each simulation may not run as fast as it would on a traditional platform, but the aggregate throughput you will get – particularly on a per dollar, per watt or per square foot standpoint – is going to be a lot better. A lot of financial services firms have boxes, and they are in evaluation mode."
Another use is for login and management servers that front-end the compute nodes in a traditional cluster. Turkel says that one customer with a 2,000-node cluster has around 100 nodes that perform this function. Those machines should really be doing computation, and this unnamed customer is looking at using Moonshot nodes to handle job setup and submission for the cluster, allowing the 100 beefier server nodes to be reallocated to run simulations.
HP is not saying yet what its plans are for upgrading the networking in the Moonshot chassis, but Turkel confirmed that over the long haul, HP will upgrade the integrated switches to 10 Gb/sec speeds. The company is looking at the possibility of adding InfiniBand switching as well, but has made no commitments as yet. It is also possible for HP to extend the internal 2D torus beyond a single chassis, but HP had no comment on this.