Advanced Computing in the Age of AI | Monday, August 8, 2022

Moonshot Scale Leveraged For Transcoding, Web Infrastructure 

Hewlett-Packard is ramping up its Moonshot hyperscale systems with a set of new server nodes and specific workloads in the enterprise that it has tuned up for the hardware. Significantly, HP is shipping its first Xeon-based server cartridge for the Moonshot machines, marking the first time a relatively brawny system is available for heavier workloads.

Last month, HP launched the first Moonshot server cartridges to sport a 64-bit ARM processor, in this case an eight-core "Storm" X-Gene1 chip from Applied Micro, initially targeting the device at hyperscale workloads such as Memcached web caching. The new Moonshot m710 server cartridge is probably in roughly the same performance class, and it packs a four-core "Haswell" Xeon E3 processor onto the cartridge. HP is using the Xeon E3-1284L v3 chip, to be specific, which has a 1.8 GHz clock speed (with a 3.2 GHz Turbo Boost) and an on-chip Iris Pro P5200 graphics processor. HP is making use of that graphics chip as a coprocessor for the CPU cores, and the first workload to be accelerated by that GPU is video encoding and decoding. HP did not have floating point processing ratings for the Iris Pro P5200 GPU portion of the Haswell chip as EnterpriseTech went to press.

The m710 cartridge is also being used to host Citrix Systems XenApp virtualized applications, which allow virtual desktops to share the compute and GPUs rather than have a dedicated socket for each end user, as HP has already done with its m700 cartridge, which has four of AMD's "Kyoto" Opteron X2150 processors on a single board. The Opteron-based m700 card can also be used to run hybrid CPU-GPU workloads, since the four-core Opteron X2150 chip has an integrated 128-core Radeon HD 8000 graphics chip that runs at 600 MHz. That Radeon GPU is rated at 154 gigaflops on single-precision floating point math and 9.3 gigaflops at double precision.


The m710 cartridge has one processor and 32 GB of 1.6 GHz DDR3 low-voltage memory. The node can have 120 GB or 480 GB of flash storage in an m.2 module, and comes with a dual-port Connect-X3 network interface from Mellanox Technologies welded onto the card, which supports 10 Gb/sec links and, significantly, the RDMA over Converged Ethernet (RoCE) protocol for low-latency networking. The video encoding solution uses the Moonshot 45XGc 10 Gb/sec switch module, and the chassis has two 45-port switches to provide redundant links to each cartridge. Each one of those switches has four 40 Gb/sec uplinks coming out of the chassis to the outside world.
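The port counts above imply a fixed ratio between cartridge-facing bandwidth and uplink bandwidth on each switch module. The following is a minimal sketch of that arithmetic, using only the figures quoted in this article; it assumes the worst case in which all 45 cartridge ports run at full line rate and all of that traffic leaves the chassis, which real workloads may never approach. The function name is made up for illustration.

```python
# Figures quoted in the article for one Moonshot 45XGc switch module.
DOWNLINK_PORTS = 45   # one port per cartridge slot
DOWNLINK_GBPS = 10    # Gb/sec on each cartridge-facing port
UPLINK_PORTS = 4      # uplinks leaving the chassis
UPLINK_GBPS = 40      # Gb/sec on each uplink


def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    """Return (downlink Gb/sec, uplink Gb/sec, downlink-to-uplink ratio)."""
    down = down_ports * down_gbps
    up = up_ports * up_gbps
    return down, up, down / up


# Worst-case assumption: every cartridge port is saturated with
# traffic headed out of the chassis.
down, up, ratio = oversubscription(
    DOWNLINK_PORTS, DOWNLINK_GBPS, UPLINK_PORTS, UPLINK_GBPS
)
print(f"cartridge-facing: {down} Gb/sec, uplinks: {up} Gb/sec, "
      f"ratio {ratio:.2f}:1")
```

Under those assumptions, each switch is a bit less than 3:1 oversubscribed toward the outside world, while traffic that stays between cartridges inside the chassis never touches the uplinks.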


The Iris Pro P5200 has 128 MB of its own embedded DRAM cache, which can be used to hold datasets for hybrid processing very close to the GPU, according to Gerald Kleyn, director of hyperscale server hardware research and development at HP. That eDRAM can also be used as a kind of L4 cache for the processor cores.

Kleyn says that HP expects customers to push the envelope and use the hybrid CPU-GPU computing enabled on selected Moonshot cards for lots of different workloads, but that for now, HP is focusing the m710 on the Media Platform from Vantrix and VOS from Harmonic, two popular video encoding and decoding programs. Companies that stream video have to encode video streams for the myriad formats of devices out there, and more and more this is being done in real time using fleets of transcoding servers. HP has been working with these two vendors for the past six months to tune up their transcoding applications to make use of the Iris Pro P5200 graphics chip to do transcoding work, and the results are pretty dramatic.

Using the m710 cartridge running Linux and the Vantrix transcoding software, a rack of Moonshot machines with 450 processors was able to handle 20X the number of transcoding streams compared to a rack of standard Xeon servers without GPU acceleration. (HP cites data from Frost & Sullivan from this past June that shows the industry average is 5.54 high definition, or HD, video streams per rack unit, and the Moonshot can deliver around 100 streams per rack unit.) And for a set number of transcoding streams, the Moonshot setup took up only 5 percent of the floor space of the X86 server racks and the cost per stream was chopped by 80 percent. That works out to delivering HD streams at the cost of lower-resolution SD streams. Granted, X86 servers goosed with GPU accelerators would also show some remarkable gains, but this Moonshot setup is nonetheless showing the kinds of density and price/performance that a new server architecture has to deliver if it hopes to get a foothold in the market.
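The density and floor-space claims can be cross-checked against the per-rack-unit figures HP cites. This is a rough sketch using only numbers from the paragraph above; the Moonshot figure is "around 100", so the results are approximate, and the floor-space estimate assumes both setups use racks of the same height. Variable names are illustrative.

```python
INDUSTRY_STREAMS_PER_U = 5.54  # Frost & Sullivan average cited by HP
MOONSHOT_STREAMS_PER_U = 100   # "around 100", so treat as approximate
COST_REDUCTION = 0.80          # cost per stream chopped by 80 percent

# How many times more streams fit in the same rack space.
density_gain = MOONSHOT_STREAMS_PER_U / INDUSTRY_STREAMS_PER_U

# Fraction of the space needed to serve a fixed number of streams,
# assuming racks of equal height on both sides.
space_fraction = 1 / density_gain

# Streams delivered per dollar, relative to the X86 baseline.
streams_per_dollar_gain = 1 / (1 - COST_REDUCTION)

print(f"density gain: {density_gain:.1f}X")
print(f"space for the same streams: {space_fraction:.1%}")
print(f"streams per dollar: {streams_per_dollar_gain:.1f}X")
```

The per-rack-unit numbers give a gain of about 18X and a space fraction of about 5.5 percent, which is in line with the rounded 20X and 5 percent figures quoted for the rack-level comparison.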

Here is a table showing the transcoding rates for each m710 cartridge and a full Moonshot chassis running the Vantrix Media Platform software atop an OpenStack cloud with CentOS 6.5 and Intel's Media SDK libraries.


It would be interesting to know the floating point ratings of that Iris Pro P5200 graphics chip on the Haswell Xeon E3 die. We have reached out to both HP and Intel to get those numbers.

In addition to the m710 cartridge, HP is rolling out a long-expected Moonshot module that puts four of Intel's "Avoton" C2000 series Atom processors on a card. This m350 cartridge has four of the C2730 Atoms, each of which has eight cores running at 1.7 GHz with a 2 GHz Turbo Boost. The cartridge has four memory slots, one per socket, each holding 8 GB, and has room for a 32 GB or 64 GB m.2 SSD module for each of the four Atoms for local storage. The m350 has two 1 Gb/sec Ethernet ports coming off it, and the networking is that slow because HP is initially targeting this device at dedicated web hosting workloads where customers have modest compute and networking needs but nonetheless want a whole processor allocated to them. For service providers, a rack of Moonshot systems can have 1,800 sockets and a total of 14,400 cores, both of which are big numbers, and that density could mean the m350 finds other hyperscale uses. Using the top-of-the-line Haswell Xeon E5 chip, you can only get 160 sockets and 2,880 cores in a rack. Granted, those cores are a lot brawnier in the Xeon E5 than in the Atom C2730, but the aggregate work for massively parallel applications might not be different while the energy usage and pricing could very well be. The point is, do the math.
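As a starting point for that math, here is a small sketch that rebuilds the rack totals and works out how much faster a Xeon E5 core would have to be for the two racks to deliver the same aggregate throughput. The figure of ten chassis per rack is an assumption: it is not stated in this paragraph, but it is implied by the 450-processor m710 rack described earlier at 45 cartridges per chassis. The `RackConfig` class is purely illustrative, and the comparison ignores memory, I/O, and how well an application actually scales across cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackConfig:
    """Socket and core counts for one rack (illustrative helper)."""
    name: str
    sockets: int
    cores_per_socket: int

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket


CHASSIS_PER_RACK = 10        # assumption, implied by the 450-processor rack
CARTRIDGES_PER_CHASSIS = 45  # from the article
SOCKETS_PER_M350 = 4         # four Atoms per m350 cartridge
CORES_PER_ATOM = 8

m350_rack = RackConfig(
    "Moonshot m350",
    CHASSIS_PER_RACK * CARTRIDGES_PER_CHASSIS * SOCKETS_PER_M350,
    CORES_PER_ATOM,
)
# The article gives 160 sockets and 2,880 cores, hence 18 cores per socket.
xeon_rack = RackConfig("Xeon E5", 160, 2880 // 160)

# Per-core speedup a Xeon core needs for equal aggregate throughput,
# assuming perfectly parallel work.
breakeven = m350_rack.cores / xeon_rack.cores

for rack in (m350_rack, xeon_rack):
    print(f"{rack.name}: {rack.sockets} sockets, {rack.cores} cores")
print(f"break-even per-core speedup for the Xeon: {breakeven:.1f}X")
```

In other words, for embarrassingly parallel work the Xeon E5 cores would need to be about five times as fast as the Atom cores just to match the aggregate core count, before energy and pricing enter the picture.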

That math is made somewhat difficult by the fact that HP has not released pricing information for the m710 and m350 cartridges as separate items, and ditto for the Moonshot switches and other options for the machine. HP has, however, provided pricing for 15-cartridge starter kits. Canonical is bundling in a one-year license of Ubuntu Server with the m350 setup, which has fifteen nodes, each with the m.2 I/O mezzanine card and four of the 32 GB m.2 SSD modules. This setup has a chassis, a single Moonshot 180P 1 Gb/sec Ethernet switch, one 4QSFP uplink kit, and three 1,500 watt power supplies. All told, this m350 starter configuration costs $85,372.

If you are looking for transcoding or VDI or to experiment with hybrid computing as outlined above, the Moonshot m710 starter kit has the chassis, the same three 1,500 watt power supplies, fifteen m710 cartridges, and the Moonshot 45XGc switch and 4QSFP uplink kit; it costs $55,147.

Both of the new Moonshot server cartridges support Microsoft Windows Server 2012 as well as the most recent releases of Red Hat Enterprise Linux, SUSE Linux Enterprise Server, and Canonical Ubuntu Server.

In addition to these new Moonshot nodes, HP is working with Canonical and Red Hat to create a multi-tiered, infrastructure-in-a-box configuration using its existing m300 Avoton-based cartridges. Kleyn says that many companies are looking to put all of the tiers of their applications inside of a Moonshot enclosure, including load balancers, Web servers, back-end database servers and NoSQL data stores, because the latencies between the nodes are very low on the Moonshot midplane. This was certainly the case with news aggregator InkaBinka, which told EnterpriseTech recently about how it moved off the cloud and into its own Moonshot hardware specifically to get predictable performance and to lower latencies between portions of its applications. For many customers, a node with a single eight-core Avoton C2750 running at 2.4 GHz with 32 GB of memory and a 1 TB drive or a 240 GB SSD will do the trick, and so will plain old 1 Gb/sec Ethernet networking. HP says that a 45-node chassis can handle up to 115,000 concurrent clients running the DayTrader Java-based stock trading application atop a standard Linux stack with MySQL as the database and Apache as the web server. So that is well over 1 million users in a rack.
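The rack-level figure follows from HP's per-chassis number with a little multiplication. The sketch below assumes ten chassis per rack (not stated in this paragraph, but implied by the 450-processor m710 rack described earlier) and assumes the load scales linearly across chassis, which HP's claim does not guarantee.

```python
CLIENTS_PER_CHASSIS = 115_000  # HP's DayTrader figure for one chassis
NODES_PER_CHASSIS = 45         # m300 cartridges per chassis
CHASSIS_PER_RACK = 10          # assumption, implied by the 450-processor rack

# Average concurrent clients served by each single-socket m300 node.
clients_per_node = CLIENTS_PER_CHASSIS / NODES_PER_CHASSIS

# Rack total, assuming perfectly linear scaling across chassis.
clients_per_rack = CLIENTS_PER_CHASSIS * CHASSIS_PER_RACK

print(f"per node: about {clients_per_node:,.0f} concurrent clients")
print(f"per rack: {clients_per_rack:,} concurrent clients")
```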

The web-infrastructure-in-a-box setup based on the m300 cartridges comes with special reduced-price Linux software licenses for RHEL 6.5 and 7 from Red Hat, which has adjusted the per-socket price for HP given that an Avoton socket is not equivalent to a Xeon socket in terms of raw performance. (HP and Red Hat didn't say what that price reduction was.) It is not clear if Canonical will make the same deal on support contracts for Ubuntu Server, but it stands to reason it will in competitive situations, given that Canonical is focused like a laser on hyperscale business. Canonical has tuned up its Ubuntu Server 14.04 and 14.10 releases for the Moonshot hardware and for this web infrastructure stack in particular, with its Metal as a Service (MaaS) bare metal provisioning able to lay down Ubuntu Server on m300 nodes and its Juju software orchestration tool able to provision parts of the stack atop Linux. SUSE Linux Enterprise Server 11 and Microsoft Windows Server 2012 and 2012 R2 are also certified for this web-in-a-box.

This infrastructure starter setup has fifteen m300 cartridges, each with a 240 GB SATA SSD drive, the regular Moonshot 45G switch and 6SFP uplink kit plus three 1,500 watt power supplies. It costs $48,937.
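Since HP has not priced the cartridges separately, one rough way to compare the three starter kits is to spread each kit price over its cartridges and sockets. This is only a sketch: each kit price also covers the chassis, switch, uplink kit, and power supplies (and the kits differ in storage and bundled software), so the results are upper bounds on what a node costs, not cartridge list prices. The dictionary layout is illustrative.

```python
# Starter kit prices and configurations as quoted in the article.
KITS = {
    "m350": {"price": 85_372, "cartridges": 15, "sockets_per_cartridge": 4},
    "m710": {"price": 55_147, "cartridges": 15, "sockets_per_cartridge": 1},
    "m300": {"price": 48_937, "cartridges": 15, "sockets_per_cartridge": 1},
}

per_cartridge = {}
per_socket = {}
for name, kit in KITS.items():
    sockets = kit["cartridges"] * kit["sockets_per_cartridge"]
    # Kit price includes chassis, switch, and power supplies, so these
    # figures overstate the cost of the cartridges themselves.
    per_cartridge[name] = kit["price"] / kit["cartridges"]
    per_socket[name] = kit["price"] / sockets

for name in KITS:
    print(f"{name}: ${per_cartridge[name]:,.2f} per cartridge, "
          f"${per_socket[name]:,.2f} per socket")
```

Seen this way, the m350 kit is the most expensive per cartridge but the cheapest per socket, because each of its cartridges carries four processors.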

The two new Moonshot m350 and m710 nodes and the web infrastructure bundle based on the m300 are available now.

One Response to Moonshot Scale Leveraged For Transcoding, Web Infrastructure

  1. Nic says:

    The 8GB of DDR3 per SOC on the m700 can be too limited for some users. Intelligent Memory has debuted ECC SO-DIMMs with 16GB (per module – part# IMM2G72D3LSOD8AG), which have worked well on the m700 Cartridge in testing. Unfortunately there is not yet an official HP approval for them, but the modules are already in production and available.
