
HP Forges ConvergedSystems From Moonshots and ProLiants 

Hewlett-Packard is winding down 2013 with a bunch of systems announcements from its Discover partner and customer event in Barcelona, Spain. The new systems include a Moonshot configuration that the company previewed to EnterpriseTech a few weeks ago as well as some preconfigured machines designed to be used as private clouds or to run HP's Vertica parallel columnar database.

All of the new machines carry a new brand from HP: ConvergedSystems. The systems are an evolution of the AppSystem and CloudSystem pre-configured blade servers that HP has been peddling for the past several years. The difference this time around with ConvergedSystems is that the stacks of servers, storage, and networking are not limited to HP's BladeSystem blade servers. ConvergedSystems can be built from whatever appropriate hardware HP thinks is needed for the job, and in the case of the setups being unveiled at the Discover conference, the stacks include one that is based on the new Moonshot hyperscale machines, two that are based on ProLiant rack servers, and one that is based on blades.

The other important thing about the ConvergedSystems aside from using a broader mix of technology is that HP will be pushing them predominantly through its channel partners. HP is using its channel to take on the Vblock converged systems from the Cisco Systems and EMC partnership called the Virtual Computing Environment and the FlexPod preconfigured stacks available from Cisco and NetApp.

Shooting for the Moon – and the Desktop

The ConvergedSystem 100 for HDI is the one based on the Moonshot system that Ed Turkel, marketing manager for the Hyperscale Business Unit and for the company's overall HPC efforts, told us about back at the SC13 supercomputing conference in Denver. While HP is launching the m700 server cartridge, based on a quad-core Opteron processor with an integrated Radeon graphics card, specifically for hosted desktops, the characteristics of that Opteron processor and the Moonshot's density will make it attractive for other kinds of workloads, particularly those that need lots of single-precision math on a budget.

HDI is short for Hosted Desktop Infrastructure, and it is distinct from virtual desktop infrastructure in a number of ways. With VDI, you stack up some servers and use a hypervisor to slice them up into virtual PCs. Then you use a VDI broker program, such as VMware View or Citrix Systems XenDesktop, to dispatch these PC slices over the network to endpoint devices of all kinds. HDI, by contrast, offers a one-to-one relationship between a processor and a graphics chip on a server node and the end user at the other end of the wire.

The ConvergedSystem 100 is based on the m700 server cartridge for the Moonshot chassis, and it employs AMD's "Kyoto" Opteron X2150 processors, which launched in May. The Opteron X2150 has four "Jaguar" cores running at 1.5 GHz with 2 MB of L2 cache and can have up to 32 GB of DDR3 main memory attached to it. The Kyoto chip also has a Radeon HD 8000 graphics unit with 128 cores on the same die. (This CPU-GPU combination is what AMD calls an Accelerated Processing Unit, or APU.) The m700 server cartridge has four Kyoto chips on it, each with 8 GB of memory for this HDI setup; this is the only memory option available at this time. The CPUs are soldered to one side of the cartridge and the four SODIMM memory slots are on the other side. The m700 cartridge also has a 128 GB SATA flash drive that snaps into a mezzanine slot and is carved up into four 32 GB chunks, one for each processor. John Gromala, senior director of hyperscale product management for the HP Server group, says that if hosted PCs need more storage, customers can hook up a NAS array to the Moonshot chassis and give users access to storage there.

The ConvergedSystem 100 is supposed to shoot the gap between a BladeSystem WS460 workstation blade, which has lots of CPU and GPU performance, and a virtual desktop carved from hypervisors and servers, which generally does not have enough graphics oomph to satisfy end users. In fact, says Gromala, the Opteron X2150 chip has on the order of six times the graphics performance of a slice of a hypervisor on the typical server in a VDI setup these days. HP reckons the ConvergedSystem 100 has on the order of 44 percent lower total cost of ownership than a real PC desktop and burns 63 percent less power per user, too. And it costs only nominally more than a VDI slice.

The ConvergedSystem 100 is a single Moonshot enclosure with two 180-port Ethernet switches running at 1 Gb/sec plus 45 server cartridges, for a total of 180 hosted PCs. At the moment, Citrix XenDesktop is the broker that has been certified on the setup, and Microsoft's Windows 7, which runs natively on the Opteron X2150 chips, has been certified as well. The Moonshot 1500 chassis has three 1,500 watt power supplies and comes in a non-standard 4.3U form factor. Add it all up, and the ConvergedSystem 100 has a list price of $137,999. That works out to $767 per hosted desktop for the hardware, not including XenDesktop or Windows 7 or the thin or thick client at the other end of the network wire.
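For anyone checking the math, the per-desktop figure falls straight out of the cartridge count; here is a quick back-of-envelope sketch using only the list price and counts quoted above:

```python
# Back-of-envelope math for the ConvergedSystem 100, using the figures quoted above.
cartridges_per_chassis = 45
apus_per_cartridge = 4              # four Kyoto APUs per m700 cartridge
hardware_list_price = 137_999       # USD, hardware only

desktops = cartridges_per_chassis * apus_per_cartridge
print(desktops)                               # 180 hosted desktops per enclosure
print(round(hardware_list_price / desktops))  # ~767 USD per desktop, before software and clients
```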

Using the Moonshot's non-standard 47U rack, you can get ten enclosures in a rack for a total of 1,800 nodes. This is a lot of computing in a relatively small space. As EnterpriseTech has previously pointed out, that Radeon HD 8000 graphics unit, with its 128 cores running at 600 MHz, delivers 154 gigaflops of single-precision floating point math but only 9.3 gigaflops at double precision. The Jaguar core itself can do some math, too. Specifically, the floating point unit has 128-bit processing with a 128-bit wide data path, and it can do four single-precision multiplies and four single-precision adds at the same time. Alternatively, it can do one double-precision multiply and two double-precision additions per clock cycle. You can also double-pump the floating point unit and do one 256-bit AVX vector math operation per clock cycle.
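Those per-chip figures fall out of simple clock-times-width arithmetic; the sketch below just reproduces them (peak rates on paper, not measured benchmarks):

```python
# Peak single-precision rates for one Opteron X2150 "Kyoto" APU,
# derived from the clocks and issue widths cited above.

# GPU side: 128 Radeon cores at 600 MHz, counting a multiply-add as two operations.
gpu_sp_gflops = 128 * 0.600 * 2        # 153.6, the ~154 gigaflops cited above

# CPU side: four Jaguar cores at 1.5 GHz. Peak is four SP multiplies plus four SP
# adds per cycle (8 flops); a balanced mix of multiplies and adds is closer to 4.
cpu_sp_peak_gflops = 4 * 1.5 * 8       # 48 gigaflops per chip at theoretical peak
cpu_sp_mixed_gflops = 4 * 1.5 * 4      # 24 gigaflops with the balanced-mix assumption

print(gpu_sp_gflops, cpu_sp_peak_gflops, cpu_sp_mixed_gflops)
```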

Suffice it to say, with 1,800 Opteron X2150 processors in a rack, that works out to something on the order of 277 peak teraflops per rack at single precision from the GPU side of the APU and something on the order of 43 teraflops from the CPU side at single precision (assuming a balanced mix of multiplies and adds). Each one of those GPUs costs $35 at list price, so that is around $63,000 per rack of m700 servers. This works out to roughly $228 per teraflops at single precision for that GPU silicon. You have to code the offloading of work in OpenCL to use the GPU on the Kyoto chip in this manner, and you also have to pay something on the order of $1.4 million for a full rack of this ConvergedSystem 100 setup, which is definitely not cheap. The GPUs in the APUs are the inexpensive bit.
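Scaling those per-chip rates up to the full rack gives the figures above; here is the arithmetic, again as a rough sketch using the list prices quoted:

```python
# Rack-level aggregates for ten Moonshot enclosures of m700 cartridges.
apus_per_rack = 10 * 45 * 4                     # ten enclosures x 45 cartridges x 4 APUs = 1,800

gpu_sp_tflops = apus_per_rack * 153.6 / 1000    # ~277 peak SP teraflops from the GPU side
cpu_sp_tflops = apus_per_rack * 24.0 / 1000     # ~43 SP teraflops from the CPU side (balanced mix)
gpu_list_cost = apus_per_rack * 35              # ~$63,000 of GPU silicon at list price

print(gpu_sp_tflops, cpu_sp_tflops, gpu_list_cost)
print(gpu_list_cost / gpu_sp_tflops)            # roughly $228 per SP teraflops for the GPU portion
```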

For fun, let's make a comparison. A Tesla K10 GPU coprocessor from Nvidia, which has two GK104 GPUs on a single card, is rated at an intentionally modest 190 gigaflops at double precision but a very impressive 4.58 teraflops at single precision. Nvidia does not provide list pricing for its Tesla coprocessors, but a Quadro K5000 using one GK104 GPU that runs slightly slower than the ones used on the K10 card costs $2,249. Call it $5,000 for the K10 card for a thought experiment. You can put three of these K10 GPU coprocessors into an SL250 tray server, which is a half-width, two-socket machine that fits in a 2U chassis and costs $2,000 in a bare-bones configuration using four-core Xeon E5-2603 processors running at 1.8 GHz. So a 42U rack could have 40 two-socket servers and 120 K10 GPUs with a little room left for switching. That rack would cost on the order of $700,000 and deliver around 560 teraflops of single-precision processing from the CPUs and GPUs together. That works out to $1,250 per teraflops. If you want to just compare the K10 to the Kyoto GPU, then the aggregate K10 processing costs on the order of $1,100 per teraflops.
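The Xeon-plus-Tesla side of the comparison works out the same way; this sketch just multiplies out the counts and the assumed $5,000 card price from the thought experiment above:

```python
# Thought-experiment rack of SL250 tray servers stuffed with Tesla K10 cards,
# using the counts and assumed prices quoted above.
servers_per_rack = 40
k10s_per_server = 3
k10_sp_tflops = 4.58                       # single-precision rating per K10 card
k10_assumed_price = 5_000                  # assumed for the thought experiment
rack_price = 700_000                       # approximate, servers plus GPUs plus switching

k10_count = servers_per_rack * k10s_per_server       # 120 cards
gpu_sp_tflops = k10_count * k10_sp_tflops            # ~550 teraflops from the GPUs alone
print(gpu_sp_tflops)
print(rack_price / 560)                    # ~$1,250 per teraflops, using the ~560 TF CPU+GPU figure
print(k10_assumed_price / k10_sp_tflops)   # ~$1,092 per teraflops for the K10 cards by themselves
```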

There is a lot more oomph in the Xeon-Tesla rack, but the Moonshot chassis with Kyoto chips offers cheaper flops and might be suitable if you need lots of units doing a little bit of work at a steady pace. That kind of difference in price is one reason why the m700 card is going to see some action outside of hosted desktops, and very likely in places where single-precision floating point math and an x86 instruction set are important. Think financial services, video encoding/decoding, life sciences, and maybe seismic processing for starters.
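For those workloads, tapping the Radeon side of the APU means writing the offload in OpenCL, as noted above. Below is a minimal, generic sketch of that pattern using the pyopencl bindings; the kernel, array names, and sizes are purely illustrative, and nothing here is specific to the Kyoto chip.

```python
# Generic single-precision OpenCL offload sketch using pyopencl (illustrative only).
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()               # picks an OpenCL device, e.g. an on-die Radeon GPU
queue = cl.CommandQueue(ctx)

n = 1_000_000
a = np.random.rand(n).astype(np.float32)     # single-precision inputs
b = np.random.rand(n).astype(np.float32)

mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

program = cl.Program(ctx, """
__kernel void saxpy(__global const float *a,
                    __global const float *b,
                    __global float *out,
                    const float alpha)
{
    int i = get_global_id(0);
    out[i] = alpha * a[i] + b[i];            // one multiply and one add per element
}
""").build()

program.saxpy(queue, a.shape, None, a_buf, b_buf, out_buf, np.float32(2.0))

result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)      # copy the answer back to the host
```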

But as HP has pointed out, the future m800 card, which puts four of Texas Instruments' KeyStone-II hybrid chips on a cartridge, is probably going to be more popular for computational work. The KeyStone-II has a four-core Cortex-A15 processor and eight digital signal processors that together deliver around 1 teraflops of single-precision floating point performance and a respectable 384 gigaflops at double precision. So 1,800 of these KeyStone-II chips are going to pack quite a wallop. It remains to be seen what the m800 will cost, however.
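Scaled the same way as the Kyoto math above, the m800 numbers get big quickly; here is a rough sketch using the per-chip ratings just cited (peak figures on paper, for a part that is not yet shipping):

```python
# Rough peak throughput for a rack of m800 cartridges holding 1,800 KeyStone-II chips,
# using the per-chip ratings cited above.
keystone_chips = 1_800
sp_pflops = keystone_chips * 1.0 / 1000     # ~1.8 petaflops single precision
dp_tflops = keystone_chips * 0.384          # ~690 teraflops double precision
print(sp_pflops, dp_tflops)
```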

By the way, HP is now also selling the Moonshot machines to customers. The single-socket m300 cartridge uses Intel's eight-core "Avoton" Atom C2000 processor and carries a single 500 GB disk drive and 32 GB of memory; a chassis with fifteen m300s and a single switch costs $25,000. A Moonshot chassis with one switch and fifteen m700 cards will sell for $37,000. Intel and HP are working to put four Avotons on a single cartridge, mirroring the density of the other cards HP has been showing off; it is unclear when this will ship.

ConvergedSystem for Cloud Building

The other new setup from HP is called the ConvergedSystem for Virtualization, and it is designed to roll into your datacenter and start serving up virtual server slices. There is a version aimed at midrange shops and another aimed at larger enterprises.

The ConvergedSystem 300 is the one aimed at midrange customers, and it comes in standard and performance editions; the difference is the speed of the switches in the stack. The standard edition uses HP 2920-48G Ethernet switches, which have ports that run at 1 Gb/sec, and the performance edition uses the HP 5900AF-48G, which bumps the ports up to 10 Gb/sec. The ConvergedSystem 300 has from three to eight of HP's dual-socket ProLiant DL380 Gen8 servers in the rack, as well as from three to eight of the company's StoreVirtual VSA storage appliances. At the moment, VMware's ESXi 5.5 hypervisor is supported on the setup, with other hypervisors coming next year; Red Hat's KVM and Microsoft's Hyper-V are the next obvious choices. The entry price for the ConvergedSystem 300 is $136,600.

Frances Guida, manager for converged systems at HP, says that the ConvergedSystem 300 is designed to be put on the company datacenter floor within 20 days of being ordered, complete with software and after a burn-in period to test the integrated components. This is faster than HP has been able to deliver such stacks in the past. HP's own analysis also shows that the ConvergedSystem 300 has about twice the performance and about 25 percent lower cost than a Vblock 100 setup from the Virtual Computing Environment (VCE) partnership operated by Cisco Systems and EMC.

The larger ConvergedSystem 700 for Virtualization stack is aimed at supporting larger clouds, and it uses HP's BladeSystem blade servers and its 3PAR disk arrays as the basic compute and storage components. This setup has two ProLiant DL360p Gen8 servers as management controllers, plus one BladeSystem c7000 chassis and from four to sixteen ProLiant BL460c Gen8 blade servers with Intel's new "Ivy Bridge-EP" Xeon E5-2600 v2 processors. The blades have 256 GB of memory each. The rack also includes a 3PAR StoreServ 7200 service processor and storage base with 36 15K RPM disk drives, each with 300 GB of capacity. This is all connected using two of the HP 5920AF-24XG 10 Gb/sec switches, and two HP 5120-24G 1 Gb/sec switches are used to link the management nodes to the server nodes. The ConvergedSystem 700 is expanded in blocks of four blades and 36 disks. The base configuration of the ConvergedSystem 700 costs $570,000. VMware ESXi 5.5 and Microsoft Hyper-V 3.0 are supported on this machine now.

HP did not provide a comparison to the Vblocks for this setup, but Guida said HP could demonstrate a 28 percent lower total cost of ownership with the ConvergedSystem 700 compared to customers racking up machines by themselves. HP's goal is to get this stack from order to running in the datacenter in under 30 days.

That leaves the final new stack, which is the ConvergedSystem 300 for Vertica. The hardware is similar to the virtualization setup above, but not exactly the same. The base rack comes with three ProLiant DL380p Gen8 servers as database nodes and one ProLiant DL360p Gen8 as a management node. The database nodes have 128 GB of main memory, a little skinnier than the virtualization nodes above, and 25 disk drives. Two 300 GB drives hold the operating system and twenty-three 600 GB drives hold the Vertica 7 database. The ProLiant servers are configured with Red Hat Enterprise Linux 6.4 and linked together using the HP 5900AF-48XGT 10 Gb/sec Ethernet switch. To expand the Vertica cluster, you buy additional blocks of database nodes and extend the cluster.

The entry configuration of the ConvergedSystem 300 for Vertica is expected to sell for $360,000 when it starts shipping in February 2014.
