Advanced Computing in the Age of AI | Thursday, March 28, 2024

Cray Goes After Big Iron With ScaleMP Help 

Supercomputer maker Cray is teaming up with ScaleMP to turn its CS300 clusters into virtual shared memory systems. These are not only appropriate for some traditional HPC workloads, but can also be peddled to customers who are looking to run databases and other applications on machines with large memory capacity but who do not want to pay for a hardware-based shared memory system.

Cray knows a thing or two about high performance systems, but it has been a long time since the company has sold a general-purpose shared memory system. The most famous hardware-based symmetric multiprocessing (SMP) server with a Cray badge on it was probably the Superserver 6400, from twenty years ago, which was the daddy of the wildly popular Sun Microsystems E10000. Cray went parallel with its supercomputer designs many years ago, as did the rest of the supercomputing industry and, indeed, as have many of the applications used by hyperscale Web companies and the financial services industry.

But sometimes you need a fat node in a cluster, and the vSMP Foundation software from ScaleMP, which glues together multiple server nodes into a virtual shared memory system, can create such a fat node from multiple skinny X86 systems. vSMP Foundation is a kind of server virtualization hypervisor, only instead of carving up a single machine into multiple virtual machines (like ESXi, KVM, Xen, or Hyper-V do) it takes multiple physical machines and makes them look like one big virtual machine with a single memory space for applications to play in. vSMP Foundation does in software what real SMP systems do in hardware.
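To make "a single memory space" a little more concrete, here is a toy Python model in which the physical memory of several nodes is laid end to end as one flat address range, so any global offset maps to one particular node and an offset on it. This is an illustrative assumption, not a description of how vSMP Foundation works internally; the real hypervisor also has to cache and keep memory coherent across nodes, which this sketch ignores entirely. The `Node` and `VirtualAddressSpace` names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One physical server contributing memory to the shared pool (hypothetical)."""
    name: str
    memory_gb: int


class VirtualAddressSpace:
    """Toy model: lays the nodes' memory end to end as one flat address range."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.bases = []  # starting global offset (GB) of each node's memory
        total = 0
        for node in self.nodes:
            self.bases.append(total)
            total += node.memory_gb
        self.total_gb = total

    def locate(self, offset_gb):
        """Map a global offset (GB) to (node name, local offset in GB)."""
        if not 0 <= offset_gb < self.total_gb:
            raise ValueError("offset lies outside the shared memory space")
        # The last base that is <= offset identifies the node that owns it.
        index = bisect_right(self.bases, offset_gb) - 1
        return self.nodes[index].name, offset_gb - self.bases[index]


# Two hypothetical 256 GB nodes presented as one 512 GB address space.
space = VirtualAddressSpace([Node("node0", 256), Node("node1", 256)])
print(space.total_gb, space.locate(300))
```

An application addressing this space never sees the node boundary; the hypothetical `locate` lookup is the part a real system has to perform (and hide) on every access that misses local memory.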

Before we get into that, let's talk for a second about SMP and NUMA. SMP technology links processors and memory together at the memory bus (rather than at the network layer, as parallel clusters do), so that all processors in the machine share that bus and access main memory directly. NUMA systems, on the other hand, attach memory to each processor. A processor reaches the memory in its own socket very quickly, while memory attached to the other processors in the system has to be reached through a point-to-point interconnect (in Intel's case, the QuickPath Interconnect), which takes longer. That is where the name Non-Uniform Memory Access comes from: the memory buses are distributed, and how long an access takes depends on which part of the system holds the data. NUMA carries a performance penalty compared to SMP, but has the virtue of scaling further and more cheaply, which is why very few true SMP servers are sold today. Most big systems with more than one processor sharing memory are actually NUMA setups, although the SMP name sticks around.
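The non-uniform part can be put into one line of arithmetic: if a fraction f of memory accesses hit socket-local memory, the average latency is f × local latency + (1 − f) × remote latency. The Python sketch below just evaluates that weighted average; the 80 ns and 130 ns figures are placeholder assumptions chosen for illustration, not measurements of any real Xeon system.

```python
def effective_latency_ns(local_fraction: float, local_ns: float, remote_ns: float) -> float:
    """Average memory latency when a fraction of accesses hit socket-local memory.

    The remaining accesses are assumed to cross the point-to-point
    interconnect and cost remote_ns each.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Placeholder latencies, NOT measurements of any real system.
LOCAL_NS, REMOTE_NS = 80.0, 130.0

for fraction in (1.0, 0.9, 0.5):
    avg = effective_latency_ns(fraction, LOCAL_NS, REMOTE_NS)
    print(f"{fraction:.0%} local -> {avg:.1f} ns average")
```

The takeaway is qualitative: the more of a workload's memory traffic stays on the local socket, the closer a NUMA machine behaves to a flat SMP machine, and the larger the remote penalty, the more that locality matters.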

Here's the issue that enterprise shops wrestle with: Real SMP servers and their NUMA follow-ons can, of course, scale up processors, cores, main memory, and memory bandwidth. They tend to scale all of these in lockstep, though, so if you need more memory, you have to buy more processors even if you don't need them. This can be very expensive, obviously, and not just for the hardware but also for software licenses tied to the system. This is one of the reasons why IBM, Silicon Graphics, Hewlett-Packard, Oracle, and Fujitsu are happy to sell you a big, bad box if you need a lot of memory to run your database or applications, and their software partners love it, too.

If the memory needs of your applications are a bit less rigid, says Barry Bolding, vice president of marketing at Cray, then running a CS300 cluster with the vSMP Foundation hypervisor on top might be a better answer than one of these SMP or NUMA boxes.

"We are taking a software approach to large memory applications," explains Bolding. "What that does is keep the cost low and it keeps the flexibility high. It may not necessarily perform up to having a hardware solution, but underneath that, it is very flexible. And in fact, it could outperform a hardware solution as you upgrade the underlying hardware in the CS300 itself."

It is certainly true that customers with high-end NUMA machines tend to wait a long time for upgrades, often two or three years, while X86 processors are refreshed roughly every 18 months at the current pace. Sometimes there is a mid-life kicker to a big RISC or Itanium machine that doubles up the memory and offers a modest speed bump. But the days of nearly annual system refreshes for these high-end systems are over. The customer base is too small to justify all the engineering, even if these big systems do generate billions of dollars a year in revenues for their makers.

Cray is selling two different CS300 configurations that use the ScaleMP virtual shared memory software. The CS300 nodes are clustered using either 56 Gb/sec InfiniBand or 10 Gb/sec Ethernet, as is the case with regular CS300 clusters, and that network is the transport layer vSMP Foundation uses to link the nodes together.

The first is called the CS300-LMS, which is short for Large Memory System, and it comes in two flavors. One has a two-socket Xeon server based on the new ten-core Xeon E5-2690 v2 running at 3 GHz and offering 4.375 TB of main memory, expandable to 8.375 TB with a second node glued to it using vSMP Foundation. (The ScaleMP software uses a block of main memory as cache.) This setup costs $182,500 in the base configuration and $295,000 in the full configuration, according to pricing available through ScaleMP. A fatter CS300-LMS configuration is based on four-socket nodes in the CS300, which use Intel's eight-core Xeon E5-4650 processors. The initial node has 32 cores and 4.75 TB of usable memory and costs $212,500. Adding a second four-socket node doubles up the system to 64 cores and 8.75 TB of shared memory, and costs $325,000.
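A quick way to compare the four CS300-LMS options is cost per usable terabyte. The Python sketch below uses only the prices and capacities quoted above; the per-terabyte numbers are our own back-of-the-envelope derivations rather than list prices, and the core counts for the two-socket flavor are computed from 2 sockets × 10 cores per node, since the article does not state them directly. The `LmsConfig` name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LmsConfig:
    """One CS300-LMS configuration, using figures quoted in the article."""
    name: str
    nodes: int
    sockets_per_node: int
    cores_per_socket: int
    usable_memory_tb: float
    price_usd: int

    @property
    def cores(self) -> int:
        # Total cores across all nodes glued together by vSMP Foundation.
        return self.nodes * self.sockets_per_node * self.cores_per_socket

    @property
    def usd_per_tb(self) -> float:
        # Derived figure: list price divided by usable shared memory.
        return self.price_usd / self.usable_memory_tb


LMS_CONFIGS = [
    LmsConfig("2-socket E5-2690 v2, base", 1, 2, 10, 4.375, 182_500),
    LmsConfig("2-socket E5-2690 v2, two nodes", 2, 2, 10, 8.375, 295_000),
    LmsConfig("4-socket E5-4650, base", 1, 4, 8, 4.75, 212_500),
    LmsConfig("4-socket E5-4650, two nodes", 2, 4, 8, 8.75, 325_000),
]

for cfg in LMS_CONFIGS:
    print(f"{cfg.name:32} {cfg.cores:3d} cores  ${cfg.usd_per_tb:,.0f} per usable TB")
```

Read this way, adding the second node lowers the cost per terabyte in both flavors, which fits the pitch that the LMS is about buying memory rather than cores.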

Shai Fultheim, founder and CEO at ScaleMP, tells EnterpriseTech that this CS300-LMS configuration is aimed at applications like databases or chip design software, which need more memory, not more cores. For those applications that need both more cores and more memory, Cray and ScaleMP are peddling the CS300-SMP, which also comes in two flavors as well as custom configurations based on the needs of an application. This setup starts with 18 two-socket nodes based on Xeon E5-2680 v2 processors, which also have ten cores and run at 2.8 GHz. That gives you a total of 360 cores using 4.75 TB of shared memory across the nodes. You can boost the capacity of the machine with another 16 two-socket nodes, yielding 680 cores and 8.75 TB of memory. The base configuration of this machine costs $287,500 with a slower network offering 112 Gb/sec of system bandwidth across the nodes, and $322,500 if the bandwidth is doubled up to 225 Gb/sec. Adding those extra 16 nodes and boosting the memory and cores pushes the price up to $480,000 for the slower network and $530,000 for the faster network. (ScaleMP has published the price list.)
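The CS300-SMP core counts follow directly from nodes × 2 sockets × 10 cores, and the same quoted prices give a rough cost per core and per terabyte. This Python sketch only re-derives those numbers from the figures above; the per-core and per-terabyte values are our own arithmetic, not published prices, and the helper name `total_cores` is hypothetical.

```python
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 10  # ten-core Xeon E5-2680 v2, as quoted in the article


def total_cores(nodes: int) -> int:
    """Cores in a CS300-SMP built from two-socket E5-2680 v2 nodes."""
    return nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET


# (label, node count, shared memory in TB, price in USD) as quoted in the article.
CS300_SMP = [
    ("base, 112 Gb/sec", 18, 4.75, 287_500),
    ("base, 225 Gb/sec", 18, 4.75, 322_500),
    ("expanded, 112 Gb/sec", 18 + 16, 8.75, 480_000),
    ("expanded, 225 Gb/sec", 18 + 16, 8.75, 530_000),
]

for label, nodes, memory_tb, price in CS300_SMP:
    cores = total_cores(nodes)
    print(f"{label:22} {cores:4d} cores  ${price / cores:,.0f}/core  ${price / memory_tb:,.0f}/TB")
```

The 18-node base yields the 360 cores quoted above, and the 34-node expansion yields 680; the expanded machines come out cheaper per core than the base ones at either network speed.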

Cray can do smaller or larger customized configurations as needed, too. vSMP Foundation can scale up to 128 server nodes in a single system image. The CS300 line scales further than 128 nodes, of course, and Bolding says that some customers may use a "condominium" style approach, putting several virtual big boxes into a single cluster, and then changing them as applications dictate.
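The "condominium" idea amounts to carving one large cluster into several virtual big boxes, each within vSMP Foundation's 128-node limit per system image. The Python sketch below is a hypothetical planning helper, not Cray or ScaleMP tooling: it hands out contiguous node ranges to named partitions and rejects any request that exceeds the per-image limit or the cluster's size. Changing the partitions "as applications dictate" would just mean re-running it with a different request list.

```python
MAX_NODES_PER_IMAGE = 128  # vSMP Foundation limit per system image, per the article


def plan_partitions(cluster_nodes: int, requests: dict) -> dict:
    """Assign contiguous node ranges to each requested virtual SMP box.

    requests maps a partition name to its node count (insertion order is
    kept). Returns name -> range of node indices, or raises ValueError if a
    request exceeds the per-image limit or the cluster runs out of nodes.
    """
    plan = {}
    next_node = 0
    for name, count in requests.items():
        if not 1 <= count <= MAX_NODES_PER_IMAGE:
            raise ValueError(f"{name}: {count} nodes is outside 1..{MAX_NODES_PER_IMAGE}")
        if next_node + count > cluster_nodes:
            raise ValueError(f"{name}: not enough free nodes left in the cluster")
        plan[name] = range(next_node, next_node + count)
        next_node += count
    return plan


# Hypothetical 300-node CS300 split into three virtual big boxes.
plan = plan_partitions(300, {"database": 128, "eda": 64, "genomics": 100})
for name, nodes in plan.items():
    print(f"{name:10} nodes {nodes.start}-{nodes.stop - 1}")
```

Contiguous ranges are just the simplest assumption here; a real scheduler would also account for network topology and which nodes are free.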

In general, the cost of the vSMP software will depend on how it is deployed across the CS300 cluster, and Bolding says it will be set deal by deal. If customers are just adding some virtual fat nodes, the software will add on the order of a few tens of percent to the price of the deal. If customers put vSMP Foundation on all of the nodes, it will add appreciably to the cost of the cluster, but the total will still be less than twice the price of the raw CS300 machinery and, more importantly, still less expensive than a big NUMA server.

An important thing for Cray, of course, is to expand out beyond its traditional supercomputing market. The CS300 has been sold into financial services and healthcare companies, and Bolding expects to sell the virtual shared memory versions of these systems into those markets as early adopters. This is exactly what Silicon Graphics has been doing with its "UltraViolet" UV2 systems, which have hardware-based NUMAlink 6 routers to glue together blades based on Intel's Xeon E5-4600 processors into a shared memory system.

SGI has support for both Linux and Windows Server on its platform, but thus far ScaleMP's vSMP Foundation has been limited to Linux, specifically Red Hat Enterprise Linux 5 and 6 and SUSE Linux Enterprise Server 11 SP1 and SP2. That is not a big limitation in supercomputing or big data, where Linux is the dominant platform, but it could be for companies that want to use a virtual shared memory system to load up databases and run bigger chunks of their data in memory. Windows is a popular platform for IBM's DB2 and Oracle's 11g and now 12c databases as well as for Microsoft's own SQL Server databases.

One last thing: You don't have to buy vSMP Foundation if you want to test it out on a baby cluster to see if it might help boost the performance or scalability of your applications and databases. Two weeks ago, ScaleMP announced vSMP Foundation Free, which, as the name suggests, is a freebie version of the system-lashing hypervisor. This free version, which can be downloaded from ScaleMP, allows you to combine up to eight server nodes with a maximum of 1 TB of total main memory. The company has had hundreds of downloads as of last week, when we talked to Fultheim, and that is much higher than he expected.
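Before downloading, it is easy to check whether a test cluster fits inside the free edition's limits of eight nodes and 1 TB of total memory. This Python sketch encodes only those two quoted limits; the assumption that 1 TB means 1,024 GB is ours, and `fits_free_edition` is a hypothetical helper, not part of any ScaleMP tool.

```python
from typing import Sequence

FREE_MAX_NODES = 8        # node limit for vSMP Foundation Free, per the article
FREE_MAX_MEMORY_TB = 1.0  # total main memory limit, per the article
GB_PER_TB = 1024          # assumption: binary units


def fits_free_edition(node_memory_gb: Sequence[int]) -> bool:
    """Return True if the per-node memory sizes (in GB) fit the free limits."""
    total_tb = sum(node_memory_gb) / GB_PER_TB
    return len(node_memory_gb) <= FREE_MAX_NODES and total_tb <= FREE_MAX_MEMORY_TB


print(fits_free_edition([128] * 8))  # eight 128 GB nodes: 1 TB in total
print(fits_free_edition([192] * 8))  # eight 192 GB nodes: over the memory cap
print(fits_free_edition([64] * 9))   # nine nodes: over the node cap
```

Either limit alone disqualifies a cluster, so a test rig of a few modestly configured nodes is the natural fit for the free edition.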

EnterpriseAI