Advanced Computing in the Age of AI | Thursday, March 28, 2024

Nokia Dev Cloud Will Swell to 100K Cores 

Be careful about giving easy access to computing. You may end up with far more demand than you planned.

This is precisely the position that the IT department supporting the application development infrastructure at Nokia Solutions and Networks is in after rolling out a cloud to support the creation of software and the simulation of network elements and components that go into the devices that the company makes. What started out as a modest 12,000-core cloud two years ago is now estimated to hit 100,000 cores or more next year.

Janne Heino, manager of software development cloud solutions at NSN, tells EnterpriseTech that demand on the cloud that is used explicitly for software development and testing and simulating network gear has gone wild.

Formerly known as Nokia Siemens Networks, the new NSN is owned completely by Nokia after Siemens sold its stake to its Finnish partner back in August for €1.7 billion (around $2.3 billion). Siemens, the industrial manufacturing giant, partnered with Nokia in 2007 to unite their efforts to make telecommunications gear that adhered to the GSM cellular standard that was used in Europe; the partnership has moved on to make equipment for building LTE networks and is one of the dominant suppliers of mobile broadband gear in the world.

For the past six years, the 30 development sites operated by the Nokia and Siemens partnership installed their own servers for application development and testing. Before the cloud project was started two and a half years ago, those sites had around 1,200 servers with on the order of 8,000 cores in the aggregate. This is not an obscene amount of iron, but it is large enough to be a big cost center. NSN decided that spreading these servers around so thinly was wasteful, and figured that it made more sense to build a shared resource that could be run more efficiently.

So three years ago Heino and his IT staff stole a few servers here and there from various datacenters (this is actually common practice in large organizations for proof-of-concept trials) to put together a prototype cloud. The company did a bake-off between OpenStack and Eucalyptus, two popular open source cloud controllers available at the time, and came to the conclusion that Eucalyptus was more mature.

The NSN development cloud is run in four different facilities – one each in the United States, Finland, China, and India – with a hot backup site in the event that one of them fails. The sites are networked and applications can be pushed to any part of the distributed cloud.

As 2013 was starting, the NSN cloud had 12,000 cores in it, and this year, after developers and equipment testers learned how to use it, the core count has been increasing by 5 to 7 percent month to month. Heino says he anticipates that the cloud will have something on the order of 20,000 to 25,000 cores as 2013 comes to an end. NSN estimates that the cloud will have 100,000 cores in 2014. That is more than eight times the 12,000 cores the cloud had at the start of 2013, and the obvious question is: Where does it stop?

"Don't ask me," says Heino with a laugh. "The growth is a little scary. It is never going to end because R&D will always invent something new. It is growing quite large, with lots of environments. As for the cloud growing at this rate in 2015, I don't know. If I knew that, I think I would have to buy stock in hardware companies. It is difficult to say because the use cases for the cloud are also changing."
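The growth figures quoted above (12,000 cores at the start of 2013, 5 to 7 percent monthly growth, 100,000 cores in 2014) can be sanity-checked with a little compound-growth arithmetic. The sketch below is only an illustration: the function names are made up here, and it assumes steady month-over-month compounding, which NSN did not claim.

```python
def project_cores(start_cores: float, monthly_rate: float, months: int) -> float:
    """Compound start_cores at monthly_rate (0.05 means 5 percent) for `months` months."""
    return start_cores * (1.0 + monthly_rate) ** months


def required_monthly_rate(start_cores: float, target_cores: float, months: int) -> float:
    """Steady monthly growth rate needed to go from start_cores to target_cores."""
    return (target_cores / start_cores) ** (1.0 / months) - 1.0


if __name__ == "__main__":
    # Figures taken from the article; steady compounding is an assumption.
    low = project_cores(12_000, 0.05, 12)
    high = project_cores(12_000, 0.07, 12)
    print(f"End of 2013 at 5%/month: {low:,.0f} cores")
    print(f"End of 2013 at 7%/month: {high:,.0f} cores")

    # Rate needed to grow from 25,000 to 100,000 cores over twelve months.
    rate = required_monthly_rate(25_000, 100_000, 12)
    print(f"Monthly rate needed for 100,000 cores in 2014: {rate:.1%}")
```

Under these assumptions, twelve months of 5 to 7 percent growth takes 12,000 cores to roughly 21,500 to 27,000, which is in line with Heino's year-end estimate. Getting from 25,000 to 100,000 cores in a single year would take about 12 percent growth per month, a noticeably faster pace than in 2013.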

Just to give you a sense of the scale of this cloud: The largest X86-only cluster on the Top500 list of supercomputers that does not have Xeon Phi or Nvidia Tesla accelerators is the SuperMUC machine built by IBM for Leibniz Rechenzentrum in Germany. It has 147,456 cores and has 3.2 petaflops of aggregate performance.

NSN's cloud started out being used for traditional application building and testing. Simulation of network elements and traffic has been added so NSN can model how its telecom gear and software will perform in the field.

One of the reasons that the R&D cloud is so popular is that it has a self-service portal, which makes it easy for developers to use Eucalyptus to fire up Red Hat Enterprise Linux or CentOS instances on top of a KVM hypervisor. (Nokia has a few of its own software environments that can be allocated on the cloud, but 95 percent of the workloads are running on these two variants of Linux.) But the other reason that the R&D department is crazy for the cloud is that it costs half as much to deploy equivalent capacity as it did back on physical boxes – and the cloud segments scale further than any of the clusters at any one of the 30 facilities did back in the old days.

While the scale of the cloud matters in terms of the number of server nodes and core counts, the rate of change on the cloud is another key metric for NSN. This metric, says Heino, is the more important one to NSN and is one of the forces driving the growth of the cloud. At any given time, the NSN development cloud is running several thousand different application instance types, and over the course of a month, approximately 50,000 different instances will roll onto and off the cloud.
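To put the monthly churn figure in more familiar terms, the short sketch below converts roughly 50,000 instances per month into average daily and hourly rates. It is a back-of-the-envelope illustration only: the function name is hypothetical, a 30-day month is assumed, and the load is assumed to be spread evenly, which real development traffic almost certainly is not.

```python
def churn_rate(instances_per_month: int, days_in_month: int = 30) -> dict:
    """Average instance turnover per day and per hour, assuming an even spread."""
    per_day = instances_per_month / days_in_month
    per_hour = per_day / 24
    return {"per_day": per_day, "per_hour": per_hour}


if __name__ == "__main__":
    # 50,000 instances per month is the figure cited in the article.
    rates = churn_rate(50_000)
    print(f"About {rates['per_day']:,.0f} instances per day")
    print(f"About {rates['per_hour']:,.0f} instances per hour")
```

On those assumptions, the cloud turns over on the order of 1,700 instances a day, or about 70 an hour around the clock.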

Researchers are also running larger jobs because now, if they need a few thousand cores for a week, those cores can be allocated to them. This was not possible in the past when the machines were all spread out around the NSN facilities.

NSN is a high-tech company itself and is very cognizant of power and cooling, because the same issues that affect datacenters affect telecom equipment closets, so you might think that it would buy the latest-greatest hyperscale, minimalist servers available from Hewlett-Packard, Dell, IBM, or Fujitsu. NSN started its cloud on blade servers from an unnamed vendor, but quickly decided that it was better to use plain vanilla 1U rack servers based on X86 chips with all of the extras (like redundant power supplies) ripped out.

"We have a lot of workloads that are CPU and I/O intensive," says Heino. "We want a very simple machine that we can get from many different vendors. Specialized hardware environments might be useful for certain kinds of applications, but our researchers want the option of broad CPU power." And you can't always get the fastest CPUs in blade servers, although you can usually get them in the minimalist designs. But NSN is looking to avoid lock-in as well. So a 1U rack server is the lowest common denominator, at least for its cloud.

Nokia is investing in its R&D cloud because the telecom equipment business is vital to the company. The company was an innovator in cellphone design, but is in the process of selling its handset business to Microsoft for €3.79 billion ($5.1 billion) and licensing its patents to the company for an additional €1.65 billion ($2.2 billion). Nokia is getting out of the handset business because it is losing money after a steep revenue decline.

In 2012, its NSN unit had only a 2 percent revenue slip to €13.8 billion, and that unit's operating profits more than tripled to €778 million. After the sale to Microsoft, Nokia will have two units: NSN and a mapping software division called HERE, which represented about 4 percent of Nokia's sales in 2012. (That is including the handset business.) Nokia will be flush with cash and running a profitable mobile broadband equipment business, which will surely cushion the blow when Heino asks for more servers for the cloud.
