
Microsoft’s Azure Cloud Is About More Than Being Hyper 

The funny thing about the largest cloud computing providers is that they are all bragging about the scale at which they can provide compute, storage, and networking capacity. But none of them want to reveal precisely how much capacity they have installed in their datacenters and regions. We all presume that their capacity is immense and that they can take on any of the big jobs customers throw at them on a whim.

Microsoft held a briefing session at its Redmond headquarters last week that gave EnterpriseTech a preview of the slew of updates to its Azure cloud that will be announced this week at TechEd in Houston. Corey Sanders, principal group manager for the Microsoft Azure cloud, did not talk specifically about how large the Azure cloud was or how much excess capacity it had. But he gave some stats on what Microsoft is pushing through the Azure cloud and provided some clues about the immensity of the capacity on hand.

(One interesting aside: Because Microsoft has an open mind about Linux these days, at least when it comes to the cloud, the company has recently dropped “Windows” from the service’s name and now simply refers to it as Microsoft Azure. But at its heart, of course, Azure is a Windows stack that can run Linux instances, much as OpenStack is a Linux stack that can also run Windows instances. We point this out because the OpenStack Summit is going on at the same time this week in Atlanta.)

At the moment, the Azure cloud is operating in 16 different regions worldwide. Microsoft doesn’t talk much about the architecture of its datacenters or the machinery inside of them – aside from the homegrown server design that the company donated to the Open Compute Project back in January, an admission last year that it had over 1 million servers in its cloud (using that term loosely, we are sure), and some talk of Dell containerized datacenters in its Chicago facility a few years back.

[Image: Microsoft Azure platform]

Sanders provided a few more insights about the scope and scale of Azure. First, back in February, Azure was the cloud that NBC chose to do a significant amount of the computing and hosting work necessary to bring the Winter Olympics in Sochi to the world. Specifically, Azure did the computing and storage for live video encoding and streaming for both Web and mobile devices, serving over 100 million viewers during the games and hitting a peak of 2.1 million concurrent HD streams during the United States versus Canada hockey match. Two weeks after that, the Xbox One and PC game Titanfall, which is available only as a multiplayer game and only from the cloud, was launched, and on day one Microsoft fired up over 100,000 Azure virtual machines across all of its regions, with over 300,000 cores dedicated to those VMs. Because the game did not have to be written to span a wide variety of PC and game console generations but was rendered and played entirely in the cloud, Respawn Entertainment could throw massive amounts of compute at it to do more physics simulation, higher resolution rendering, and more artificial intelligence for simulated soldiers, according to Sanders.

Now here is the interesting bit, which Sanders confided to EnterpriseTech after his presentation: “Titanfall is not something we could have done two years ago.”

Microsoft has enough spare capacity to host the Olympics media or an online game without affecting the myriad services such as Office 365, Skype, and Azure Active Directory, which is used to authenticate users and give them access to these and, as it turns out, thousands of other services. Former Microsoft CEO Steve Ballmer confirmed last July that the company had over 1 million servers in the Azure cloud, and the talk on the street is that it is approaching 2 million machines, but Microsoft will not confirm these figures. Nor will it provide a breakdown of what part of the capacity is actually used for infrastructure and platform cloud services that are sold utility-style to customers. We presume it is still a minority of the capacity.

What we can tell you is that Azure hosts over 300 million Active Directory users today and processes over 18 billion authentications per week. (By the way, there are 750 million Active Directory users hitting internal authentication servers worldwide, so the ramp of Azure Active Directory has been nothing short of stunning.) Azure Storage, the object storage component of the storage backend for Microsoft’s cloud, has over 25 trillion objects (up by a factor of three in 11 months) and processes 2.5 million transactions per second serving requests from its many users. (A year ago, Amazon said that it was hosting over 2 trillion objects in its S3 service and processing 1.1 million requests per second. It was also on an exponential curve, but AWS is interesting in that it has autocleaning features to keep the object count low on purpose.) There are more than 1 million SQL Database instances running in the platform side of the Azure cloud, too.
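For a sense of what that authentication volume means in steady-state terms, here is a quick back-of-the-envelope calculation of ours (not a Microsoft figure):

```python
# Quick arithmetic on the Azure AD numbers above: 18 billion authentications
# per week averages out to roughly 30,000 per second.
auths_per_week = 18_000_000_000
seconds_per_week = 7 * 24 * 3600          # 604,800 seconds
print(f"{auths_per_week / seconds_per_week:,.0f} authentications per second")  # ~29,762
```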

While the server count is interesting, what we wanted to know was how Microsoft does capacity planning on the cloud and how much headroom it has in its Azure facilities to deal with what can, in theory, be unpredictable demand for users of its infrastructure, platform, and software cloud services.

[Image: Microsoft Azure regions]

As we all well know, when you run your own datacenter, you have to plan for peak workloads at the end of the week, month, or year, and these peaks can be very high and leave a lot of unused capacity sitting around the rest of the time. This is why virtualization has been so important: it allows companies to interleave applications across lines of business and keep that peak capacity lower across all workloads than it would be if applications were siloed. Virtualization adds complexity, eats some capacity of its own, and costs some pretty big bucks, but it is a net gain in terms of flexibility and manageability as well as lowering server counts and therefore server budgets.

If virtualization allows companies to interleave workloads within their own datacenters, clouds allow them to do it across companies, further boosting efficiencies. This is great for users. But cloud providers like Microsoft still have the capacity planning issue. It doesn’t go away. But it is, as Sanders explained and as we surmised, mitigated by the fact that Microsoft has so much capacity spread around so many datacenters that it can shift workloads across regions around the world (with customers paying a bit of a latency penalty in some cases, of course). Customers are responsible for coming up with their own demand models, said Sanders, and then Microsoft weighs them in the aggregate to make sure it can meet the overall demand.

“This is why we think that hyperscale is such a critical part of the equation,” Sanders explained. “There are not a lot of organizations in the world like Titanfall. All customers in the world running at the same time are probably not going to equal one Titanfall. The key point is that once you have that hyperscale, you have the regions and the compute slack within each region, you end up being able to support pretty much any customer demand that comes.”

When asked precisely how much excess capacity Microsoft has in Azure, Sanders just smiled and said: “Enough.”

When pushed he elaborated a bit further. “Because we have so many different customers doing so many different things, they end up actually weighing each other out. The hours of Titanfall are very different from a healthcare provider. It enables us to worry a lot less about slack because the slack comes from each other. All the folks working during the day stop and then the Titanfall customers come online and take that capacity that they were just using at their job. And this is why the economics of the cloud make sense, specifically.”
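To put some toy numbers behind Sanders’ argument, here is a minimal Python sketch (the demand curves are entirely made up, not anything Microsoft has published) showing why pooling customers with complementary peaks requires far less capacity, and therefore less slack, than provisioning each one for its own peak:

```python
# Toy illustration (not Microsoft's actual model): two customers with
# complementary demand curves need far less aggregate capacity than the
# sum of their individual peaks would suggest.

# Hypothetical demand in "cores needed" across twelve time slices of a day.
office_workload = [8, 8, 10, 40, 80, 90, 95, 90, 85, 60, 30, 10]    # daytime-heavy
gaming_workload = [90, 95, 80, 40, 15, 10, 10, 12, 20, 50, 70, 85]  # evening-heavy

# Provisioned separately, each customer must buy for its own peak.
siloed_capacity = max(office_workload) + max(gaming_workload)        # 95 + 95 = 190

# Pooled on a shared cloud, capacity only has to cover the combined peak.
combined = [a + b for a, b in zip(office_workload, gaming_workload)]
pooled_capacity = max(combined)                                       # 110

print(f"siloed peak capacity : {siloed_capacity}")
print(f"pooled peak capacity : {pooled_capacity}")
print(f"capacity saved       : {siloed_capacity - pooled_capacity}")  # 80
```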

Enterprise-Grade And Hybrid – Sometimes Both

In addition to hyperscale, Microsoft says that the Azure cloud has two other key attributes: hybrid capabilities and enterprise-grade features. “We believe that Azure is the only one that boasts all three,” said Sanders.

It is a fair point. Amazon Web Services certainly has hyperscale, on par with the server farms at Microsoft and Google, and has a wide variety of infrastructure and platform services. But Microsoft owns the Windows stack and has learned to accept Linux; the company also offers the same kind of support coverage (for a price) for Azure as it does for its Windows stack – and compatibility as well across internal Windows systems and external Azure cloud capacity. To be sure, Azure has more goodies than Windows right now, and it is very likely that it will stay that way, since Microsoft is using Azure as a testbed for ideas that require more automation to run at scale. Google has plenty of global scale for its infrastructure and platform services, and also has lots of software that it sells as a service; some of this Google software is compatible with Windows software, and some of it is not. IBM and Oracle are each trying to build an enterprise-grade cloud that runs their own systems, middleware, and application software. Big Blue bought SoftLayer last year for an estimated $2.2 billion and invested another $1.2 billion to build it out this year to try to get to hyperscale. But even after that buildout, which will put SoftLayer capacity into 40 datacenters worldwide, IBM will have only several hundred thousand servers and on the order of millions of virtual machines, according to Lance Crosby, CEO of the SoftLayer subsidiary, who gave EnterpriseTech the inside dope on the plan. Rackspace Hosting ended 2013 with 103,886 machines across its datacenters, which run a variety of hosted and cloud workloads. It may be one of the main drivers behind the OpenStack cloud controller, but its scale is falling behind.

Microsoft means a lot of different things when it talks about enterprise capabilities. It means supporting Oracle’s database, Linux, WebLogic middleware, and Java runtimes on Azure. (Other Linux distributions from SUSE and Canonical are also supported, but the Oracle support is more recent and fresher in memory.) Enterprise-grade also means treating the open source Chef and Puppet configuration management tools as peers to its own PowerShell, and with the Azure update this week Microsoft is allowing PowerShell scripts to be launched on virtual machines directly from the console, without even having to log into the VM. You just launch an image and point a script at it.
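To make the mechanics of that concrete, here is a hedged Python sketch of how an agent-style flow like this can work: the console hands a script to an in-guest agent, which runs it locally so nobody has to log into the VM. The function names and the queue are invented for illustration, and this is not Microsoft’s actual VM agent or extension API:

```python
# Hypothetical sketch of the mechanism described above: the console hands a
# script to an agent running inside the VM, and the agent executes it locally,
# so the operator never logs in. Names and the queue are invented.
import shutil
import subprocess

PENDING_SCRIPTS = []   # stand-in for a queue the cloud control plane would populate

def console_push(script_text: str) -> None:
    """What the portal/console side does: hand a script to the VM's agent."""
    PENDING_SCRIPTS.append(script_text)

def guest_agent_poll() -> None:
    """What the in-guest agent does: fetch and execute any pending scripts."""
    while PENDING_SCRIPTS:
        script = PENDING_SCRIPTS.pop(0)
        if shutil.which("powershell"):
            # On a Windows guest the script would be handed to PowerShell.
            result = subprocess.run(["powershell", "-Command", script],
                                    capture_output=True, text=True)
            print(result.stdout)
        else:
            print(f"[dry run] would execute via PowerShell: {script}")

# The operator pushes a script from the console; the agent picks it up later.
console_push("Get-Service | Where-Object Status -eq 'Running'")
guest_agent_poll()
```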

Furthering its enterprise breadth means working with Trend Micro and Symantec on a new agent scheme for VMs on Azure that allows their antivirus and antimalware software to be injected directly into the VMs, a capability being announced this week at TechEd and one that comes ahead of Microsoft’s ability to put its own antimalware software into VMs in the same fashion. (Microsoft Antimalware for Azure is only in public preview as of this week, but Trend Micro’s Deep Security antivirus and SecureCloud encryption software are available on Azure using the new agents now. Symantec support is coming.) Microsoft has a similar partnership with Barracuda Networks that lets customers embed Barracuda’s firewall inside of Azure just as they would do on premises.

There are many other examples of such enterprise features coming out with the update this week at TechEd, and we will go through a few more of the important ones. Many of them have a hybrid cloud element to them, making it possible to more seamlessly integrate internal Windows systems with Azure capacity.

Companies using Active Directory on premises want to be able to harmonize it with Active Directory running on Azure. With the update this week, the Microsoft cloud can replicate data between the two types of Active Directory, synchronizing who has access to data and, importantly, who has just had their access revoked. By the way, Alex Simons, who is Active Directory program manager for Azure, says that the number one application driving the Azure version is Office 365, followed by Salesforce.com. Azure AD runs out of 27 Microsoft datacenters worldwide and has delivered 99.997 percent uptime in the twelve months it has been available, according to Simons.
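A minimal sketch, assuming hypothetical data structures rather than the real synchronization tooling, of what that replication has to get right – new and changed accounts flow up to the cloud directory, and revocations propagate just as quickly:

```python
# Toy directory sync: the on-premises directory is the source of truth, and
# creates, updates, disables, and deletes all flow to the cloud copy.

on_prem_directory = {
    "alice": {"enabled": True,  "groups": ["finance"]},
    "bob":   {"enabled": False, "groups": []},            # access just revoked
    "carol": {"enabled": True,  "groups": ["engineering"]},
}

cloud_directory = {
    "alice": {"enabled": True, "groups": ["finance"]},
    "bob":   {"enabled": True, "groups": ["finance"]},     # stale: still enabled
}

def sync(source: dict, target: dict) -> None:
    """Push creates, updates, and disables from the source of truth."""
    for user, attrs in source.items():
        if target.get(user) != attrs:
            target[user] = dict(attrs)     # create or update, including disables
    # Accounts deleted on premises disappear from the cloud copy too.
    for user in list(target):
        if user not in source:
            del target[user]

sync(on_prem_directory, cloud_directory)
assert cloud_directory["bob"]["enabled"] is False          # revocation propagated
```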

The Azure AD APIs are open, so others can weave authentication from Azure into their applications. Brad Anderson, vice president of program management in the Windows Server and System Center group, said that 1,270 SaaS applications have done such integration to date and that the number would reach 2,000 apps by July 1. He added another stunning figure: more than 95 percent of the organizations in the world (he did not define that term specifically) use Active Directory as an authentication authority. Interestingly, Simons said that inside of China authentication tools like AD and LDAP are not used on cloudy infrastructure, so authentication is not done in this manner there.

Simons also pointed out one other thing that makes the cloud version of Active Directory better than – and for the moment different from – the internal one. Microsoft runs a lot of big data crunching and machine learning algorithms against the Azure variant of AD, mashing up information from its cybercrimes unit (which maintains a list of IP addresses known to harbor malware) to look for anomalous login behavior. The cloud version of AD also keeps track of what each user typically does and looks for abnormal patterns, and then asks for two-factor authentication when something looks funky.
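Microsoft has not published how these models work, but the general shape of such a risk check is easy to illustrate. Here is a toy Python sketch that combines an IP reputation list with a per-user behavioral baseline and asks for a second factor when a sign-in looks out of character; the thresholds and data are invented:

```python
# Toy login risk check: block known-bad IPs outright, and step up to a second
# factor when a sign-in does not match the user's usual location or hours.

KNOWN_BAD_IPS = {"203.0.113.7", "198.51.100.23"}    # e.g. botnet-infested hosts

# Per-user baseline of what "normal" looks like (hypothetical).
BASELINE = {
    "alice": {"countries": {"US"}, "usual_hours": range(7, 20)},
}

def assess_login(user: str, ip: str, country: str, hour: int) -> str:
    """Return 'allow', 'require_mfa', or 'block' for a sign-in attempt."""
    if ip in KNOWN_BAD_IPS:
        return "block"
    profile = BASELINE.get(user)
    if profile is None:
        return "require_mfa"               # no history yet: be cautious
    risk = 0
    if country not in profile["countries"]:
        risk += 1                          # unfamiliar location
    if hour not in profile["usual_hours"]:
        risk += 1                          # unusual time of day
    return "require_mfa" if risk >= 1 else "allow"

print(assess_login("alice", "192.0.2.10", "US", 9))    # allow
print(assess_login("alice", "192.0.2.10", "BR", 3))    # require_mfa
print(assess_login("alice", "203.0.113.7", "US", 9))   # block
```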

Companies could decide, of course, to run all of their Active Directory authentication from the cloud. Azure AD will come in two flavors: a Basic Edition aimed at small and medium businesses and a Premium Edition that has all of the bells and whistles necessary for the top 5,000 or so companies in the world. Azure AD can be bought standalone or as part of Microsoft’s new Enterprise Mobility Suite, which bundles Azure AD, the Windows Intune device manager, and the Azure Rights Management service. For the next several months, pricing is best when buying the suite, which costs $4 per user per month.

Here’s another useful hybrid feature. Back in January, Microsoft announced the general availability of Hyper-V Recovery Manager, which integrates with the Virtual Machine Manager plug-in for System Center to allow VMs to be backed up to and recovered at a secondary datacenter. The Hyper-V Replica feature does the replication, as its name suggests, while Azure stores the recovery plan and orchestrates the recovery. This service has been renamed Azure Site Recovery, and customers can now back up VMs from their primary datacenter to the Azure cloud and recover them there – no secondary site required. You cannot, however, replicate a VM to both a secondary datacenter and Azure at the same time. It costs $16 per protected machine per month to use Azure Site Recovery, and that includes the storage of the virtual machines on the Azure cloud. You do have to pay for Azure compute instances on top of this when you fire them up as part of a recovery operation.
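To see how that pricing plays out, here is a back-of-the-envelope cost sketch using the $16 per protected machine figure from above; the per-hour compute rate is a made-up placeholder, since recovered instances are billed at whatever the normal Azure instance rates are:

```python
# Rough Azure Site Recovery cost model: the flat protection fee covers
# replication and stored recovery plans, while compute is billed only for the
# hours that recovered VMs actually run.

protected_vms        = 50
protection_rate      = 16.00   # $ per protected machine per month (per the article)
recovery_hours       = 8       # e.g. one failover test during the month
instance_rate_per_hr = 0.09    # hypothetical per-VM compute rate, not a quoted price

monthly_protection = protected_vms * protection_rate
recovery_compute   = protected_vms * recovery_hours * instance_rate_per_hr

print(f"protection charge : ${monthly_protection:,.2f}")   # $800.00
print(f"recovery compute  : ${recovery_compute:,.2f}")     # $36.00
print(f"total for month   : ${monthly_protection + recovery_compute:,.2f}")
```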

While all of these enterprise-grade, hybrid features will no doubt appeal to customers, it is such services that give the largest cloud providers leverage over plain vanilla cloud providers – and there are so very many of the latter. The real trick to not losing your shirt in the cloud in these early days is to have an existing business that is made better by the advent of cloud services. Amazon’s genius with its Web Services subsidiary is that it created a business model where companies basically pay it to do research and development as well as build systems and datacenters, so it gets its own IT essentially for free. Google has a search engine and advertising business that supports its cloud aspirations, and even IBM has deep enough pockets and customer relationships to become a cloud player for the Global 2000 without too much trouble, and it could possibly extend further down into the midrange if it builds clouds that support its AIX, IBM i, and Linux customers.
