Advanced Computing in the Age of AI | Thursday, March 28, 2024

Cascade of Service Outages Plague Azure Cloud 

There has been no silver lining for the Microsoft Azure cloud service during August, according to an outfit that tracks cloud outages and downtime.

Meanwhile, Amazon Web Service performance improved during the second quarter of this year.

CloudEndure said this week that Microsoft Azure was hit with what it called an "unusually high level of downtime" earlier this month and over the previous weekend. The cloud tracker said Microsoft reported "partial performance degradation" beginning on August 8 that affected customers in Japan until the following Monday (August 11).

Microsoft disclosed another "full service interruption" in eastern Japan on August 15 but restored service the same day.

CloudEndure reported that affected services included auto-scaling and metrics reporting.

The Japan outage was followed by a cascade of partial service interruptions that first struck customers in the western U.S., then moved on to affect cloud providers in Brazil and, finally, the eastern U.S. throughout the following week.

CloudEndure said the "SLA crushing outages" persisted through August 18, ten days after the outages began in Japan. Equally troubling, reported downtimes of up to five hours overstepped Microsoft's service level agreements by a wide margin.
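To give a rough sense of the scale involved, the sketch below compares a five-hour outage with the downtime a monthly availability commitment would tolerate. The 99.95 percent figure and the 30-day month are assumptions chosen for illustration; the article does not state the actual terms of Microsoft's service level agreements, and the function names are hypothetical.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime an availability target tolerates over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def breaches_sla(outage_hours: float, sla_percent: float, days: int = 30) -> bool:
    """True if a single outage alone exceeds the downtime budget for the period."""
    return outage_hours * 60 > allowed_downtime_minutes(sla_percent, days)


# Assumed availability target for illustration only (not taken from the article).
ASSUMED_SLA_PERCENT = 99.95

budget = allowed_downtime_minutes(ASSUMED_SLA_PERCENT)
print(f"Downtime budget: {budget:.1f} minutes per 30 days")
print("Five-hour outage breaches budget:", breaches_sla(5, ASSUMED_SLA_PERCENT))
```

Under that assumed target, the budget works out to roughly 21.6 minutes per 30 days, so a five-hour outage would exceed it many times over.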

The cloud tracking service reckons the August outages were the worst recorded for Microsoft Azure in 2014. CloudEndure also reported a whopping nine-fold increase in Azure service interruptions during the second quarter, from three in the first quarter to 28.

Service degradation and interruptions during the second quarter of 2014 "carry more severe effects on application availability," it added. Among the most affected Azure products were its SQL database and computing services used for service management.

According to Microsoft's Azure status history, a "full service interruption" was again reported in Japan on August 15 but the "incident is now mitigated."

As of August 21, Microsoft reported that Azure status was "all good."

The multi-region Azure cloud outages occurred as rival Amazon Web Services' performance reportedly improved in the second quarter. Citing the AWS Health Dashboard, CloudEndure said AWS's global performance issues declined 50 percent in the second quarter of 2014 to just four service errors.

The majority of the performance issues were in the Northern Virginia region where AWS is expanding its cloud services to the federal government and government contractors. Northern California customers experienced eight service errors during the same period.

CloudEndure said it traced the majority of AWS errors during the second quarter to its Route 53 and Elastic Compute Cloud (EC2) services. The cloud tracker also detected "considerable regional fluctuations" in AWS service errors and performance issues. For example, the number of errors in northern California jumped in the second quarter and declined in Sao Paulo, Brazil. The reverse was true in the first quarter of 2014.

AWS also benefits from redundancy in its cloud infrastructure, running applications in multiple "availability zones." If one zone fails, the application can continue running in another zone without service interruption.
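The failover idea described above can be illustrated with a minimal sketch: given an ordered list of zones and a health check, traffic goes to the first zone that is still healthy. The zone names and the `pick_healthy_zone` helper are hypothetical and do not represent any AWS API; real deployments typically rely on load balancers and health checks rather than hand-written logic like this.

```python
from typing import Callable, Optional, Sequence


def pick_healthy_zone(
    zones: Sequence[str], is_healthy: Callable[[str], bool]
) -> Optional[str]:
    """Return the first zone that passes the health check, or None if all fail."""
    for zone in zones:
        if is_healthy(zone):
            return zone
    return None


# Hypothetical zone names, used only for illustration.
zones = ["zone-a", "zone-b", "zone-c"]
failed = {"zone-a"}  # simulate one zone going down

active = pick_healthy_zone(zones, lambda z: z not in failed)
print(active)  # zone-b
```

If every zone fails the check, the helper returns `None`, which is the case redundancy across zones is meant to make unlikely.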

The Azure outages occur as Microsoft struggles to define itself in what CEO Satya Nadella frequently refers to as a "mobile-first, cloud-first world." Nadella told company partners in July, "We now need to redefine what it means to build an ecosystem in a mobile-first, cloud-first world."

Based on Azure's performance in August, it appears that Microsoft, the cloud services provider, has put the cart before the horse.

About the author: George Leopold

George Leopold has written about science and technology for more than 30 years, focusing on electronics and aerospace technology. He previously served as executive editor of Electronic Engineering Times. Leopold is the author of "Calculated Risk: The Supersonic Life and Times of Gus Grissom" (Purdue University Press, 2016).

EnterpriseAI