Advanced Computing in the Age of AI | Friday, March 29, 2024

AWS Back in Business, but Enterprises Can Learn from Outage 

Amazon Web Services appears to have recovered from an outage that disrupted customers including Tinder, Netflix, and IMDb on Sunday. And while the service's disruption should not dissuade businesses from cloud adoption, it should encourage them to ensure they are doing enough to keep operations running smoothly.

The problems began at the company's oldest public cloud datacenter, in Ashburn, Va., which initially reported DynamoDB issues, according to the AWS Service Health Dashboard. They escalated when the database began returning elevated error rates on API calls, setting off a domino effect across other services. In response, Amazon (NASDAQ: AMZN) throttled APIs to recover the service.

Nor were the problems confined to the United States. On Monday morning, users in Europe reported problems with Skype. Users in the United Kingdom, Spain, and Ukraine said they could not access the system, according to downdetector.uk.

"We are working hard to fix an issue which is preventing some users from logging in and using Skype," a Microsoft (NASDAQ: MSFT) spokesperson told CloudHub UK. "We apologize for any inconvenience and will keep our users updated."

The issue affected consumer users, not business accounts, according to Skype's website. Skype said it had identified the network problem preventing users from logging in and was in the process of reconnecting them.

This is not the first time a cloud service provider has suffered an outage, nor is it likely to be the last time cloud customers find themselves unable to access services. But that does not mean enterprises should reconsider their cloud policies. Cloud services are still more reliable and secure than most on-premises solutions, said Mike Chase, chief technology officer at cloud service provider dinCloud.

"Cloud providers do a more consistent job of deploying infrastructure since the best talent gravitates to their teams worldwide," he told EnterpriseTech. "Plus cloud’s inherent standardization and cost modeling leads to the ultimate bang for the buck value propositions."

That's not to say Amazon made no mistakes here, according to Chase.

"There’s too much reliance on centralized cloud orchestration systems instead of dedicating isolated key mechanisms on a per-customer basis to enhance security and stability," he said. "Too many cloud orchestration systems have back-roads into customer environments which are meant to be isolated. While centralizing various functions cuts cost, it’s unconscionable to subject the cloud universe to a single systemic issue, which could wipe all customers out or lead to massive data loss. Unfortunately, as long as people are merely content to attend the big cloud shows and party down while forgetting to demand answers to the hard questions or jump ship if they can’t get them, then nothing will ever change."

It's also incumbent on cloud customers to ensure their infrastructures are up to par, cautioned Jim Reavis, CEO of the Cloud Security Alliance.

"Cloud computing outages are a reminder that no computer system will ever be perfect and provide 100-percent uptime. Customers must take responsibility for designing redundancy and fault tolerance into their mission critical cloud applications, just as they do for internal IT systems," he told EnterpriseTech. "There are a number of methods to obtain maximum uptime, including using multiple availability zones or even multiple cloud providers. Unfortunately, too many cloud customers simply assume that purchasing a base offering of virtual machines provides all of the security and availability guarantees that are necessary, instead of understanding the shared responsibility that exists between customers and providers. Ultimately, the most important lesson is for developers to assume networks and infrastructure are unreliable in their design phase and to build resiliency into the software application."
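Reavis's advice to assume the network is unreliable is often put into practice in client code as retries with exponential backoff plus failover to a second location. The following is a minimal, hypothetical Python sketch, not taken from AWS or from anyone quoted here. It assumes each endpoint is a plain callable standing in for, say, a replica in a different availability zone; that `ConnectionError` marks a transient failure; and that the names `call_with_failover`, `make_flaky`, `zone-a` and `zone-b` are purely illustrative.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class AllEndpointsFailed(Exception):
    """Raised when every endpoint has used up its retry budget."""


def call_with_failover(
    endpoints: Sequence[Callable[[], T]],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Try endpoints in order (hypothetical helper). Transient errors are
    retried with capped exponential backoff and full jitter; once an
    endpoint's attempts are exhausted, fail over to the next one."""
    rng = rng or random.Random()
    errors = []
    for endpoint in endpoints:
        for attempt in range(max_attempts):
            try:
                return endpoint()
            except ConnectionError as exc:  # assumed to mean "transient"
                errors.append(exc)
                if attempt == max_attempts - 1:
                    break  # out of retries here; move to the next endpoint
                # Backoff doubles each attempt up to max_delay; a random delay
                # in [0, cap] keeps many clients from retrying in lockstep.
                cap = min(max_delay, base_delay * (2 ** attempt))
                sleep(rng.uniform(0, cap))
    last = errors[-1] if errors else None
    raise AllEndpointsFailed(f"{len(errors)} failed attempts") from last


def make_flaky(name: str, failures: int) -> Callable[[], str]:
    """Hypothetical stand-in endpoint that fails `failures` times, then works."""
    state = {"left": failures}

    def endpoint() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError(f"{name} unavailable")
        return f"response from {name}"

    return endpoint


# Usage: zone-a stays down; zone-b recovers after one error.
delays = []
result = call_with_failover(
    [make_flaky("zone-a", failures=10), make_flaky("zone-b", failures=1)],
    sleep=delays.append,  # record delays instead of actually sleeping
    rng=random.Random(0),
)
print(result)  # response from zone-b
```

Randomizing the wait ("jitter") matters when a provider is throttling API calls, as Amazon did during this outage: without it, clients that failed together tend to retry together. Real cloud SDKs signal throttling with their own error types, so a production version would need to decide which errors count as retryable.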

Enterprises should be diligent, agreed Doron Pinhas, CTO of Continuity Software.

"Unfortunately outages continue to occur, even at the best run shops. Consumers of cloud services that require maximum availability should consider taking proactive measures," he told EnterpriseTech. "Enterprises continue to migrate key business services to the cloud. The economies of scale are simply too compelling to ignore. Of course, as this outage reminds us, caution is warranted."

Likewise, businesses should determine where their data resides – something that quickly becomes a concern when an outage occurs, said Tony Hampel, senior director of product marketing at Connected Data.

"If the data is stored in a single location and there’s a problem such as a power outage, this data becomes unavailable until the problem passes. Worse still, the data may be lost forever in the event of a natural disaster. Such situations compromise business activities, cause a loss of productivity and revenue, and may be unrecoverable in the worst case," he told EnterpriseTech via email. "A replicated private cloud file sync and share appliance that’s company owned and within the company's firewall is a much better alternative. A minimum of two replicated appliances are deployed in different locations and should disaster hit one location, the organization’s users will simply failover to the second device and continue working without interruption."

These recent outages at Amazon and Microsoft could be an opportunity for smaller cloud service providers to demonstrate their differentiating technologies and services to dissatisfied or nervous customers. But it's up to enterprise professionals to request a datacenter tour and get answers about prospective providers' security practices, such as penetration testing, said Chase.

"We’re candid in that all clouds have their bad days, but we’ve never had the kinds of outages our competitors have had. Since we wrote our cloud orchestration platform from scratch instead of using OpenStack as the starting point, we feel we have a lot of advantages that the majority of competitors can’t offer (who are based on that solution)," he said.

About the author: Alison Diana

Managing editor of Enterprise Technology. I've been covering tech and business for many years, for publications such as InformationWeek, Baseline Magazine, and Florida Today. A native Brit and longtime Yankees fan, I live with my husband, daughter, and two cats on the Space Coast in Florida.

EnterpriseAI