AccuWeather’s Compute Strategy Takes Forecasting to the Cloud at Scale
Weather and the cloud go together. Take AccuWeather, the State College, PA-based consumer and commercial weather services provider. Demand for weather forecast information from media outlets, from business customers and, most of all, from the public fluctuates widely (though on average it’s increasing by tens of thousands of data requests per day). If a storm is bearing down on a major population center like New York, hits on the AccuWeather site spike. But when there isn’t much weather to worry about, demand drops.
Until five years ago, when the company received “only” about 100 million daily data requests, AccuWeather maintained an on-premises data center to handle its web support and data analytics requirements. But with the growing tsunami of weather data that must now be ingested and analyzed (with GPU-based systems) in short timeframes, and with booming consumer demand for information (daily average: 15 billion data requests) as AccuWeather expands globally, the economics of hyperscale cloud computing became too compelling for the company to stay with an on-prem IT model.
“The great thing about the cloud,” AccuWeather CTO Chris Patti told EnterpriseTech, “is that almost every provider has some kind of auto-scaling capability. With the cloud, we scale for the median or the average and we can peak effectively. But with on-prem, you’re actually scaling for peak.”
Patti said that matching on premises the compute capacity it gets from its public cloud services would cost hundreds of millions of dollars, and much of that capacity would often sit unused.
“There’s no way that we could scale on premises to our current requirements,” he said.
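To make the peak-versus-average argument concrete, here is a minimal Python sketch using an invented demand curve (the numbers are hypothetical, not AccuWeather data). It compares the server-hours consumed when capacity is fixed at the peak, as on premises, with an idealized autoscaler that tracks demand hour by hour and ignores scale-up lag and headroom.

```python
def peak_vs_autoscaled(hourly_demand: list[int]) -> tuple[int, int]:
    """Return (server-hours if provisioned for peak, server-hours if capacity follows demand)."""
    peak = max(hourly_demand)
    fixed = peak * len(hourly_demand)  # on-prem: every hour pays for the peak
    elastic = sum(hourly_demand)       # idealized autoscaling: pay only for what is used
    return fixed, elastic


# Hypothetical servers needed per hour: a quiet day with one storm-driven spike.
demand = [40, 40, 42, 45, 50, 120, 150, 90, 50, 45]
fixed, elastic = peak_vs_autoscaled(demand)
print(fixed, elastic, f"{elastic / fixed:.1%}")  # prints: 1500 672 44.8%
```

In this made-up example, peak-sized hardware would be busy less than half the time, which is the kind of idle capacity Patti describes.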
Basing resource planning on average demand, AccuWeather tracks key performance metrics
“What we look for is trends based on the amount of memory, CPU, data requests – what’s the baseline for the number of servers we actually need,” said Patti. “Within a matter of about five to 15 minutes we can scale up to triple what we had to handle the load. We’re constantly running background checks examining performance, we monitor our systems and we have scaling scripts. So if we hit a metric – the key metric is response time back to the user – if that number fluctuates beyond an acceptable amount of time we’ll scale up 20-30 percent.”
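Patti’s description amounts to a threshold-based scale-up rule. The Python sketch below is one hypothetical way to express it, not AccuWeather’s actual scaling scripts: the names `ScalingPolicy` and `next_server_count`, the 200 ms response-time target and the 10 percent tolerance are all assumptions. The 25 percent step sits inside the quoted 20-30 percent range, and treating triple the baseline as a hard ceiling is also an assumption. Only the scale-up path he mentions is modeled; a real policy would also need cooldowns and scale-down.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All values are illustrative assumptions, not published AccuWeather settings.
    target_response_ms: float = 200.0  # acceptable response time back to the user
    tolerance: float = 0.10            # allowed fluctuation before scaling
    step_fraction: float = 0.25        # scale-up step, within the quoted 20-30 percent
    max_multiplier: float = 3.0        # assumed ceiling: triple the baseline fleet


def next_server_count(current: int, baseline: int, observed_ms: float,
                      policy: ScalingPolicy) -> int:
    """Return the server count after one evaluation of the scale-up rule."""
    limit = policy.target_response_ms * (1 + policy.tolerance)
    if observed_ms <= limit:
        return current  # response time acceptable: hold steady
    grown = math.ceil(current * (1 + policy.step_fraction))
    ceiling = math.floor(baseline * policy.max_multiplier)
    return min(grown, ceiling)


# Usage: response time stays high, so each check grows the fleet until the cap.
policy = ScalingPolicy()
servers = 100
for _ in range(5):
    servers = next_server_count(servers, baseline=100, observed_ms=250.0, policy=policy)
    print(servers)  # prints 125, 157, 197, 247, 300 on successive lines
```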
AccuWeather has implemented a dual-cloud strategy that Patti said leverages the combined cloud capabilities of Amazon Web Services and Microsoft Azure. Given the criticality of weather forecasting for the safety of people and property, Patti said, he likes having the cloud services back each other up in case one of them has an outage. Two years ago, when he implemented the dual-cloud strategy, the two providers had somewhat different strengths: AWS had a Linux-based technical computing orientation, including Nvidia GPU servers for graphical image processing of weather data, while Azure had a Windows-based consumer orientation suited to supporting high-volume access to the AccuWeather website.
Patti noted that over the past two years, Azure has added Linux and GPU capabilities, but he still likes the redundancy of using the two rival services. So the model remains in place: AWS for HPC-class weather data analytics, Azure for customer data request response.
“We’re a firm believer in best-of-breed. We’re not married to one or the other,” Patti said, adding that “we’re really a Microsoft shop, they’re a natural fit for us with our skill set. The only difference at the time was that AWS had GPU computing, so we moved off the CPU (offered by Azure) to the GPU to do more distributed graphical processing, that’s why we picked AWS.... It seemed like a natural division.”
He noted that while CPUs have advanced in recent years, “GPUs have advanced even more. So the number of cores available on the GPU far outweighed the number we had on the CPU, so we’re able to get tremendous scales of efficiency by going to GPU.”
AccuWeather’s cloud workload strategy is to divide, rather than double, the entirety of its workloads between the two services.
“Our goal is having them running back to back, so we’d have things running on both cloud providers with an arbitrator in the middle sending traffic to one place or the other,” he said. “So if there’s an outage on one, the other service picks up for it. That gets back to auto-scaling. We’re not going to run full capacity on both, we’re going to run 50 to 60 percent (of the workloads) on each and then if there’s an outage we scale up immediately on the other up to 100 percent.”
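The arbitrator Patti describes can be pictured as a weighted router with failover. The Python below is a hypothetical illustration, not AccuWeather’s implementation: the `Provider` class, the `route_weights` and `handle_outage` functions, and the 55 percent provisioning level (chosen from within his 50-60 percent range) are all assumptions. Traffic is split in proportion to each healthy provider’s provisioned capacity, and an outage shifts everything to the survivor, which is marked to scale up to 100 percent. In practice this role is usually filled by DNS-based or global load-balancing services, and the scale-up itself takes minutes rather than happening instantly.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    healthy: bool = True
    # Fraction of total workload capacity provisioned on this cloud (assumed value).
    capacity_share: float = 0.55


def route_weights(providers: list[Provider]) -> dict[str, float]:
    """Split traffic among healthy providers in proportion to provisioned capacity."""
    healthy = [p for p in providers if p.healthy]
    if not healthy:
        raise RuntimeError("no healthy provider available")
    total = sum(p.capacity_share for p in healthy)
    return {p.name: p.capacity_share / total for p in healthy}


def handle_outage(providers: list[Provider], failed_name: str) -> dict[str, float]:
    """Mark a provider as down, scale up a lone survivor, and return new weights."""
    for p in providers:
        if p.name == failed_name:
            p.healthy = False
    survivors = [p for p in providers if p.healthy]
    if len(survivors) == 1:
        survivors[0].capacity_share = 1.0  # scale the remaining cloud to full load
    return route_weights(providers)


clouds = [Provider("aws"), Provider("azure")]
print(route_weights(clouds))          # traffic split evenly between the two clouds
print(handle_outage(clouds, "aws"))   # all traffic now goes to azure
```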
In the end, he said, weather is an excellent cloud use case.
“We don’t always need big compute all the time, we need it several times per day over highly focused areas at massive scale,” he said. “There are days when the weather’s really doing nothing, it’s pretty quiet, and then things can change quickly. So it’s a perfect cloud story.”
Patti said demand for AccuWeather data from the public generally maps to where people live, so in the United States the heaviest volume of requests is for forecasts covering the major metropolitan areas. As for the areas where accurate weather prediction is most difficult, he said tropical regions, such as South Florida, pose the biggest challenge. Nearly every day includes an afternoon shower, but predicting precisely when those showers will happen is extremely difficult.
But he said weather forecasting is generally improving because forecasts are based on more, and more varied, data sources analyzed by increasingly sophisticated HPC-class numerical models.
“We can run highly detailed numerical models really over anywhere worldwide and purpose those resources in 10 minutes,” he said.
He said AccuWeather’s proprietary forecasting system, which translates the output of the numerical models into forecasts in more than 100 languages used around the world, has been under development for 20 years. The underlying numerical models, he said, are digital packages written, to the amusement of some, in Fortran.
“Fortran isn’t typical for HPC,” Patti said, “people laugh about Fortran, but it’s the fastest mathematical computational language that we have and it’s perfect for a calculus-intensive space.”
Patti said running the model itself “is pretty easy, but the secret sauce is the data you feed into that model. You have to initialize the actual forecast model, the things in current conditions with your satellite information – the current state of the atmosphere – to produce an accurate result.
“We collect so much info - whether from users’ weather stations at their houses, to geo-sensing information - that we use to visualize our models and come up with more accurate forecasts. So the model is dynamic, it’s all about thermodynamics and physics, but basically our real skill set is bringing all that vast amount of information and then setting up that forecast model.”