Advanced Computing in the Age of AI | Monday, April 22, 2024

AWS Making Its Gaudi-Powered, ML-Optimized EC2 DL1 Instances Generally Available 

As machine learning becomes a dominant use case for local and cloud computing, companies are racing to provide services that are optimized and accelerated for AI applications. Now, Amazon Web Services (AWS) is introducing a new competitor to the landscape: cloud instances powered by (Intel-owned) Habana’s Gaudi AI processors, marking the first AI training instances provided by AWS that are not GPU-based.

The DL1 instances are now generally available as on-demand instances, reserved instances, spot instances and in other configurations – but so far they are being rolled out only in the US East (Northern Virginia) and US West (Oregon) regions on AWS. Previously, the DL1 instances were only available in preview and beta use.

Intel acquired Habana – which was founded five years ago – in late 2019 for around $2 billion. Habana produces two AI accelerators: the “Goya” inference processor, which debuted in 2018 when Habana emerged from stealth, and the “Gaudi” training processor, which was announced in 2019, shortly before the acquisition by Intel.

A photo shows Habana Labs' HL-205 Gaudi Mezzanine Card. Gaudi-based EC2 instances deliver cost efficiency and high performance, while natively supporting common frameworks such as TensorFlow and PyTorch. (Credit: Habana Labs)

It is this latter processor, Gaudi, that is at the heart of the new AWS EC2 DL1 instances. Each instance includes eight Gaudi accelerator cards, each equipped with a single Gaudi HL-2000 training processor with eight fully programmable tensor processing cores and 32GB of HBM2 memory. Beyond the accelerators, the EC2 DL1 instances include 768GB of system memory, custom second-generation (Cascade Lake) Intel Xeon Scalable CPUs and 4TB of local NVMe storage. The instances are capable of 400Gbps of networking throughput.

In a company blog post, AWS Chief Evangelist Jeff Barr detailed some of the attributes of Gaudi, including its Tensor Processing Cores (TPCs). These are “specialized VLIW SIMD (Very Long Instruction Word / Single Instruction Multiple Data) processing units designed for ML training,” wrote Barr. “The TPCs are C-programmable, although most users will use higher-level tools and frameworks.”

The chip supports several data types, including floating point (BF16 and FP32), signed integer (INT8, INT16, and INT32), and unsigned integer (UINT8, UINT16, and UINT32) data.

It also has specialized hardware – a Generalized Matrix Multiplier Engine (GEMM) – to accelerate matrix multiplication.

AWS estimates that these instances will provide “up to 40 percent” better price performance than the latest GPU-powered Amazon EC2 instances when training a typical machine learning model. In comparison, the latest Amazon EC2 GPU instance, the P4, includes up to eight Nvidia A100 GPUs and more system memory – 1,152GB per instance. Part of this price-performance advantage likely stems from the dramatic difference in hourly cost: AWS estimates an EC2 P4 instance at $32.77 per hour on-demand and an EC2 DL1 instance at just $13.11 per hour on-demand.
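To put those hourly figures in context, the short Python sketch below works through the arithmetic. It uses only the two on-demand prices quoted above; the function names are hypothetical, and reading "40 percent better price performance" as "40 percent lower cost to train" is an assumption made for illustration, since AWS does not define the metric here.

```python
# On-demand hourly prices quoted in the article (USD).
P4_HOURLY_USD = 32.77
DL1_HOURLY_USD = 13.11


def hourly_price_ratio(dl1: float = DL1_HOURLY_USD, p4: float = P4_HOURLY_USD) -> float:
    """DL1 hourly price as a fraction of the P4 hourly price."""
    return dl1 / p4


def required_relative_throughput(target_cost_ratio: float) -> float:
    """Relative DL1 throughput (P4 = 1.0) needed to hit a target cost ratio.

    Cost to train is hours * hourly price, and hours scale as 1 / throughput,
    so cost_dl1 / cost_p4 = price_ratio / relative_throughput.
    """
    return hourly_price_ratio() / target_cost_ratio


# DL1 costs about 40% of the P4 hourly price.
print(f"price ratio: {hourly_price_ratio():.3f}")
# Break-even: DL1 only needs about 40% of P4 throughput to match its training cost.
print(f"break-even throughput: {required_relative_throughput(1.0):.3f}")
# Assumed reading of the claim: a 40% lower training cost (cost ratio 0.6).
print(f"throughput for 40% saving: {required_relative_throughput(0.6):.3f}")
```

Under that assumed reading, a DL1 instance would need to sustain roughly two-thirds of a P4 instance's training throughput to deliver the quoted saving; actual results depend on the model and workload.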

“The use of machine learning has skyrocketed,” said David Brown, vice president for Amazon EC2 at AWS. “One of the challenges with training machine learning models, however, is that it is computationally intensive and can get expensive as customers refine and retrain their models. AWS already has the broadest choice of powerful compute for any machine learning project or application. The addition of DL1 instances featuring Gaudi accelerators provides the most cost-effective alternative to GPU-based instances in the cloud to date. Their optimal combination of price and performance makes it possible for customers to reduce the cost to train, train more models and innovate faster.”

Of course, working with these ostensibly more cost-efficient instances for AI training will require some adjustment from developers. AWS provides customers with access to the Habana SynapseAI SDK, which it says is integrated with frameworks like TensorFlow and PyTorch in a manner that will require minimal code changes for machine learning model migration. There are also Gaudi-optimized reference models available.
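As a rough illustration of what "minimal code changes" can look like on the PyTorch side, here is a hedged sketch. It assumes the SynapseAI PyTorch integration is installed as the `habana_frameworks` package and exposes an `hpu` device and `mark_step()`, as described in Habana's more recent documentation; the SDK version available at launch may have used different module names. The helper names `pick_device` and `train_one_step` are hypothetical, and the tiny model is a placeholder, not one of the reference models.

```python
import importlib.util


def pick_device() -> str:
    """Return "hpu" if the Habana PyTorch bridge appears installed, else "cpu"."""
    # Assumption: the integration ships as the `habana_frameworks` package.
    if importlib.util.find_spec("habana_frameworks") is not None:
        return "hpu"
    return "cpu"


def train_one_step(device_name: str) -> float:
    """Run one training step of a placeholder model on the chosen device."""
    import torch  # imported lazily so pick_device() works without PyTorch

    htcore = None
    if device_name == "hpu":
        # Importing this module makes the "hpu" device available to PyTorch.
        import habana_frameworks.torch.core as htcore

    device = torch.device(device_name)
    model = torch.nn.Linear(16, 4).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    inputs = torch.randn(8, 16).to(device)
    targets = torch.randn(8, 4).to(device)

    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    if htcore is not None:
        htcore.mark_step()  # trigger execution of the accumulated graph
    optimizer.step()
    if htcore is not None:
        htcore.mark_step()
    return loss.item()
```

Calling `train_one_step(pick_device())` runs the same step on a Gaudi device when the bridge is present and on the CPU otherwise; the Gaudi-specific changes are limited to the import, the device name and the `mark_step()` calls.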

For the release, AWS highlighted a series of high-profile customers, including Seagate, Leidos, Fractal – and, of course, Habana’s owner, Intel, which is planning to use the EC2 DL1 instances to train its 3D athlete tracking technology.

“Training our models on Amazon EC2 DL1 instances, powered by Gaudi accelerators from Habana Labs, will enable us to accurately and reliably process thousands of videos and generate associated performance data, while lowering training cost,” said Rick Echevarria, vice president for Intel’s sales and marketing group. “With DL1 instances, we can now train at the speed and cost required to productively serve athletes, teams, and broadcasters of all levels across a variety of sports.”

This article first appeared on sister website HPCwire.