Advanced Computing in the Age of AI | Monday, April 22, 2024

AWS Rolls ‘Inferentia’ Chip Amid Raft of New ML Services, Technologies 

AWS CEO Andy Jassy at re:Invent

Amazon Web Services poured forth a rolling tide of new product announcements today at its re:Invent conference in Las Vegas, an impressive portfolio of technologies that extend the cloud services giant’s reach and appeal across a range of advanced computing sectors.

In a marathon presentation, AWS CEO Andy Jassy spoke almost without letup for three hours, with much of his discussion focused on AWS’s roll-out of 13 new machine learning services and capabilities. These include a custom chip designed to speed up ML training and inference, and AI services that extract text from documents, read medical information, and provide customized recommendations and forecasts using the same technology used by Amazon.com’s retail site.

The announcements follow more than 200 ML capabilities launched by AWS over the past 12 months.

Two product launches apply directly to inferencing – making predictions using a trained ML model – which Jassy said can drive up to 90 percent of the compute costs of ML applications.

The new custom chip, called Inferentia and available in 2019, is designed for larger workloads that consume entire GPUs or require lower latency. Jassy said the chip provides hundreds of teraflops per chip and thousands of teraflops per Amazon EC2 instance for multiple frameworks, including TensorFlow, Apache MXNet, and PyTorch, and multiple data types, including INT-8 and mixed precision FP-16 and bfloat16.

Developed at Israel-based Annapurna Labs, which Amazon acquired three years ago, each Inferentia chip provides hundreds of TOPS (tera operations per second) of inference throughput. “You’ll be able to have in each of those chips hundreds of TOPS,” Jassy said. “You can band them together to get thousands of TOPS.”

Inferentia complements AWS’s announcement of Amazon Elastic Inference, available now and also built to cut inference costs compared with a dedicated GPU instance. Jassy said that instead of running on a whole Amazon EC2 P2 or P3 instance with relatively low utilization, developers can run on a smaller, general-purpose Amazon EC2 instance and use Elastic Inference to provision the right amount of GPU performance, resulting in cost savings of up to 75 percent.

“Starting at 1 TFLOP, developers can elastically increase or decrease the amount of inference performance, and only pay for what they use,” he said. “We think the cost equation on top of the 75 percent savings you can get with Elastic Inference, if you layer Inferentia on top of it, it’s another 10 percent improvement in cost. This is a big game changer, these two launches across inference.”
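
As a rough illustration of how these percentages might combine, the sketch below works through the arithmetic in Python. It assumes the savings compound multiplicatively, that the "up to" figures are reached in full, and that inference makes up the 90 percent share of compute cost Jassy cited. The function names and the $1,000 baseline bill are hypothetical and are not AWS figures.

```python
def remaining_cost_fraction(*savings):
    """Fraction of the baseline cost left after applying each saving in turn.

    Each saving is a fraction between 0 and 1 (0.75 means "75 percent cheaper").
    Assumption: savings compound multiplicatively rather than adding up.
    """
    remaining = 1.0
    for saving in savings:
        if not 0.0 <= saving <= 1.0:
            raise ValueError("each saving must be a fraction between 0 and 1")
        remaining *= 1.0 - saving
    return remaining


def combined_saving(*savings):
    """Total fractional saving after compounding the individual savings."""
    return 1.0 - remaining_cost_fraction(*savings)


def overall_saving(inference_share, inference_saving):
    """Saving on the whole compute bill when only the inference part gets cheaper."""
    return inference_share * inference_saving


if __name__ == "__main__":
    baseline = 1000.0  # hypothetical monthly inference bill in dollars

    elastic_only = baseline * remaining_cost_fraction(0.75)
    with_inferentia = baseline * remaining_cost_fraction(0.75, 0.10)
    print(f"Elastic Inference only: ${elastic_only:.2f}")
    print(f"With a further 10 percent: ${with_inferentia:.2f}")

    # Assumption: inference is 90 percent of total ML compute cost.
    total = overall_saving(0.90, combined_saving(0.75, 0.10))
    print(f"Saving on total compute bill: {total:.1%}")
```

Under those assumptions, a $1,000 inference bill falls to $250 with the 75 percent saving alone and to $225 once the further 10 percent is applied, which works out to roughly 70 percent off a total compute bill in which inference is the 90 percent share. Real-world savings would depend on utilization and instance pricing, which the figures quoted above do not specify.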

Elastic Inference supports widely used frameworks and is integrated with Amazon SageMaker and the Amazon EC2 Deep Learning Amazon Machine Image (AMI).

“Running efficient inference is one of the biggest challenges in machine learning today,” said Peter Jones, head of AI Engineering for Autodesk Research, a 3D design, engineering and entertainment software company. “Amazon Elastic Inference is the first capability of its kind we’ve found to help us eliminate excess costs that we incur today from idle GPU capacity.”

Other ML announcements from AWS today include:

AWS-Optimized TensorFlow framework scales TensorFlow across many GPUs, distributing training tasks and achieving close to linear scalability when training multiple types of neural networks (90 percent efficiency across 256 GPUs, compared to the prior norm of 65 percent), according to AWS. Using the product and P3dn instances, AWS said developers can train the ResNet-50 model in 14 minutes, 50 percent faster than the previous best time.

Amazon SageMaker Ground Truth is designed to address the time-consuming process in which annotators manually review thousands of examples and add the labels required to train machine learning models. AWS said the new product learns from these annotations in real time and can automatically apply labels to much of the remaining dataset, reducing the need for human review and cutting costs by up to 70 percent.

AWS Marketplace for Machine Learning gives developers access to a broad set of capabilities, including more than 150 algorithms and models that can be deployed directly to Amazon SageMaker.

Amazon SageMaker RL is designed to ease the learning curve and expense involved in implementing reinforcement learning, which trains models without large amounts of training data. According to AWS, the product is the cloud’s first managed reinforcement learning service. It offers managed reinforcement learning algorithms, support for multiple frameworks (including Intel Coach and Ray RL), multiple simulation environments (including Simulink and MATLAB) and integration with AWS RoboMaker, AWS’s robotics service, which provides a simulation platform that integrates well with SageMaker RL.

Amazon SageMaker Neo is a deep learning model compiler that lets customers train models once and run them on any platform with up to 2X improvement in performance, according to AWS. The product compiles models for specific hardware platforms and optimizes their performance automatically, the company said, eliminating hand tuning of models for different hardware platforms. SageMaker Neo supports hardware platforms from NVIDIA, Intel, Xilinx, Cadence, and Arm, along with frameworks such as TensorFlow, Apache MXNet and PyTorch. AWS will also make Neo available as an open source project.

Amazon Textract automates extraction of data from documents and forms and is, according to AWS, superior to existing optical character recognition (OCR) software, which the company said is often inaccurate and typically produces output requiring extensive corrections. Amazon Textract uses machine learning to read documents and extract text and data without needing manual review or custom code, according to AWS. The product lets developers automate document workflows, processing millions of document pages in a few hours.

Amazon Personalize is a real-time recommendation and personalization service based on the ML technology used for Amazon.com’s retail interactions. Amazon Personalize is a managed service for building, training and deploying custom personalization and recommendation models. It makes recommendations, personalizes search results and segments customers for direct marketing through email or push notifications.
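
Returning to the scaling figures AWS quoted for its optimized TensorFlow build, a little arithmetic puts them in concrete terms. The sketch below assumes the common definition of scaling efficiency (achieved throughput divided by the throughput of the same number of perfectly scaling GPUs). AWS’s exact measurement method is not described above, and the function names are hypothetical.

```python
def effective_gpus(num_gpus, efficiency):
    """Number of 'ideal' GPUs' worth of throughput actually delivered.

    Assumption: efficiency = achieved throughput / (num_gpus * single-GPU throughput).
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    return num_gpus * efficiency


def relative_throughput(new_efficiency, old_efficiency):
    """Throughput gain at a fixed GPU count when efficiency improves."""
    return new_efficiency / old_efficiency


if __name__ == "__main__":
    gpus = 256
    optimized = effective_gpus(gpus, 0.90)  # figure AWS cites for its build
    prior = effective_gpus(gpus, 0.65)      # figure AWS cites as the prior norm
    gain = relative_throughput(0.90, 0.65)

    print(f"Optimized: about {optimized:.1f} GPUs' worth of useful work")
    print(f"Prior norm: about {prior:.1f} GPUs' worth of useful work")
    print(f"Throughput gain at the same GPU count: {gain:.2f}x")
```

On these assumptions, 256 GPUs at 90 percent efficiency deliver the work of about 230 ideal GPUs, against about 166 at 65 percent, a gain of roughly 1.4x on the same hardware.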