Advanced Computing in the Age of AI | Sunday, May 26, 2024

Intel’s 30x AI Performance Aim for Xeon Sapphire Rapids CPUs May Not Solve All AI Needs: Analysts 

With its upcoming Sapphire Rapids CPUs, the next generation of Xeon processors after Ice Lake and slated for release in 2022, chipmaker Intel Corp. is hoping to drive AI performance to new heights: as much as 30x the performance of the existing Ice Lake chips.

The company announced the 30x AI performance goal with a demonstration at its Intel Innovation virtual event on Oct. 27, unveiling a series of technical refinements that it says give the upcoming chips increased capabilities for AI workloads.

The 30x gains in the pre-production Xeon Sapphire Rapids chips came about through a series of steps that combine hardware features with software tools, such as harnessing the built-in Advanced Matrix Extensions (AMX) engine, using the Intel Neural Compressor (INC) software tool and integrating Intel oneDNN optimizations that are based on the oneAPI open industry standard, Jordan Plawner, Intel’s AI product director, told EnterpriseAI.

“We take the position that AI will be everywhere,” said Plawner. “Anything that processes will need to do AI. So, who has the best single-socket performance or performance-per-watt is not the only metric or KPI, because if you are doing AI everywhere, then you have a broad range of use cases. So, the Xeon AI strategy is to empower customers to run AI everywhere there is a Xeon.”

Intel aims for the 30x target in a measured way, using multiple improvements, said Plawner. “We did not get to a 30x [performance boost] in one step,” he said.

The Updates Behind the Performance Boosts

Instead, the first boosts came from improvements made to the in-production Xeon Ice Lake processors, he said. That was accomplished by upgrading from the default Intel version of TensorFlow to the next generation of TensorFlow from Google, which is the mainstream version used by many millions of users, said Plawner. Intel also integrated its oneDNN deep neural network library, an open-source performance library for deep learning applications that is part of the company’s oneAPI open, standards-based, cross-architecture programming model, natively into TensorFlow for more gains.

“So, anyone who downloads TensorFlow 2.5 or later, automatically gets all the hardware optimizations on for Xeon,” said Plawner. “That is the first 1.5x [performance boost] that we accomplished, and that is on existing hardware, just by upgrading to the newest version.”
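As a rough illustration of what that looks like from the application side, stock TensorFlow builds expose an environment variable, TF_ENABLE_ONEDNN_OPTS, that toggles the oneDNN code paths. Whether it is already on by default depends on the TensorFlow version and platform, so the sketch below sets it explicitly. The guarded import is only there so the snippet also runs on a machine without TensorFlow installed.

```python
import os

# The variable must be set before TensorFlow is imported.
# "1" requests the oneDNN-optimized kernels, "0" turns them off.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

try:
    import tensorflow as tf
    tf_version = tf.__version__
except ImportError:
    # TensorFlow is not installed here; the setting above is still shown.
    tf_version = None

print("TF_ENABLE_ONEDNN_OPTS =", os.environ["TF_ENABLE_ONEDNN_OPTS"])
print("TensorFlow version:", tf_version)
```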

Next was the release of a tool called INC, the Intel Neural Compressor, which simplifies the process of taking a 32-bit model and turning it into an 8-bit model, which is more efficient, said Plawner. The Intel Neural Compressor automatically optimizes trained neural networks with negligible accuracy loss, going from FP32 to INT8 numerical precision, taking full advantage of the built-in AI acceleration, called Intel Deep Learning Boost, that is in today’s latest production Intel Xeon Scalable processors.

“The challenges for developers and data scientists are they do not want to lose precision,” he said. “The tool helps the data scientists set a precision target, an accuracy target. This tool helps them get rid of the exponent, but then not lose accuracy. This is done a lot by the cloud service providers because they have the most sophisticated data scientists.”
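The idea of converting FP32 values to INT8 while checking against an accuracy target can be sketched in a few lines. This is not the Intel Neural Compressor API. It is a minimal, generic example of symmetric per-tensor quantization, and the function names, the random stand-in weights and the 5 percent error budget are all assumptions made for illustration. A real tool would measure model accuracy on a validation set rather than the reconstruction error used here.

```python
import math
import random

def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]

def relative_error(original, restored):
    """L2 norm of the difference divided by the L2 norm of the original."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, restored)))
    norm = math.sqrt(sum(a ** 2 for a in original))
    return diff / norm

# Hypothetical stand-in for a layer's FP32 weights.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.1) for _ in range(4096)]

quantized, scale = quantize_int8(weights)
error = relative_error(weights, dequantize(quantized, scale))

# Hypothetical accuracy target: accept the INT8 version only if the
# reconstruction error stays under 5 percent.
error_budget = 0.05
accepted = error <= error_budget
print(f"relative error: {error:.4f}, accepted: {accepted}")
```

Each value now fits in one byte instead of four, which is where the efficiency gain comes from. The error check plays the role of the accuracy target Plawner describes.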

Also contributing to the 30x AI improvements is the inclusion of the Intel AMX instruction set (Advanced Matrix Extensions) in the Sapphire Rapids pre-production software stack, as well as more cores, more memory and more throughput, said Plawner.

Adding all the improvements together gets the chips to that 30x AI improvement by 2022, he said.

“That last bit is available out of the box to any developer, because we want to enable customers to run AI everywhere there is a Xeon,” said Plawner. “They cannot spend more than five minutes getting it to work. Data scientists have no patience.”

Giving Xeon More Capabilities in the AI Marketplace 

Even with all the boosts, Intel does not assume that customers will always choose a Xeon processor, he added. “But we want to give them no out-of-the-box excuse to not use Xeon. The counter of that is that out-of-the-box AI should just run on Xeon, period.”

All of this is to better serve customers and their SLAs, he said.

Customers have varying needs as they run their applications on Xeon, said Plawner. “They want to use some number of cores just to do their inferencing. And our goal is to make sure that most customers, not 100 percent, can use some number of cores, run their inferencing in place with the rest of the application on Xeon, hit their SLAs and move on so they do not need to go and buy an accelerator.”

According to Intel’s demo at Intel Innovation, the improved Sapphire Rapids processors achieved more than 24,000 images per second on ResNet-50, exceeding the 16,000 images per second of the latest Nvidia A30 GPU.

Based on the improvements, Intel argues that its Xeon general-purpose CPUs with built-in AI acceleration can solve customer use cases that once necessitated GPU acceleration.

Better Xeons Won't Replace GPUs Everywhere, Say Analysts

Several IT analysts said that Intel’s AI performance goals for its next processors are laudable, but that they may not let CPUs replace GPUs for every customer need.

“Sapphire Rapids alone delivers a big gain in AI performance,” Linley Gwennap, principal analyst for The Linley Group, told EnterpriseAI. “Using the new AMX instructions, each CPU can deliver about 8x more AI operations than current Intel CPUs. Intel Neural Compressor (aka Low Precision Optimization Tool) offers a gain of 3.73x when porting from FP32 to INT8 data types. Multiply those two numbers together and Intel hits the 30x target once Sapphire Rapids begins shipping.”
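Gwennap's arithmetic is easy to check. The snippet below multiplies the two gains he cites. The variable names are illustrative, and the figures are his estimates rather than measured results.

```python
# Gains cited by Gwennap (analyst estimates, not benchmark output).
amx_gain = 8.0    # AMX instructions versus current Intel CPUs
int8_gain = 3.73  # FP32 to INT8 conversion with Intel Neural Compressor

combined_gain = amx_gain * int8_gain
print(f"combined gain: {combined_gain:.2f}x")  # combined gain: 29.84x
```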

But in the real world, things are different, said Gwennap. “To put this in perspective, however, most AI workloads, including those running on current Xeon processors, already use INT8 data, so erase the 3.73x. The 8x gain is impressive, but the best Xeon processors today deliver only 5 percent of the performance of the best Nvidia GPUs on ResNet-50, so that means Sapphire Rapids will still be less than half as fast.”
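The "less than half as fast" conclusion follows directly from the two numbers in the quote. As a quick sketch, again using Gwennap's estimates and illustrative variable names:

```python
# Share of top Nvidia GPU throughput on ResNet-50 that the best current
# Xeon delivers, per Gwennap's estimate.
current_xeon_share = 0.05
amx_gain = 8.0  # Sapphire Rapids gain once the INT8 factor is set aside

sapphire_rapids_share = current_xeon_share * amx_gain
print(f"Sapphire Rapids vs. best GPU: {sapphire_rapids_share:.0%}")  # 40%
```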

In addition, “Intel compares a top-of-the-line Sapphire Rapids – likely costing more than $10,000 and burning 400W – against an Nvidia GPU that costs $5,000 and uses 165W,” said Gwennap. “Mainstream Sapphire Rapids models will likely achieve around 8,000 images per second.”
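Putting the demo throughput next to Gwennap's cost and power figures gives a rough efficiency comparison. The sketch below is only as good as its inputs: the price and wattage values are his estimates, the CPU throughput and price are both quoted as "more than" the figures used, and the Chip class and its field names are made up for this example.

```python
from dataclasses import dataclass

@dataclass
class Chip:
    name: str
    images_per_sec: float  # ResNet-50 throughput from Intel's demo
    watts: float           # power estimate cited by Gwennap
    price_usd: float       # cost estimate cited by Gwennap

    @property
    def per_watt(self) -> float:
        return self.images_per_sec / self.watts

    @property
    def per_dollar(self) -> float:
        return self.images_per_sec / self.price_usd

cpu = Chip("Top-end Sapphire Rapids", 24_000, 400, 10_000)
gpu = Chip("Nvidia A30", 16_000, 165, 5_000)

for chip in (cpu, gpu):
    print(f"{chip.name}: {chip.per_watt:.1f} images/s per watt, "
          f"{chip.per_dollar:.1f} images/s per dollar")
```

On these numbers the CPU wins on raw throughput, while the GPU comes out ahead on both throughput per watt and throughput per dollar, which is the point Gwennap is making.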

Overall, Sapphire Rapids will “bring an impressive 8x improvement in AI performance above the weak baseline of current Xeon processors,” said Gwennap. “The next-generation product will deliver reasonable performance for customers that have light AI workloads, occasional AI use, or applications that mix AI and general-purpose computation. But customers with large neural networks or frequent AI use will significantly improve throughput and reduce operating cost (TCO) by installing a GPU or other AI accelerator in their servers.”

Another analyst, Karl Freund, principal analyst and founder of Cambrian AI Research, agreed.

“First, the good news for Intel – yes, I believe the 30x is in the ballpark for some workloads and models, thanks largely to the upcoming AMX matrix extensions … that should have a huge benefit for apps that currently do dot products in single instruction, single data code,” said Freund. “And AMX is configurable in terms of the matrix/vector size, which is good.”

But that does not mean that customers will use it, he said.

“I do not think these extensions will enable CPUs to compete with GPUs in general,” said Freund. “After all, Intel is pouring a great deal of silicon and money into their own Ponte Vecchio GPU for HPC and AI. And you still will not see a lot of training being done on these new Xeons. But Intel continues to add value to the AI workloads by continually providing more performance for operations common to AI algorithms/models. But you will not see workloads migrating off GPUs or other ASICs like Google TPU.”