Esperanto Now Testing Its Fledgling 1,092-Core RISC-V ET-SoC-1 Chips
Back in December, Esperanto Technologies made waves when it announced ET-SoC-1, a new RISC-V-based chip aimed at machine learning that packed nearly 1,100 cores onto a package small enough to fit six times over on a single PCIe card. Now, early ET-SoC-1 samples have arrived from the fab and chip testing is under way, as Esperanto takes aim at the competition with bold projections of ET-SoC-1's energy efficiency.
First, a refresher: ET-SoC-1 can operate either as a standalone processor or as a PCIe-driven accelerator. At Hot Chips 33 this week, Esperanto Technologies founder and executive chairman Dave Ditzel called it “the world’s highest-performance commercial RISC-V chip.”
“On one 7nm chip,” he said, “we put 1,088 energy-efficient ET-Minion RISC-V processors, each with its own vector/tensor unit; four high-performance ET-Maxion RISC-V processors; over 160 million bytes of on-chip SRAM; and interfaces for external DRAM and flash memory.”
The chip is aimed at machine learning recommendation workloads, which Ditzel said were among the most important workloads for datacenters because of their growing popularity and their demanding performance and memory requirements. Most of these datacenters, Ditzel explained, were running x86 servers with open PCIe slots, which gave Esperanto a way into existing datacenters through a high-performance PCIe card.
Performance maximization through energy efficiency
Fitting into existing datacenters meant meeting a set of constraints. Perhaps most importantly, PCIe slots have a limited power budget of around 75 to 120 watts. "Esperanto's challenge was how to put the highest recommendation performance onto a single PCIe-based accelerator card using no more than six of our chips and no more than 120 watts," Ditzel said. This left Esperanto at a fork in the road.
“Some of the other solutions use one giant hot chip that uses up the entire power budget of the accelerator card,” Ditzel said. “Single-chip solutions often push for the highest operating frequencies, but this comes with very high power and it’s not very energy efficient. Esperanto realized that transistors – particularly 7nm FinFETs – are much more energy-efficient when operated at low voltages.”
Ditzel illustrated this point with a graph showing the energy efficiency of ET-SoC-1 (described in terms of inferences per second per watt) across different operating voltages.
At the highest operating point (0.9 volts), Ditzel said, a single chip could draw 275 watts, well beyond the power budget of a PCIe slot. At 0.67 volts, a single chip would come in just under the card's power budget.
But if, instead, they followed the peak of the energy efficiency graph, each chip would consume just 8.5 watts, so six chips would fit within the 120-watt budget with "room to spare." With six chips, performance would be two and a half times that of the one-chip solution, and energy efficiency twenty times better than in the 275-watt scenario.
In practice, though, they would want to use the entire power budget. Operating at about 0.4 volts, Ditzel said, one chip would draw about 20 watts, so six chips would fill the 120-watt budget. Six chips at that voltage would deliver four times the performance of the one-chip solution.
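The card-level power figures follow from simple multiplication of the per-chip numbers Ditzel quoted. The sketch below reproduces that arithmetic. It assumes every chip on the card runs at the same operating point and ignores power drawn by DRAM, the PCIe interface, and other board components. The constant and function names are illustrative, not anything Esperanto has published.

```python
# Power-budget arithmetic for a six-chip ET-SoC-1 PCIe card, using the
# per-chip figures quoted in the Hot Chips 33 talk.
# Assumption: all chips run at the same operating point; other board
# components (DRAM, PCIe interface, etc.) are ignored.
PCIE_CARD_BUDGET_W = 120.0
CHIPS_PER_CARD = 6

# Per-chip power at each quoted operating point, in watts.
OPERATING_POINTS = {
    "0.9 V (highest frequency)": 275.0,
    "peak energy efficiency": 8.5,
    "about 0.4 V": 20.0,
}


def card_power(per_chip_w: float, chips: int = CHIPS_PER_CARD) -> float:
    """Total card power when every chip draws per_chip_w watts."""
    return per_chip_w * chips


if __name__ == "__main__":
    for label, per_chip in OPERATING_POINTS.items():
        total = card_power(per_chip)
        headroom = PCIE_CARD_BUDGET_W - total  # negative means over budget
        print(f"{label}: {CHIPS_PER_CARD} x {per_chip} W = {total} W, "
              f"fits={total <= PCIE_CARD_BUDGET_W}, headroom={headroom} W")
    # Even a single chip at 0.9 V blows through the card budget.
    print("one chip at 0.9 V:", card_power(275.0, chips=1), "W")
```

Six chips at the peak-efficiency point total 51 watts, leaving 69 watts of the budget unused, which is the "room to spare." Six chips at about 0.4 volts land exactly on 120 watts.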
Squaring up to the competition
Of course, Esperanto is facing down an increasingly crowded field of accelerators. So: after all this work on efficiency, how does ET-SoC-1 actually stack up?
For that, Esperanto turned to MLPerf Deep Learning Recommendation Model (DLRM) benchmark results for an eight-socket Intel Xeon Platinum 8380H server (listed as 8380H 8S) and two Nvidia GPUs, the A10 and the T4. With respect to ET-SoC-1, Ditzel said that "all the performance [metrics] for Esperanto shown so far are projections based upon gate-level simulations of the entire chip and a large Synopsys ZeBu hardware emulation system."
Using the Xeon system as the baseline, Esperanto presented comparisons of the relative performance and relative performance per watt of the four processors. The T4 delivered 11 times the performance and 39 times the performance per watt of the Xeon baseline; the A10, 31 times and 52 times, respectively; and ET-SoC-1 (projected), 59 times and 123 times.
The ResNet-50 inference benchmark showed a similar pattern, with ET-SoC-1 again in first place at a projected 15.4 times the performance and 25.7 times the performance per watt of the Xeon baseline.
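Because both comparisons share the same Xeon baseline, dividing a chip's relative performance by its relative performance per watt gives its implied power draw relative to that baseline. The sketch below applies that division to the figures above. It is arithmetic on the presented numbers, not a figure Esperanto reported; the slide values are rounded, so the ratios are approximate, and the ET-SoC-1 inputs are themselves projections from simulation and emulation.

```python
# Implied power draw relative to the Xeon 8380H 8S baseline.
# relative perf/W = perf_rel * (P_base / P_x), so P_x / P_base = perf_rel / ppw_rel.
# Inputs are the rounded figures from Esperanto's presentation;
# ET-SoC-1 values are projections, not measurements.

DLRM = {  # name: (relative performance, relative performance per watt)
    "Nvidia T4": (11.0, 39.0),
    "Nvidia A10": (31.0, 52.0),
    "ET-SoC-1 (projected)": (59.0, 123.0),
}
RESNET50 = {
    "ET-SoC-1 (projected)": (15.4, 25.7),
}


def implied_relative_power(perf_rel: float, ppw_rel: float) -> float:
    """Power of a system as a fraction of the baseline system's power."""
    return perf_rel / ppw_rel


if __name__ == "__main__":
    for bench, table in (("DLRM", DLRM), ("ResNet-50", RESNET50)):
        for name, (perf, ppw) in table.items():
            ratio = implied_relative_power(perf, ppw)
            print(f"{bench} {name}: implied power {ratio:.2f}x the Xeon baseline")
```

On DLRM this works out to roughly 0.28x the baseline's power for the T4, 0.60x for the A10, and 0.48x for ET-SoC-1, which is why ET-SoC-1's lead over the A10 widens from about 1.9x in raw performance to about 2.4x in performance per watt.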
Unsurprisingly, Esperanto is particularly interested in its projected performance-per-watt dominance.
“Our position is that performance-per-watt is actually a better metric of performance, since everyone ought to be measured at similar power usage,” Ditzel said.
Esperanto has big ambitions: in his presentation, Ditzel talked about how, using open-source Glacier Point designs, a single large datacenter could contain millions of ET-SoC-1 chips. For now, though, the Mountain View-based company is still laying its foundation. "At the time of this recording, we've recently received silicon in our labs and are in the process of bringing up chip testing," Ditzel said, showing off the first 7nm ET-SoC-1 chip mounted in its package.
This article first appeared on sister website HPCwire.