Advanced Computing in the Age of AI | Wednesday, February 8, 2023

Tachyum Successfully Runs LINPACK on FPGA with IEEE 754-2019 Compliant FPU 

LAS VEGAS, Nov. 29, 2022 – Tachyum continues to advance towards production-ready status for its universal processor, having reached its latest milestone: running LINPACK benchmarks using Prodigy’s Floating-Point Unit (FPU) on a Field Programmable Gate Array (FPGA). Applications ran under Linux on the integer part of the processor, and the IEEE-compliant FPU was used to analyze and solve linear equations and linear least-squares problems.

The vector unit includes 16 copies of the FPU, plus additional shuffle and reduction operations. Although a vector unit has many instructions to test, its floating-point vector operations are the hardest part, and that part is now successfully behind Tachyum’s product development team.
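To make the vector-unit description more concrete, here is a minimal Python model. It assumes a 16-lane register with one lane per FPU copy. The function names `vfma`, `vshuffle` and `vreduce_add` are hypothetical and do not correspond to Prodigy instructions. The model shows one common way to build a reduction: a horizontal sum over 16 lanes takes log2(16) = 4 shuffle-and-add steps.

```python
from typing import List

LANES = 16  # assumption: one lane per FPU copy in the vector unit

Vector = List[float]


def vfma(a: Vector, b: Vector, c: Vector) -> Vector:
    """Lane-wise a * b + c.

    Python evaluates this as a multiply followed by an add (two roundings);
    a hardware fused multiply-add would round only once.
    """
    assert len(a) == len(b) == len(c) == LANES
    return [x * y + z for x, y, z in zip(a, b, c)]


def vshuffle(v: Vector, idx: List[int]) -> Vector:
    """Permute lanes: lane i of the result takes lane idx[i] of v."""
    return [v[i] for i in idx]


def vreduce_add(v: Vector) -> float:
    """Horizontal sum via a tree of shuffle + add steps (log2(16) = 4 steps)."""
    width = LANES
    while width > 1:
        half = width // 2
        # Move lanes [half, width) down onto lanes [0, half); other lanes keep their own value.
        shifted = vshuffle(v, [i + half if i < half else i for i in range(LANES)])
        # Add only in the lower half; the upper lanes are no longer needed.
        v = [x + y if i < half else x for i, (x, y) in enumerate(zip(v, shifted))]
        width = half
    return v[0]


# Usage: a 16-element dot product is one vfma followed by one reduction.
a = [float(i) for i in range(LANES)]
b = [1.0] * LANES
dot = vreduce_add(vfma(a, b, [0.0] * LANES))
print(dot)
```

The tree shape is why reduction support matters in hardware. A naive lane-by-lane sum needs 15 dependent adds, while the shuffle-based tree needs only 4 dependent steps.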

LINPACK measures a system’s floating-point computing power by timing how quickly it solves a dense system of linear equations. It is a widely used benchmark for supercomputers, including the NSCC Slovakia Supercomputer. With this FPU milestone reached, Tachyum has four more steps to go before the final netlist of the Prodigy processor chip: running UEFI and boot loaders that load Linux on the FPGA; completing vector-based LINPACK testing with I/O; I/O with virtualization; and RAS (Reliability, Availability and Serviceability). After that, Prodigy will be ready for final netlist, followed by tape-out.
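The following pure-Python sketch shows what a LINPACK-style run measures. It generates a random dense system, solves it by Gaussian elimination with partial pivoting, and converts the elapsed time into a flop rate using the classic LINPACK operation count of 2/3·n³ + 2·n². It then checks a normalized residual. This is not the benchmark code Tachyum ran, and pure-Python speeds say nothing about hardware. The residual formula is a simplified variant of LINPACK's accuracy check, and the names `gauss_solve` and `linpack_style_run` are hypothetical.

```python
import random
import sys
import time
from typing import List, Tuple

Matrix = List[List[float]]


def gauss_solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (inputs are not modified)."""
    n = len(a)
    m = [row[:] for row in a]
    x = b[:]
    for k in range(n):
        # Partial pivoting: swap the row with the largest |m[i][k]| into position k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        x[k], x[p] = x[p], x[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            # Each update is a multiply-add; this loop nest is the O(n^3) work FMA hardware accelerates.
            for j in range(k + 1, n):
                m[i][j] -= f * m[k][j]
            x[i] -= f * x[k]
    # Back substitution on the upper-triangular result.
    for i in range(n - 1, -1, -1):
        s = x[i] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def linpack_style_run(n: int, seed: int = 0) -> Tuple[float, float, List[float]]:
    """Return (flop rate, normalized residual, solution) for a random n x n system."""
    rng = random.Random(seed)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]  # right-hand side chosen so the exact solution is all ones
    t0 = time.perf_counter()
    x = gauss_solve(a, b)
    elapsed = time.perf_counter() - t0
    ops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # classic LINPACK operation count
    rate = ops / elapsed if elapsed > 0 else float("inf")
    # Simplified normalized residual: ||Ax - b||_inf / (n * eps * ||A||_inf * ||x||_inf).
    r = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    norm_a = max(sum(abs(v) for v in row) for row in a)
    norm_x = max(abs(v) for v in x)
    resid = r / (n * sys.float_info.epsilon * norm_a * norm_x)
    return rate, resid, x


rate, resid, x = linpack_style_run(100)
print(f"{rate / 1e6:.2f} Mflop/s, normalized residual {resid:.3f}")
```

A small normalized residual indicates that the floating-point arithmetic produced an accurate solution. This is why LINPACK serves as a correctness check for a new FPU as well as a speed test.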

Tachyum built its FPU from the ground up, and it is one of the most advanced FPUs in the world at the highest clock speeds. The FPU includes a fused multiply-add (FMA) unit, a divider, a format converter, a reciprocal approximator, a reciprocal square root approximator and a square root approximator. It is fully IEEE compliant, and its corner cases have been successfully debugged. In addition to IEEE single and double precision, the Prodigy processor will also support the 16-bit Bfloat16 (Brain Floating Point) format.
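Bfloat16 keeps the sign bit and the 8-bit exponent of IEEE single precision (binary32) but only 7 fraction bits. A format converter can therefore produce it from binary32 by rounding away the low 16 bits. The Python sketch below assumes round to nearest, ties to even. That is a common choice, but the release does not say how Prodigy's converter rounds. The input is first rounded to binary32, as a binary32-to-bfloat16 converter would see it, and the helper names are hypothetical.

```python
import math
import struct


def float32_bits(x: float) -> int:
    """IEEE 754 binary32 bit pattern of x (x must fit binary32's range, or be inf/NaN)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bfloat16_bits(x: float) -> int:
    """Convert to bfloat16 bits: 1 sign, 8 exponent, 7 fraction bits; round to nearest, ties to even."""
    bits = float32_bits(x)
    if math.isnan(x):
        # Keep the sign and force the top fraction bit so the result stays a quiet NaN.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    lsb = (bits >> 16) & 1  # lowest bit that survives dropping the low 16 bits
    # Adding 0x7FFF rounds values above the halfway point up; the extra lsb breaks ties toward even.
    # Finite values that round past the largest bfloat16 carry into the exponent and become infinity.
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF


def bfloat16_to_float(h: int) -> float:
    """Widen bfloat16 bits back to a float; this direction is exact."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


for value in (1.0, math.pi, 1.0 + 2**-8, 1.0 + 3 * 2**-8):
    h = to_bfloat16_bits(value)
    print(f"{value!r:>20} -> 0x{h:04X} -> {bfloat16_to_float(h)!r}")
```

The last two inputs sit exactly halfway between neighbouring bfloat16 values, so they exercise the tie-breaking rule. This is the kind of corner case an IEEE-compliant converter must get right.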

The next milestone for the vector unit is running vector operations, including mask operations and operations on unaligned vectors. Vectorization in the compiler is reaching the production stage, and vectorizing compilers and vectorized libraries will be fully available before chip shipments next year.
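Mask operations and unaligned vectors matter mostly at the edges of a vectorized loop. The Python sketch below models AXPY (y = alpha·x + y), the vector kernel at the heart of the original LINPACK benchmark. The loop may start at an index that is not a multiple of the vector width (the unaligned case), and a mask disables the lanes that would run past the end of the array. The 16-lane width and the helper names `masked_load`, `masked_store` and `vector_axpy` are assumptions for illustration, not Prodigy's instruction set.

```python
from typing import List, Sequence

LANES = 16  # assumed vector width, one lane per FPU copy


def masked_load(mem: Sequence[float], start: int, mask: List[bool], fill: float = 0.0) -> List[float]:
    """Load LANES elements beginning at any index; inactive lanes read nothing and get `fill`."""
    return [mem[start + i] if mask[i] else fill for i in range(LANES)]


def masked_store(mem: List[float], start: int, v: List[float], mask: List[bool]) -> None:
    """Write only the active lanes back to memory."""
    for i in range(LANES):
        if mask[i]:
            mem[start + i] = v[i]


def vector_axpy(alpha: float, x: Sequence[float], y: List[float], offset: int = 0) -> None:
    """In place: y[k] += alpha * x[k] for k >= offset, LANES elements per step.

    `offset` need not be a multiple of LANES (the unaligned case), and the last
    step is usually partial, so a tail mask switches off lanes past the end.
    Assumes len(x) >= len(y).
    """
    n = len(y)
    for start in range(offset, n, LANES):
        mask = [start + i < n for i in range(LANES)]
        vx = masked_load(x, start, mask)
        vy = masked_load(y, start, mask)
        masked_store(y, start, [alpha * a + b for a, b in zip(vx, vy)], mask)


y = [float(k) for k in range(37)]
vector_axpy(2.0, [1.0] * 37, y, offset=3)  # 34 elements: two full vectors plus a 2-lane tail
print(y[:5], y[-1])
```

Without masks, a compiler would need a separate scalar loop for the tail. With masked loads and stores, every iteration can use the same vector code, which is one reason mask support matters for vectorizing compilers.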

“Despite having to overcome obstacles of replacing IP and EDA tools, our engineering team has risen to the challenge of advancing the Prodigy stack so that we can get to tape-out and production next year,” said Dr. Radoslav Danilak, founder and CEO of Tachyum. “We have taken every opportunity to develop Prodigy as a processor that does not simply meet expectations but exceeds them. Successfully running LINPACK means that we are one step closer to completing our vision of transforming data centers into Universal Computing Centers with Prodigy.”

Prodigy delivers unprecedented data center performance, power, and economics, reducing CAPEX and OPEX significantly. Because of its utility for both high-performance and line-of-business applications, Prodigy-powered data center servers can seamlessly and dynamically switch between workloads, eliminating the need for expensive dedicated AI hardware and dramatically increasing server utilization. Tachyum’s Prodigy integrates 128 high-performance, custom-designed 64-bit compute cores to deliver up to 4x the performance of the highest-performing x86 processors for cloud workloads, up to 3x that of the highest-performing GPU for HPC, and up to 6x for AI applications.

A video demonstration of Prodigy successfully running LINPACK and some QA regression tests for the FPU can be found below.


Source: Tachyum


EnterpriseAI