World’s First 1,000-Processor Chip Said to Show Promise Across Multiple Workloads
A blindingly fast microchip, the first to contain 1,000 independent processors and said to show promise for digital signal processing, video processing, encryption and datacenter/cloud workloads, has been announced by a team at the University of California, Davis. The “KiloCore” chip has a maximum computation rate of 1.78 trillion instructions per second and contains 621 million transistors, according to the development team, which presented the microchip at the 2016 Symposium on VLSI Technology at Circuits this month in Honolulu.
By way of comparison, if the KiloCore’s area were the same as a 32 nm Intel Core i7 processor, it would contain approximately 2300-3700 processors and have a peak execution rate of 4.1 to 6.6 trillion independent instructions per second, according to the design team.
Although Bevan Baas, professor of electrical and computer engineering at UC/Davis who led the chip architectural design team, told EnterpriseTech that there are no current plans to commercialize the processor, he said it has important commercial implications, with several applications already developed for the chip.
“It has been shown to excel spectacularly with many digital signal processing, wireless coding/decoding, multimedia and embedded workloads, and recent projects have shown that it can also excel at computing kernels for some datacenter/cloud and scientific workloads,” he said. He said the KiloCore innovates in a number of areas covering architectures, application development and mapping, circuits, and VLSI design.
“We hope multiple aspects of KiloCore will influence the design of future computing systems,” he said. “For workloads that can be mapped to its architecture, it could very well have a place in exascale-class computing.”
The design team claims for KiloCore the highest clock-rate processor ever designed in a university. And while other multiple-processor chips have been created, none exceed about 300 processors, according to the team.
The KiloCore chip was fabricated by IBM using their 32 nm CMOS technology.
Beyond throughput performance, Baas said KiloCore is also the most energy-efficient many-core processor on record. For example, the 1,000 processors can execute 115 billion instructions per second while dissipating only 0.7 Watts, low enough to be powered by a single AA battery. The KiloCore chip executes instructions more than 100 times more efficiently than a modern laptop processor.
Each processor core can run its own small program independently of the others, Baas explained, which he said is a fundamentally more flexible approach than Single-Instruction-Multiple-Data approaches utilized by processors such as GPUs. The idea is to break an application up into many small pieces, each of which can run in parallel on different processors, enabling high throughput with lower energy use.
The KiloCore architecture is an example of a “fine-grain many-core” processor array, Baas said. Processors are kept as simple as possible so they occupy a small chip area, with numerous cores per chip. “Short low-capacitance wires result in high efficiency,” he said, and “operate at high clock frequencies (high performance in terms of high throughput and low latency).” The cores dissipate low power when both active and idle – in fact, he said, they dissipate perfect zero active power when there is no work to do. Energy efficiency also is achieved by operation at low supply voltages and a relatively-simple architecture consisting of a single-issue 7-stage pipeline with a small amount of memory per core and a message-passing-based inter-processor interconnect rather than a cache-based shared-memory model.
Baas said the team has completed a compiler and automatic program mapping tools for use in programming the chip.