Intel Launches Silicon Photonics Chip, Previews Next-Gen Phi for AI
At the Intel Developer Forum in San Francisco this week, Intel Senior Vice President and General Manager Diane Bryant announced the launch of Intel’s Silicon Photonics product line and teased a brand-new Phi product, codenamed “Knights Mill,” aimed at machine learning workloads.
With the introduction of Silicon Photonics, Intel is debuting two new 100G optical transceivers. Sixteen years in the making, the small form-factor design fuses optical components with silicon integrated circuits to provide 100 gigabits per second over a distance of two kilometers. Initial target applications include connectivity for cloud and enterprise datacenters as well as Ethernet switch, router, and client-side telecom interfaces. Microsoft is adopting the technology for its scale-loving Azure datacenters.
“Electrons running over network cables won’t cut it,” said Bryant in her keynote address. “Intel is the only one to build the laser on silicon and therefore we are the first to light up silicon. We integrate the laser light-emitting material, which is indium phosphide, onto the silicon, and we use silicon lithography to align the laser with precision. This gives us a cost advantage because it is automatically aligned versus manually aligned as with traditional silicon photonics.”
The two QSFP28 optical transceivers, now shipping in volume, are based on industry standards at 100G for switch, router, and server use, notes Intel. The 100G PSM4 (Parallel Single Mode fiber 4-lane) optical transceiver features up to 2-kilometer reach on parallel single-mode fiber. The 100G CWDM4 (Coarse Wavelength Division Multiplexing 4-lane) optical transceiver offers up to 2-kilometer reach on duplex single-mode fiber.
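For a rough sense of scale, the arithmetic behind a 4-lane 100G link can be sketched in a few lines of Python. This is an illustrative back-of-the-envelope calculation, not Intel's specification: it assumes the nominal 100 Gbps is split evenly across the four lanes, ignores encoding and protocol overhead, and uses a hypothetical 1,000 GB dataset (decimal units) as the payload.

```python
# Back-of-the-envelope link arithmetic for a 4-lane 100G transceiver.
# Assumptions (not from Intel): even split across lanes, no encoding or
# protocol overhead, decimal units (1 GB = 8 Gb).

TOTAL_GBPS = 100   # nominal aggregate rate of a 100G transceiver
LANES = 4          # PSM4 and CWDM4 are both 4-lane designs


def per_lane_gbps(total_gbps: float, lanes: int) -> float:
    """Nominal data rate carried by each lane, in gigabits per second."""
    return total_gbps / lanes


def transfer_seconds(size_gigabytes: float, link_gbps: float) -> float:
    """Idealized time to move `size_gigabytes` over a link of `link_gbps`."""
    return size_gigabytes * 8 / link_gbps


if __name__ == "__main__":
    dataset_gb = 1000  # hypothetical dataset size, chosen for illustration
    print(f"Per-lane rate: {per_lane_gbps(TOTAL_GBPS, LANES)} Gbps")
    for rate in (10, 25, 50, 100):
        seconds = transfer_seconds(dataset_gb, rate)
        print(f"{rate:>3} Gbps link: {seconds:.0f} s to move {dataset_gb} GB")
```

Real links carry framing and protocol overhead, so measured throughput will be somewhat lower than these idealized figures.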
The first Intel Silicon Photonics products will fulfill the need for faster connections from rack to rack and across the datacenter, said Bryant. “As the server network bandwidth increases from 10 Gig to 25 Gig to 50 Gig, optics will be required down into the server as well. We see a future where silicon photonics, optical I/O is everywhere in the datacenter and then integrated into the switch and the controller silicon. Our ability to run optics on silicon gives the end user a compelling benefit.”
Kushagra Vaid, general manager for Microsoft Azure Cloud hardware engineering, emphasized the need to keep up with continued growth in its datacenter, especially relating to cloud networking. “Back in 2009 the server bandwidth used to be around a GB/sec, and if you fast forward to later this year into early next year, we anticipate it to be around 50 GB/sec, so that’s a growth of 50 times on bandwidth to the server. As the server data rates increase, from 1 to 10 to 25 Gbps, when we start getting to 100 Gbps to the server, you will hit a brick wall. There is no way copper can scale beyond 100 Gbps. It is already getting difficult to scale copper at 25 Gbps over 3 meters, so we do need some new technologies that are going to be used for this scaling. That’s why Silicon Photonics is very interesting to us.”
Microsoft will initially be deploying Intel’s Silicon Photonics technology for switch-to-switch interconnectivity at 100 Gbps in its Azure datacenter. “We found it’s a great cost-effective way to do these deployments,” said Vaid. “It’s optimized versus what we are doing today and I think the best part is it gives us a mechanism to scale to even higher bandwidth — up to 400 Gbps in the near future.”
Intel Puts AI-focused ‘Knights Mill’ on Phi Roadmap
Bryant also revealed that the next-generation Xeon Phi product would not be the 10nm “Knights Hill” that we’d been expecting but rather a brand-new Phi entry, codenamed “Knights Mill” and optimized for deep learning. The surprise Phi product will feature AI-targeted design elements such as enhanced variable-precision compute and high-capacity memory.
Like its second-gen cousin “Knights Landing,” the third-generation Phi is also a bootable host CPU. “It maintains that onload model,” said Bryant, “but we’ve included new instructions into the Intel instruction set – enhancements for variable precision floating point so the result is you will get even higher efficiency for deep learning models and training of those models on complex neural data sets.”
Intel’s move to optimize for single-precision (and likely half-precision) follows the same path that NVIDIA started when it launched the highly FP32-optimized Titan X at its 2015 GTC event. Pascal, debuted at GTC16, is the company’s first high-end GPU to feature mixed-precision floating point capability, meaning the architecture will be able to process FP16 operations twice as quickly as FP32 operations. While double-precision FLOPS are standard fare in HPC, machine learning typically does quite well with single or half-precision compute.
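The trade-off between half, single, and double precision can be illustrated with a short Python sketch. It is not tied to any Intel or NVIDIA hardware or instruction set: it uses the standard-library struct module to round a value into each IEEE 754 format and back. The sample value 0.1 is a hypothetical stand-in for a neural network weight.

```python
import struct

# struct format codes for IEEE 754 binary16, binary32, and binary64.
FORMATS = {"FP16": "<e", "FP32": "<f", "FP64": "<d"}


def round_trip(value: float, fmt: str) -> float:
    """Store `value` in the given format and read it back as a Python float."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def relative_error(value: float, fmt: str) -> float:
    """Relative rounding error introduced by storing `value` in `fmt`."""
    return abs(round_trip(value, fmt) - value) / abs(value)


if __name__ == "__main__":
    weight = 0.1  # hypothetical model weight, chosen for illustration
    for name, fmt in FORMATS.items():
        size = struct.calcsize(fmt)
        stored = round_trip(weight, fmt)
        error = relative_error(weight, fmt)
        print(f"{name}: {size} bytes, stored={stored!r}, rel. error={error:.2e}")
```

Each step down in precision halves the storage per value and increases the rounding error. That is the trade that precision-optimized hardware makes: more values moved and processed per cycle, in exchange for accuracy that many machine learning workloads can tolerate.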
There is still a lot we don’t know about Knights Mill, such as what manufacturing process it will use and whether it replaces Knights Hill, the chip that is supposed to power Argonne Lab’s CORAL installation in the 2018 timeframe. Bryant didn’t indicate if or how the new chip would affect previous disclosures, but emphasized Intel’s commitment to “a very long roadmap of optimized solutions for artificial intelligence.”
The War for AI Dominance
With the launch of both NVIDIA Pascal GPUs and the Intel Knights Landing Phi this year, a battle is brewing between the reigning GPU champ and Chipzilla for AI supremacy. NVIDIA fired the most recent shot this week in a blog post contesting performance claims made by Intel. Intel said it stands by its numbers.
During Bryant’s keynote, representatives from Chinese cloud giant Baidu and machine learning startup Indico took to the stage to sing the praises of Xeon and Xeon Phi for machine learning workloads. In one exchange Indico founder Slater Victoroff noted that “the issue with that is once you move to thousands of models, GPUs don’t make sense anymore.” “I certainly like the idea of GPUs not making sense,” Bryant quipped back.
Baidu provided an even heftier endorsement. The company, which has relied heavily on NVIDIA GPUs to run its deep learning models, announced that it will be using Xeon Phi chips to train and run Deep Speech, its speech recognition service.
“We are always trying to find ways to train neural networks faster,” said Baidu’s Jing Wang. “A big part of our approach is our use of techniques normally reserved for high-performance computing and that has helped us achieve a 7X speedup over our previous system. When it comes to AI, Intel Xeon Phi processors are a great fit in terms of running our machine learning networks. The increased memory size that Intel Phi provides makes it easier for us to train our models efficiently compared to other solutions. We find Xeon Phi very promising and consider performance across a wide range of kernel shapes and sizes relevant to the state-of-the-art long short-term memory models.”
Baidu also announced a new HPC cloud service, featuring Xeon Phis. “The Xeon Phi-based public cloud solutions will help bring HPC to a much broader audience,” said Wang. “We think it will mean not only lower cost but greater velocity of HPC and AI innovations.”
Bryant observed that machine learning is also a prime workload at government and academic high-performance computing centers. Increasingly, researchers are applying machine learning to traditionally data-intensive science problems. At NERSC, the DOE computing facility where the Knights Landing-based Cori machine is currently being installed, Intel is partnering with researchers to advance machine learning at scale. Together, she said, they’ll tackle “previously unsolved problems that require the entire Cori supercomputer for challenges such as creating a catalogue of all objects in the universe.”
The final AI note hit by Bryant was Intel’s planned acquisition of Nervana Systems, announced last week. “Their IP as well as their deep expertise in accelerating deep learning algorithms will directly apply to our advancements in artificial intelligence,” said Bryant. “They have solutions at the silicon levels, at the libraries and at the framework level.”