Microsoft’s AI Supercomputer on Azure: ‘Combinations of Perceptual Domains’
Microsoft has unveiled a supercomputing monster – among the world’s five most powerful, according to the company – aimed at what is known in scientific and philosophical circles as the “emergent property,” or at least a version of it. In AI, this means taking a step beyond “narrow AI,” the current state of things in which AI performs a single task, and moving toward AI that handles more than one task or problem simultaneously. If the emergent property begins to kick in, then AI could move into new realms of capability (see “Dr. Eng Lim Goh on Superhuman AI and the ‘Emergent Property’”).
Hosted on Microsoft’s Azure public cloud and built with and for OpenAI (an AI research lab co-founded by Elon Musk, among others, with an “AI for humanity” charter), the system is intended for training larger AI models targeting highly complex problems and is, Microsoft said in a blog post, “a first step toward making the next generation of very large AI models and the infrastructure needed to train them available as a platform for other organizations and developers to build upon.”
“The exciting thing about these models is the breadth of things they’re going to enable,” said Microsoft CTO Kevin Scott. “This is about being able to do a hundred exciting things in natural language processing at once and a hundred exciting things in computer vision, and when you start to see combinations of these perceptual domains, you’re going to have new applications that are hard to even imagine right now.”
Microsoft, which announced the system at its annual Build conference, said the supercomputer is a single system with more than 285,000 CPU cores, 10,000 GPUs and 400 gigabits/second of network connectivity for each GPU server in the cluster. While the company did not release specific throughput numbers, Microsoft said “compared with other machines listed on the Top500 supercomputers in the world, it ranks in the top five.” If accurate, that would indicate a machine capable of greater than 23.5 (double-precision, Linpack) petaFLOPS. Microsoft said the system was completed late last year. Details regarding systems suppliers were not disclosed, but if it’s assumed the machine’s 10,000 GPUs are Nvidia V100s, each one delivering 7.8 double-precision teraFLOPS, then the GPUs alone would provide roughly 78 petaFLOPS of theoretical peak performance, which would be sufficient to crack the top five of the Top500.
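The back-of-the-envelope estimate can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: it uses the figures quoted in this article, assumes (as the article does) V100-class GPUs, ignores any contribution from the CPU cores, and uses hypothetical function and variable names.

```python
# Figures quoted in the article. The GPU model is an assumption, not a disclosed spec.
GPU_COUNT = 10_000
TFLOPS_PER_GPU = 7.8          # assumed double-precision peak per GPU
TOP5_THRESHOLD_PFLOPS = 23.5  # Linpack score needed for the top five, per the article


def peak_pflops(gpu_count: int, tflops_per_gpu: float) -> float:
    """Aggregate theoretical peak in petaFLOPS (1 petaFLOPS = 1,000 teraFLOPS)."""
    return gpu_count * tflops_per_gpu / 1000.0


def required_efficiency(peak: float, threshold: float) -> float:
    """Fraction of theoretical peak that must be sustained to reach the threshold."""
    return threshold / peak


peak = peak_pflops(GPU_COUNT, TFLOPS_PER_GPU)
efficiency = required_efficiency(peak, TOP5_THRESHOLD_PFLOPS)
print(f"GPU-only theoretical peak: {peak:.1f} petaFLOPS")
print(f"Linpack efficiency needed for the top five: {efficiency:.1%}")
```

Under these assumptions the GPUs supply about 78 petaFLOPS of theoretical peak, so the machine would need to sustain only around 30 percent of that figure on Linpack to clear the 23.5-petaFLOPS mark.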
“As we’ve learned more and more about what we need and the different limits of all the components that make up a supercomputer, we were really able to say, ‘If we could design our dream system, what would it look like?’” said OpenAI CEO Sam Altman. “And then Microsoft was able to build it.”
In AI today, data scientists typically build separate, relatively limited models that use labeled data to learn individual tasks, such as language translation, image recognition or document classification. But researchers are building a new class of larger models that learn to handle those tasks by “examining billions of pages of publicly available text, for example,” Microsoft said. “This type of model can so deeply absorb the nuances of language, grammar, knowledge, concepts and context that it can excel at multiple tasks: summarizing a lengthy speech, moderating content in live gaming chats, finding relevant passages across thousands of legal files or even generating code from scouring GitHub.”
Researchers at Microsoft, under the company’s AI at Scale initiative, have developed larger Microsoft Turing models for natural language processing (NLP), used within the company’s Internet search, Office and ERP/CRM products. Turing Natural Language Generation (T-NLG) is a language model with 17 billion parameters (each one loosely equivalent to a synaptic connection in the human brain) that, according to Microsoft, performs tasks such as writing assistance and answering reader questions.
Eventually, the company intends to make its large AI models, training tools and supercomputing resources available through Azure AI services and GitHub. “…AI is becoming a platform,” Scott said. “This is about taking a very broad set of data and training a model that learns to do a general set of things and making that model available for millions of developers to go figure out how to do interesting and creative things with.”
Microsoft also announced a new version of DeepSpeed, an open source deep learning library for PyTorch that the company said enables training of models 15x larger and 10x faster. And the company added support for distributed training to the ONNX Runtime, an open source library designed to make models portable across hardware and operating systems. Until now, ONNX Runtime had focused on high-performance inferencing.
The combination of these tools, frameworks and computing infrastructure is intended to enable “self-supervised” learning, in which, Microsoft said, AI models are trained on large amounts of unlabeled, unstructured data by absorbing large volumes of publicly available documents on the Internet and predicting missing words and sentences. Removing the task of meticulously labeling data (such as labeling pictures of cats) would immeasurably improve the lives of data scientists and their teams of workers.
“In something like a giant game of Mad Libs, words or sentences are removed, and the model has to predict the missing pieces based on the words around it,” Microsoft said. “As the model does this billions of times, it gets very good at perceiving how words relate to each other. This results in a rich understanding of grammar, concepts, contextual relationships and other building blocks of language. It also allows the same model to transfer lessons learned across many different language tasks, from document understanding to answering questions to creating conversational bots.”
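The Mad Libs idea can be illustrated with a toy example. The Python sketch below is not Microsoft’s method and involves no neural network: it simply counts, over a tiny made-up corpus, which word fills the gap between each pair of neighboring words, so that the text itself supplies the labels. The corpus, the `___` blank marker and all function names are hypothetical.

```python
from collections import Counter, defaultdict

BLANK = "___"  # hypothetical marker for the removed word


def context(words, i):
    """Return the words immediately left and right of position i."""
    left = words[i - 1] if i > 0 else "<s>"
    right = words[i + 1] if i + 1 < len(words) else "</s>"
    return left, right


def train(corpus):
    """Hide each word in turn and count which word filled each context."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            # No human labeling: the hidden word itself is the training label.
            counts[context(words, i)][word] += 1
    return counts


def predict(counts, masked_sentence):
    """Guess the blanked word from its neighbors, or None for an unseen context."""
    words = masked_sentence.lower().split()
    i = words.index(BLANK)
    candidates = counts.get(context(words, i))
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the dog sat on the rug",
    "the dog slept on the rug",
]
model = train(corpus)
print(predict(model, "the ___ sat on the floor"))
print(predict(model, "a bird sat ___ the wire"))
```

Large language models replace the lookup table with a neural network that generalizes to contexts it has never seen, but the training signal is the same in spirit: predict what was removed from the surrounding words.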
Also at the Build conference, Microsoft previewed Project Bonsai, its machine teaching service for autonomous industrial control systems. The company said the project is intended to enable subject matter experts without AI backgrounds to develop physical systems and processes via “machine teaching,” which enables machines “to incorporate knowledge from experts rather than extracting knowledge from data alone.”
“Through machine teaching, developers and engineers can specify desired outcomes or behaviors, concepts to be taught, and safety criteria that must be met,” the company said. “The machine teaching approach enables users to have a clear understanding of how the AI agents work and debug when they don't.”
Tiffany Trader contributed to this report.