How Nvidia Sees ‘Reasoning’ Emerging in Natural Language Processing
When machines can truly think for themselves, one of the greatest hopes for AI will be realized.
At Nvidia, which regularly applies massive amounts of compute power and a wide range of GPUs and related hardware and software to this work, recent experiments with language translation are showing fascinating results that promise to help reach that elusive goal.
Bryan Catanzaro, Nvidia’s vice president of applied deep learning research, recently made a presentation at the AI Hardware Summit where he talked about how his company is pushing the frontiers of natural language processing (NLP) forward when it comes to language translation.
“Language modeling is an example of the kinds of new thoughts that people are thinking [and] it is enormously important commercially at many different companies,” said Catanzaro. “We are seeing exploding model complexity in so many different areas of language modeling,” with model complexity now doubling every two months. “We are on track to have 100 trillion parameters, single models, by 2023.”
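To put the doubling rate in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration of what "doubling every two months" implies if the rate held steady; the function name and the time horizons are hypothetical choices for this example, not figures from Catanzaro's talk.

```python
def growth_factor(months, doubling_period_months=2.0):
    """Return how many times larger a quantity becomes after `months`,
    assuming it doubles once every `doubling_period_months`."""
    return 2.0 ** (months / doubling_period_months)

# Illustrative horizons only; a steady two-month doubling is an assumption.
for months in (2, 6, 12, 24):
    print(f"{months:>2} months -> {growth_factor(months):,.0f}x")
```

At that pace, a model would grow 64-fold in a single year, which helps explain why training costs climb so steeply.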
The compute required to train these models is staggering, with training alone potentially costing anywhere from millions of dollars to as much as $1 billion.
“Now, why would somebody train a model that is so expensive?” he asked. “Because these language models are our first steps towards generalized artificial intelligence, you know, with few-shot learning. And that is enormously valuable and very exciting.”
Building a model that could cost $1 billion to train would essentially mean reinventing an entire company, and the model would have to be usable in many different contexts to make it worthwhile, he said.
To illustrate his reasoning, Catanzaro showed an example in which he gave a large language model the English sentence “I live in California.” The model quickly returned its Spanish translation, “Yo vivo en California.”
In a prompt, he was going to ask the model to do a translation, but the model did it on its own before he asked. “It actually did the proper translation. And it was not just a word for word translation, either. If you look closely, why is this so astonishing is because this language model was not trained to do translation at all.”
Instead, what the model was trained to do before he used it in this way was to predict the next word in a sequence of text. The model had been previously trained on an enormous amount of data from the internet, he said.
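The training objective described here, predicting the next word in a sequence, can be illustrated with a deliberately tiny sketch. It simply counts which word follows which in a small sample text. This is a toy counting model, not a neural network and not Nvidia's method; the function names and the sample corpus are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_counts(text):
    """Count, for every word in the text, which words follow it and how often."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical sample text; real models train on vastly more data.
corpus = "i live in california . i live in a small town . you live in spain ."
counts = train_bigram_counts(corpus)
print(predict_next(counts, "live"))
```

A model this simple only memorizes word pairs. The point of Catanzaro's example is that a far larger model trained on the same kind of objective ended up learning much more than that.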
What the model did was remarkable, said Catanzaro.
“The model, in order for it to learn the task of predicting the next word, needed to start understanding various high level concepts, like the fact that there is an English language, that there is a Spanish language and that they have vocabulary that is related,” he said. “For example, the word ‘live,’ it looks a little bit different in English and in Spanish, but somehow the model knows that those are the same concepts.”
Just as amazing was that the model “grasped the idea that it could translate from English to Spanish,” he said. “Somehow, the model had to learn that.”
“And when we think about that, it's actually kind of staggering to think that a model that was just trained on a sequence of words could learn all those concepts,” said Catanzaro. “And that is extraordinarily exciting because it is a step towards generalized artificial intelligence. And the reason this is so exciting is because all human activity, all of human ingenuity and wisdom has been encoded in language.”
What the model displayed was notable, he said. “It is a general form of reasoning that we have never had before, and that is very valuable and very exciting.”
The results show promise for NLP and AI and for how they can advance society and the world, he said.
“These are the kinds of capabilities that are generating such investment in large scale language modeling,” said Catanzaro. “I think we are going to see continued investment just because the prospects are so great.”
These are the kinds of technology advancements that Nvidia has been working toward in its almost 30 years of existence, he said.
“The core of the work we do involves optimizing hardware and software together, all the way from chips to systems to software, frameworks, libraries, compilers, algorithms and applications,” said Catanzaro. “We want the inventors, the researchers and the engineers that are coming up with future AI to be limited only by their own thoughts. That is the dream.”