Advanced Computing in the Age of AI | Wednesday, December 1, 2021

Proving the Art of the Possible with Natural Language Processing 
Sponsored Content by Dell EMC | Intel

The Dell EMC HPC and AI Innovation Lab with Intel is showcasing the art of the possible with deep learning for language-to-language translation and text-to-voice translation.

Natural language processing is a form of artificial intelligence that allows a computer application to understand human language, spoken or written. The concept of NLP encompasses coding, understanding, interpreting and manipulating language. NLP applications use computers to translate languages, convert voice to text and text to voice, and create human-like conversational agents to help customers, employees and others deal with questions and concerns.

In recent years, the field of NLP has been transformed by the shift from statistical machine learning methods to the use of neural networks and deep learning. With these approaches, it is now possible to build automated systems that can interact with people more naturally than ever before. And forward-looking businesses are seizing the day, incorporating NLP into a wide range of their processes for both customer-facing activities and internal operations.

To help organizations capitalize on this trend, Dell EMC and Intel® have been advancing the technologies and methodologies for the development of NLP applications. The team in the Dell EMC HPC and AI Innovation Lab in Austin, Texas, has two key projects under way in this realm: one on language-to-language translation and the other on text-to-voice translation.

Language-to-language translation

In the Lab’s project focused on language-to-language translation, data scientists are working to solve key problems associated with translating from one human language to another using a neural network. This process involves taking inputs in a source language and converting them to a target language.

In this process, the translation model first reads a sentence in a source language and then passes it to an encoder, which builds an intermediate representation. This intermediate representation is then passed to a decoder, which processes the intermediate representation to produce the translated sentence in the target language.
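This encode-then-decode flow can be sketched in miniature. The snippet below is an illustrative toy, not the Lab's model: the vocabularies and token vectors are made-up assumptions, and a real neural translation system learns these representations during training. It only shows the shape of the pipeline: source sentence, intermediate representation, decoded target sentence.

```python
# Toy encoder-decoder sketch. The vocabularies and vectors below are
# made-up assumptions; a real model learns them during training.
SRC_EMBED = {"hallo": (1.0, 0.0), "welt": (0.0, 1.0)}  # source tokens
TGT_EMBED = {"hello": (1.0, 0.0), "world": (0.0, 1.0)}  # target tokens

def encode(sentence):
    # The encoder maps each source token to a vector; the list of
    # vectors stands in for the intermediate representation.
    return [SRC_EMBED[tok] for tok in sentence.split()]

def decode(intermediate):
    # The decoder emits, for each vector, the closest target token
    # (here, the one with the highest dot-product score).
    words = []
    for vec in intermediate:
        best = max(TGT_EMBED,
                   key=lambda w: sum(a * b for a, b in zip(TGT_EMBED[w], vec)))
        words.append(best)
    return " ".join(words)

print(decode(encode("hallo welt")))  # -> hello world
```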

For the language-to-language translation project, the team started with a stock topology created by Google, and then improved some of the underlying mathematics to parallelize the workflows more efficiently. The goal was to run the model on hundreds of compute nodes to reach a solution more quickly.

In this optimization process, which spanned several months, the team examined how the system was using memory and performing computation, as well as the accuracy of the results. Validating the model’s accuracy gave assurance that efforts to speed up the computation didn’t yield lower-quality answers.

Computing resources

For this project, the HPC and AI Innovation Lab team leveraged the Dell EMC Zenith supercomputer, built with PowerEdge servers using 2nd Generation Intel® Xeon® Scalable processors. This TOP500 system, the result of a partnership between Dell EMC and Intel, serves as a benchmarking system for internal teams, as well as a resource for evaluations.

In addition, the Lab team leveraged the processing power of the Dell EMC Stampede2 supercomputer at the Texas Advanced Computing Center (TACC) at The University of Texas at Austin. This Intel-based system, which was ranked at No. 19 on the June 2019 TOP500 list, serves as a strategic national resource that provides HPC capabilities to thousands of researchers across the United States.

The team scaled the process of training models for language-to-language translation to 512 nodes without negatively impacting the quality of the results. This finding suggests that these models can now be trained at a much faster pace and at a much larger scale without compromising the current state of the art.

Text-to-voice translation

Text-to-voice translation takes written words and converts them to audio. The objective is to generate a complete audio waveform synthetically, rather than stitching together the mechanized clip recordings that callers have heard on telephone systems for the last 20 years.

With these more advanced approaches, developers use training data that consists of a transcript and clips of a voice actor reading that transcript. These resources serve as the training foundation for the creation of a voice that a computer will mimic. The developers then train the neural network to produce a simulated voice that sounds extremely similar to the actor’s voice.

For the text-to-voice translation project, the team used a two-part process, with two deep learning models:

  • They began by taking text and converting it to a spectrogram image, which takes one deep learning model. The spectrogram is a frequency representation of the letters and sounds expected in the resulting voice.
  • The team then created a second model that takes the spectrogram and generates a complete audio waveform using a realistic synthetic voice of the actor whose recordings were used in the training process.
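The two-stage pipeline above can be sketched with toy stand-ins for the two deep learning models. Everything here is a made-up assumption for illustration: the character-to-frequency table replaces the first (text-to-spectrogram) model, and a simple sine-wave renderer replaces the second (spectrogram-to-waveform) model, often called a vocoder.

```python
import math

# Hypothetical per-character frequencies standing in for model one.
CHAR_FREQ = {"a": 440.0, "b": 494.0, "c": 523.0}

def text_to_spectrogram(text):
    # Stage 1 stand-in: predict one frequency "frame" per character.
    return [CHAR_FREQ[ch] for ch in text]

def spectrogram_to_waveform(frames, sample_rate=8000, frame_len=0.01):
    # Stage 2 stand-in (the vocoder): render each frame as a short
    # sine burst, concatenated into one continuous waveform.
    samples = []
    n = int(sample_rate * frame_len)  # samples per frame
    for freq in frames:
        for i in range(n):
            samples.append(math.sin(2 * math.pi * freq * i / sample_rate))
    return samples

wave = spectrogram_to_waveform(text_to_spectrogram("abc"))
print(len(wave))  # -> 240 (3 frames x 80 samples each)
```

A real second-stage model conditions on the full spectrogram to reproduce the timbre of the training voice, which is the computationally heavy step the team is working to accelerate.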


In this ongoing project, the team is now working to accelerate the process of producing the audio waveforms.

The HPC and AI Innovation Lab’s work demonstrated both the ability to create realistic voices and that parallelization can complete the task in a relatively short period of time. The team reduced the process of producing a realistic voice model from more than a month to less than three days, just by parallelizing the process on a supercomputer and leveraging Intel® software optimizations.

Key takeaways

Natural language processing is a potentially powerful tool for enterprises and other organizations that want to streamline their interactions with customers, employees, partners and others. To help organizations capitalize on this opportunity, the Dell EMC HPC and AI Innovation Lab works to advance the technologies and methodologies for the development of language-to-language translation and text-to-voice translation applications.
