Samsung Electronics Showcases Award-Winning Machine Translation at WMT
Jan. 30, 2023 -- At the Workshop on Machine Translation (WMT), one of the biggest events for machine translation research, Samsung Electronics joined researchers from all over the world to discuss new and innovative ways to understand human language using machines and computer programs.
Samsung Research and Samsung R&D Institute Poland (SRPOL) took part in a competition in which scientific groups and laboratories compare the quality of their translation tools. Teams from all over the world, from widely known companies to university research groups, competed in the eight machine translation tasks.
Samsung Research Global AI Center’s Language Lab participated in the Biomedical Translation task, which evaluates systems for translating sentences from the biomedical domain. The task addressed a total of 14 language pairs, covering languages such as English, French, German and Spanish. The team won first prize in two language pairs: English → Spanish and Spanish → English. This was a particularly impressive feat given how heavily biomedical text relies on domain terminology.
In domain-specific translation, one of the biggest factors determining translation quality is how terminology is translated. The same source word may be translated differently depending on the domain, and technical terms appear less frequently than general terms, which makes them harder for a model to learn. To address these limitations, Samsung Research Global AI Center’s Language Lab improved domain-specific translation performance by incorporating soft-constrained terminology translation. This approach supplies the target-language terminology as a hint alongside the source sentence, so that the domain terminology is reflected in the translation output as much as possible. Samsung Research continues to conduct research on domain-specific translation, and offers a patent translation service (Korean-English) on its translation service, SR Translate.
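The idea of passing target-language terminology to the model as a hint can be illustrated with a minimal Python sketch. This is not Samsung Research's implementation: the separator tokens `<sep>` and `<t>`, the helper names, and the glossary entries are all hypothetical, and the term lookup is a simple substring match chosen for brevity.

```python
from typing import Dict, List

# Hypothetical marker tokens; the article does not describe the real input format.
TERM_SEP = "<sep>"   # separates the source sentence from the hints
TERM_DELIM = "<t>"   # separates individual hint terms


def find_term_hints(source: str, glossary: Dict[str, str]) -> List[str]:
    """Return target-language terms for glossary entries found in the source.

    Uses case-insensitive substring matching, which is only an assumption
    made for this sketch; a real system would need proper term matching.
    """
    lowered = source.lower()
    return [tgt for src, tgt in glossary.items() if src.lower() in lowered]


def add_soft_constraints(source: str, glossary: Dict[str, str]) -> str:
    """Append matching target terms to the source sentence as a hint.

    The constraint is 'soft' because the translation model is only shown
    the terms; nothing forces it to copy them into its output.
    """
    hints = find_term_hints(source, glossary)
    if not hints:
        return source
    return f"{source} {TERM_SEP} " + f" {TERM_DELIM} ".join(hints)


if __name__ == "__main__":
    glossary = {"myocardial infarction": "infarto de miocardio"}
    print(add_soft_constraints("The patient had a myocardial infarction.", glossary))
```

A model trained on inputs augmented this way can learn to use the hinted terms when they fit, while remaining free to ignore a hint that does not suit the context.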
SRPOL also participated in two General Machine Translation tasks, placing second in both English → Russian and English → Croatian.
During the competitions, WMT provides teams with only a limited amount of corpora (collections of structured texts) on which to train their translation models. The SRPOL team therefore attributed its success to improving the quality of those corpora through processes like data preprocessing and filtering. In addition, the team focused on optimizing its model’s architecture and AI training process.
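As a rough illustration of what preprocessing and filtering a parallel corpus can involve, the Python sketch below applies a few common heuristics: whitespace normalization, length limits, a source/target length-ratio check, removal of untranslated copies, and deduplication. The function name, thresholds, and choice of heuristics are assumptions for this example; the article does not say which filters SRPOL actually used.

```python
from typing import Iterable, List, Tuple


def filter_parallel_corpus(
    pairs: Iterable[Tuple[str, str]],
    min_tokens: int = 1,
    max_tokens: int = 200,
    max_length_ratio: float = 3.0,
) -> List[Tuple[str, str]]:
    """Keep sentence pairs that pass simple quality heuristics (illustrative only)."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        # Preprocessing: collapse runs of whitespace and trim the ends.
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        n_src, n_tgt = len(src.split()), len(tgt.split())

        # Drop empty, too-short, or too-long sentences.
        if not (min_tokens <= n_src <= max_tokens):
            continue
        if not (min_tokens <= n_tgt <= max_tokens):
            continue

        # Drop pairs whose lengths differ too much (likely misaligned).
        if max(n_src, n_tgt) / max(1, min(n_src, n_tgt)) > max_length_ratio:
            continue

        # Drop pairs where the target is just a copy of the source.
        if src == tgt:
            continue

        # Drop exact duplicates.
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```

Whitespace token counts are a crude length measure and would not suit languages written without spaces; a production pipeline would typically count subword tokens instead.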
Using the improved corpus, SRPOL’s Machine Translation Team built a classifier based on BERT (Bidirectional Encoder Representations from Transformers), a pretrained language model. This classifier categorized millions of sentences from the corpus into different domains. As a result, SRPOL was able to create models not only for general translation but also for the medical and legal domains.
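The step of sorting a corpus into per-domain subsets can be sketched in Python as follows. The classifier is passed in as a function, so any model could be plugged in; the `keyword_domain_classifier` shown here is a toy keyword-based stand-in used only to keep the example self-contained, and it is not BERT or SRPOL's classifier. The domain labels and keyword lists are likewise assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def keyword_domain_classifier(sentence: str) -> str:
    """Toy stand-in for a trained domain classifier (NOT a BERT model)."""
    words = set(sentence.lower().split())
    if words & {"patient", "dose", "clinical"}:
        return "medical"
    if words & {"court", "contract", "plaintiff"}:
        return "legal"
    return "general"


def split_by_domain(
    sentences: Iterable[str],
    classify: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Group sentences into per-domain lists using the given classifier."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for sentence in sentences:
        buckets[classify(sentence)].append(sentence)
    return dict(buckets)


if __name__ == "__main__":
    corpus = [
        "The patient received a second dose.",
        "The court dismissed the claim.",
        "It is sunny today.",
    ]
    for domain, items in split_by_domain(corpus, keyword_domain_classifier).items():
        print(domain, len(items))
```

Each resulting subset could then serve as training or fine-tuning data for a domain-specific translation model.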
SRPOL has been performing well in the field of machine translation, winning the challenges at the International Workshop on Spoken Language Translation (IWSLT), one of the world’s longest-running workshops on automatic language translation, for four consecutive years from 2017 to 2020.
Now more than ever, the goal of attaining a human-like level of language understanding seems to be within our grasp. As machine translation and language understanding slowly become an integral part of our everyday lives, Samsung will stay at the forefront of this technology to design the tools to overcome language barriers and improve your daily life.