Advanced Computing in the Age of AI | Monday, June 24, 2024

Needed for Voice-Assisted Technologies: Context 

Modern voice-assistant technology has been on the market for less than a decade, but in that time it has quickly become a ubiquitous example of recent advances in human-digital interaction. Whether you choose Apple’s Siri, Amazon’s Alexa, or Google Assistant, voice command technology is everywhere, and its popularity will only continue to grow as new use cases emerge.

The line between the digital and physical worlds continues to blur, and people are demanding new ways to interact with digital technology without being tethered to small touch screens and laptops. In fact, Gartner predicts that by 2020, about 30 percent of all web-browsing sessions will take place without the use of any screen at all.

In recent years, the number of smart devices has increased steadily, with everything from toilets to ovens boasting connectivity. These innovations have the potential to improve productivity, safety, and enjoyment in our everyday lives. However, while voice command technology can handle simple tasks, back-and-forth interaction falls decidedly short of the human dialogue experience. We talk to Siri and Alexa, we say “hey” to Google, but how much do they understand?

To enable more intuitive digital interactions, a few companies are experimenting with combining voice with other technologies, such as spatial awareness, 3D mapping, and gesture recognition. The idea is that blending these innovations could bring us closer to a more human experience. If we can interact with the digital world as we would with a friend (for example, by referencing nearby objects and locations in a manner that taps into expressions, emotions, and physical traits and is less restrictive than voice recognition technology on its own), we can get a few steps closer to interacting naturally with our connected devices.

For example, Anki’s home robot Vector can recognize people and intuitively interact with its environment through sight and sound. In addition, Inner Reflection from Norway’s Elliptic Labs delivers presence-sensing capabilities to any device that has a speaker, microphone, and audio processor, empowering devices to take actions such as responding automatically when a user enters a room and adjusting volume based on user distance.

While these technologies take voice-command technology a few steps closer to human verbal exchanges, blending machine vision and spatial awareness with voice improves many other use cases as well. For example, in an office environment, a natural user interface could control conference room technology, produce automatic meeting minutes attributed to the correct speakers, or provide intuitive gesture and voice control of a videoconferencing camera, allowing commands such as “zoom in on that whiteboard.”

Industrial settings are another strong fit for this technology. In an industrial workspace, a natural user interface can improve safety and productivity. For example, a factory worker could control heavy equipment without dividing their attention between the task and a display or control panel, and without needing to remove protective gloves, while automated equipment could anticipate the movement of workers to enhance safety.

In a medical environment, a natural interface could capture information from visits and examinations in real time, combining patient record information with voice and procedure recognition to automate the paperwork physicians must fill out. In a surgical environment in which sterility limits touch interfaces, natural interfaces could synthesize voice commands and gestures to support surgeons more effectively. In a hospital, a natural interface could give patients more control over the lighting and entertainment in their room, reducing dependence on a human assistant for minor tasks.

The future of digital assistant technology lies, without a doubt, in increasing contextual awareness. The more closely we can mimic human dialogue, including all of the context of location, emotion, gestures, and more, the more useful digital assistants will become, and we’re not far off from achieving that goal. Whether in the home or on the assembly line, natural user interface technology represents a dynamic shift in how we interact with and work alongside technology.

Jeff Hebert is vice president of engineering, Synapse Product Development.