The future of voice recognition is… everywhere.
Voice recognition – if you were born before the year 2000, chances are you have at least one horror story of hours spent on the phone e-nun-ci-a-ting every syllable in a desperate attempt to communicate with the dismal excuse for a “robot” on the other end. And god forbid you had an accent – forget it – you may as well have headed into the store, because that bot was never going to figure out what you were trying to say.
Fast-forward to the present day, and voice recognition software has become so good that many of us can’t imagine life without it (hello, Siri). When it comes to the future of voice recognition, the consensus seems to be voice recognition in everything. Keyboards and similar control panels will slowly be phased out of devices as those devices gain the ability to simply listen to our commands. You can already use Siri and other voice assistants with our best waterproof Bluetooth speakers.
For more on that and what else the future of voice recognition might look like, we asked a group of industry experts…
What’s The Future Of Voice Recognition?
Here’s what they had to say…
Chris Kirby, VP of Voices.com
“Voice recognition and the artificial intelligence and understanding behind it are only going to get more sophisticated going forward. Once we lock down basic content, I suspect efforts will move toward analyzing the characteristics of the voice utterance. I see a time when the algorithms will understand not only what is said, but the way it is said. Tonal inflection and all the other characteristics that add meaning to the spoken word will become part of the process of comprehension. This may be done to determine the mood of the speaker, whether they are in distress, or how strongly or weakly they believe in the statement. It could be used to adjust the response or, in the case of a security situation, to decide whether a response is warranted. It may also lead to a means of diagnosing mental state. Ultimately, performance concepts like sarcasm would be identifiable and play a role in the response.”
Daria Evdokimova, CEO & Co-founder of VoiceOps
“In the next 5-10 years, it’s highly likely that machine-driven speech-to-text will surpass human transcription in both accuracy and speed, just given the current pace of development. We’re not there yet, but we will be shortly. That pace of development will also increase over the next few years as we continue to capture more voice data through in-home and mobile virtual assistants like Siri and Alexa.
The evolution we’ll likely see is the application of natural language processing (NLP) directly on audio data, as opposed to applying it to transcripts of audio data – and I think that’s a natural evolution. Humans interpret and digest audio data by recognizing how different sounds make up words, not by first converting speech to text and then reading that text.”
Rishi Khanna, CEO of ISHIR
“The sci-fi movies are turning real. We can already see that as we have started talking to devices. With major advances in speech recognition, supported by faster wireless speeds and phenomenal growth in cloud computing, what can possibly stop the growth of voice assistants? As humans, we are an inquisitive lot, and we wonder what comes next after Alexa, Siri and OK Google.
In the future, virtual assistants will dominate our day-to-day lives as voice helps us communicate with our home appliances like alarm systems, lights, sound systems and even kitchen appliances. We will also see massive growth in voice-controlled devices in our workplaces. Hands-free mobility will play a key role in hospitals, laboratories and manufacturing units. Additionally, we will have intelligent voice-driven cars, entertainment and location-based searches, so passengers can be completely hands-free.”
Tyler Schulze, VP of Strategy & Development, Cognitive Engines at Veritone, Inc
“It has been postulated that each human on earth has a truly unique voiceprint, typically identified via spectrogram analysis. In less than ten years, machines will have the ability to identify virtually any human being worldwide by their spoken voice, in real time. Public and private entities already correlate voice recordings and personal profiles programmatically from existing stores of data such as customer service calls, voicemail recordings, and personal videos distributed via social media.
Likewise, algorithms will soon be able to reproduce an individual’s voice with near-perfect accuracy. Anytime your voice is recorded, you are contributing, knowingly or unknowingly, to a corpus of correlated data to be used as your personal profile. Advancements in this field will continue to raise questions about security and anonymity. We also expect the methods for analyzing voiceprints to evolve in lockstep with recognition technologies, with ever-greater accuracy and differentiation requirements.”
Brian Roemmele, Founder & Editor-in-chief of Multiplex Magazine
“All prior computer interaction systems have one central point in common. They force humans to be more like the computer by forcing the operator to think through arcane commands and procedures. We take it for granted and forget the ground rules we all had to learn and continue to learn to use our computers and devices. I equate this to learning any arcane language that requires new vocabularies as new operating systems are released and new or updated applications are available.
What if we didn’t need to learn arcane commands? What if you could use the most effective and powerful communication tool ever invented? This tool evolved over millions of years and allows you to express complex ideas in very compact and data-dense ways, yet can be nuanced to the width of a hair. What is this tool? It is our voice.
We are at the precipice of something grand and historic. Each improvement in the way we interact with computers brought about long-term effects nearly impossible to calculate. Each improvement of computer interaction lowered the bar for access to a larger group. Each improvement in the way we interact with computers stripped away the priesthoods, from the 1960s computer scientists on through to today’s data science engineers. Each improvement democratized access to vast storehouses of information and potentially knowledge.
For the last 60 years of computing, humans were adapting to the computer. For the next 60 years, the computer will adapt to us. It will be our voices that will lead the way; it will be a revolution and it will change everything.”
Peter T. Boyd, President & Founder of PaperStreet
“Star Trek-style full conversations where the computer interacts with the user and can control technology. Everyone can have a personal assistant to get routine tasks done, items purchased, calculations made, and devices controlled in the office, home and car. With that will come security concerns about voices being mimicked to break into various devices, but they can probably be overcome with additional security in other ways (pass codes or other body signatures).”