Zoom’s earnings may have exceeded all expectations in 2020 as millions of people turned to video conferencing during the pandemic. But Zoom isn’t the only game in town, and one day meetings won’t be entirely virtual.
So Microsoft is showing off a new Intelligent Speaker system designed for use in Microsoft Teams hybrid meetings, where some people are in the room and others join remotely.
Not only can the speaker pick up multiple voices in a conference room, but it also uses artificial intelligence to create transcripts in real time. It can even differentiate up to 10 different voices and add participants’ names to the transcript so you know who said what.
Microsoft says it’ll make its intelligent speakers available in private preview later in 2021. The speaker has a 7-microphone array for detecting voices and uses speech recognition to provide real-time captions for meetings and transcripts that can be viewed later.
While transcripts can help create a record of in-person meetings or virtual ones, some features are clearly aimed at folks following along in real-time via a computer screen. For example, there’s support for real-time captions, and The Verge reports the speakers also support translation, allowing remote participants to follow the meeting in their own language.
via Microsoft
It could cause the whole team to be silent.
Oh, you might hear a fart or two.
I think I’d like to review Microsoft’s privacy policies before I agreed to allow my employer to implement AI voice recognition in our meetings.
If there’s a link between my name and some kind of “voice profile”, I’d like to be assured that it can’t be shared with anyone, especially other parties that have API access to Microsoft accounts. I’d also like to see this feature require a user opt-in.
These kinds of things are starting to make me more vigilant about the things that my employer subjects us to, in terms of digital privacy. I’d really like to start seeing some laws passed in my country to provide more privacy rights to employees.
I wonder if this is a cloud service or software running on the local microphone. I remember when MS included Windows Speech Recognition in Windows Vista.
https://en.wikipedia.org/wiki/Windows_Speech_Recognition
The article says: “The speaker has a 7-microphone array for detecting voices and uses speech recognition to provide real-time captions for meetings and transcripts that can be viewed later.”
So it sounds like at least part of this is being done in the speaker. At the very least, I would guess that the speaker has machine learning hardware to help identify differences in voices, and then maybe it passes that data to a cloud service that matches that data to existing voice profiles?
This could be a good way to use idle cloud resources. I would not be surprised if pricing is based on when the transcripts are delivered… one price for same-day and another cheaper price for next day. The microphone just does the signal conditioning and then zips the audio file for transport to MS Cloud.
It definitely presents the option for MS to offer different amounts of priority.
However, I don’t see this feature being very useful unless the data is available live during the meeting, or at the very least, immediately after the meeting is over.