Digital assistants have taken off in the past few years, allowing you to speak to Siri on your iPhone, Alexa on your Echo speaker, Cortana on your PC, or Google Assistant on your Android phone. But all of those platforms have something in common: they use proprietary speech recognition techniques.

Amazon, Google, and Microsoft all offer tools that let developers incorporate their digital assistants into third-party speakers, phones, or other devices. But the code is still closed-source.

Mozilla is taking a different approach: the organization behind the open source Firefox web browser has just released an open source speech recognition model.

Mozilla’s speech recognition system is based on research from Baidu’s Deep Speech project, and it was trained using a data set of almost 400,000 voice recordings from over 20,000 people.

In keeping with the whole open thing, that data is available for download if you want to build your own engine or just listen to hundreds of hours of speech.

Mozilla says its system offers a word error rate of about 6.5 percent, which means it’s not quite as good as a human being at recognizing speech. But it comes pretty close.
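For context on that 6.5 percent figure, word error rate is the word-level edit distance (substitutions, insertions, and deletions) between what the system heard and what was actually said, divided by the length of the reference transcript. Here's a minimal illustrative sketch (the `word_error_rate` helper is our own, not part of Mozilla's release):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / words in reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One wrong word out of six is a WER of about 16.7 percent -- so 6.5
# percent means roughly one error every 15 words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```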

It’ll be interesting to see whether developers adopt Mozilla’s tools and what kinds of applications they build with them… or whether the fact that most phones, PCs, and smart speakers already offer voice recognition means that Mozilla’s system will primarily be of interest to a niche audience of open source enthusiasts.


4 replies on “Mozilla releases open source speech recognition tools”

  1. A 6.5 percent error rate can still be terrible, since it includes misspellings, homonyms, and many other mistakes that have to be cleaned up by hand.
    And… how fast/slow is the recognition?
    Early precursors required slow, even speech, and then the user had to monitor the output on screen.
    Still, YouTube has a relatively good closed captioning service, so it must be getting better.
    But one may need a Beowulf cluster to “never let it rest, until the good is better, and the better, best.”

    1. It’s definitely getting better, but the best speech recognition still relies on server-based data and models — i.e. Google Home, YouTube, Alexa, etc.
