First things first, here’s the demo: http://ibex.clearfly.net:8080/recognition/Recognition.swf. Now, on with the blog…
Java and Flex are a natural fit for the next generation of the web (I dare not say Web 3.0), where face recognition sign-ons, video chatting, and speech-enabled navigation reign. “Hmmmmm,” you might say, or maybe you’re thinking about a computer named HAL 9000 from the Space Odyssey saga (fun little fact for my readers: the letter after H is I, the letter after A is B, and the letter after L is M; IBM). Well, I know my imagination was running wild when I finally put my speech-activated navigation system together.
It all began about a month ago, when I was asked to assess the viability of using a speech recognizer on the web. Specifically, I was asked to consider a system that could help people learn how to read online. With this I set off on a journey to see what was out there in the open source world to help me along in my endeavors.
The first stop on the road to finding all the pieces for my online speech recognition system was OBVIOUSLY a piece of speech recognition software. It took me about a week to find the speech recognizer I wanted to use. During this week I researched a number of speech recognition products, and speech recognition in general as well. Speech recognition is a very complex matter with lots of concepts and vocabulary that “speech experts” use when discussing the topic. I felt it was critical to understand terms like Hidden Markov Models (HMM), Utterances, Speech Models, Trained Speech Models, Acoustic ranges, Terrace searching, and a number of other related concepts. I evaluated a number of speech recognition tools, including Sphinx, Nuance, VoiceBox, and the Microsoft speech server. In reviewing these I was looking first for the “best of breed” speech recognizer, the one(s) most commonly used. I found that Nuance had the most widely used speech recognizer; however, it only had a .NET API and the product is commercial, and I’m a Java guy on a tight budget. In reviewing the open source products (targeting Java, of course), I found that most of them simply incorporated “CMU Sphinx” as their speech recognizer. So this began the process of evaluating and understanding Sphinx. Sphinx is quite nice, and I think you’d be very impressed with what it gives you right out of the box. So there I have it, the first piece of my online speech recognition system. Next stop: feeding the recognition system from a web browser. Hmmmmmm.
So now the question became: how do I get the users’ voices into the speech recognizer from the web? Clearly, I’m going to need a browser plugin; HTML and JavaScript aren’t going to give me microphone support. OK, OK, but which technology to use? Applets? Nahhhhh, nobody has the plugin anymore. Sorry Sun, it was a good try, but Flash is King when it comes to browser support and user adoption in the rich internet application (RIA) space. After all, what site doesn’t use Flash to show advertisements? So there I go, Flash it is, but to be honest I’m not that great when it comes to building things in Flash. Yeah, I’ve made some fun cartoons with the kids and made a couple of banners, but this was a system, an online application, and I’m gonna need common HTML-like features! Have no fear Chris, you’re surrounded by Flex developers :) one of the perks of my job. I really hadn’t played around with Flex before this project. I’d listened to a presentation on it by Leif Wells and I had done some reading on it, but the opportunity to put it to use hadn’t arisen. Flex is what OpenLaszlo tried to be :) that little note is for you old guys like me. However, now I’m a Flex junkie, I love it! The only thing faster than developing in Flex is downloading an existing open source project and putting it on a server. Something I really appreciate from the Adobe guys is their HUGE embrace of the open source community and their establishing a project called BlazeDS. This is a Java integration point for Flex, and if you’re a Spring Framework guy like me then I’m telling you right now: forget about Spring MVC, JSF, Seam, Struts, or any of your other frameworks and learn Flex with BlazeDS. It’s simply too easy to deploy scalable tiered applications. You only need to write your Service and DAO layers and expose them to Flex via BlazeDS, and you now have a remoting client calling your service layer. That’s how it’s supposed to be, right? I’m on my way now, right?
Flash has microphone support, so now I just need to get the audio up to the server, have a service grab this speech, pipe it through Sphinx, and see what the users are saying, right?!?!
Hold the press!!!
BlazeDS doesn’t support the Real Time Messaging Protocol, RTMP (it’s OK, I didn’t know what it was either till I needed to use it). RTMP is how Flash “publishes” streaming video and audio to the server, and since we’re using BlazeDS as our remoting endpoint, it was what I was counting on to handle this. Here’s something to keep in mind: the RTMP protocol is now open source (THANKS ADOBE!), but BlazeDS doesn’t support it yet! On a side note, it used to be that you had to buy the Flash Media Server to interact with video and audio media types, but not anymore: there’s a very well known open source media server called Red5. Thank goodness, ’cause the Flash Media Server has a nice price tag (nice if you’re Adobe, that is). So now I’ve really got it. I have the plan of attack, and it goes like this:
Step 1: Publish “utterances” to the Red5 server.
Step 2: Tell our Spring service layer to process this (right now we take the *.flv that’s published and pipe it through ffmpeg to output a .wav for Sphinx).
Step 3: Send the response back to the client.
Step 4: Let the client take actions based on what it understood (e.g. open a tab or something).
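For the curious, here’s a rough Java sketch of what Steps 2 and 4 could look like. It is not the actual code behind the demo. The class name, method names, and the phrase-to-action table are all made up for illustration, and the audio settings (16 kHz, mono, 16-bit PCM) are my assumption about what a stock Sphinx acoustic model wants, so check them against the model you actually load. It also assumes ffmpeg is on the server’s PATH.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: none of these names come from the real project.
final class UtterancePipeline {

    // Hypothetical phrase-to-action table; the demo's real commands are not listed in this post.
    private static final Map<String, String> ACTIONS = new LinkedHashMap<>();
    static {
        ACTIONS.put("open tab", "OPEN_TAB");
        ACTIONS.put("close tab", "CLOSE_TAB");
    }

    private UtterancePipeline() { }

    // Step 2a: build the ffmpeg command that turns the published .flv into a .wav.
    // -y overwrites the output, -vn drops any video stream,
    // pcm_s16le / -ac 1 / -ar 16000 request 16-bit mono PCM at 16 kHz (assumed format).
    static List<String> ffmpegCommand(String flvPath, String wavPath) {
        return Arrays.asList(
                "ffmpeg", "-y",
                "-i", flvPath,
                "-vn",
                "-acodec", "pcm_s16le",
                "-ac", "1",
                "-ar", "16000",
                wavPath);
    }

    // Step 2b: run the conversion and return ffmpeg's exit code (0 means success).
    static int convert(String flvPath, String wavPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(ffmpegCommand(flvPath, wavPath))
                .inheritIO() // let ffmpeg log to the server console
                .start();
        return process.waitFor();
    }

    // Step 4 (shown server-side for simplicity): map recognized text to an action name.
    // Case and extra whitespace are normalized; anything unrecognized maps to "UNKNOWN".
    static String actionFor(String recognizedText) {
        if (recognizedText == null) {
            return "UNKNOWN";
        }
        String key = recognizedText.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("\\s+", " ");
        return ACTIONS.getOrDefault(key, "UNKNOWN");
    }
}
```

In a setup like this, the Spring service would call `convert(...)` on the file Red5 wrote, hand the resulting .wav to Sphinx, and send something like `actionFor(text)` back to the Flex client, which then decides what to do with it.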
Got it?
Easy enough!
Wanna see it?
Here you go: http://ibex.clearfly.net:8080/recognition/Recognition.swf (once you’re on the site, there are some videos to tell you how to use it)
Enjoy the demo and let me know what you think!
 
Hi Chris,
Would love to see your demo but the link provided isn't working. Please post a new one.
Thanks,
Fred Weigman
Vancouver,BC,Canada
Hey Chris,
Nice information! Guess the link is not working.. can you please post the link again?
Cheers!
hello Chris.
very informative stuff u have provided, but the link that u have provided is not working :( ....