Tech Blog

Jay's Technical blog

WP8 Speech Recognizers

15 April 2013
Jay Kimble

[WARNING! This is an archived post and as such there may be things broken/missing here.. you have been warned.]

In case you don’t know, I have been deep into WP8’s Speech SDK for the last 6 months, so much so that I am starting to help others along the way. I just got a question about how to get the confidence rating on a recognized word.

If you don’t know what that means, let me explain. Speech recognition is not an exact science. There are complicated algorithms that analyze your speech to determine what words were spoken. One of the cool things with WP8’s SDK is that you can provide a list of words or phrases you are looking for, and this makes things a little more accurate (mainly because you are limiting the number of combinations to look for).

WP8 has 2 objects that you can use to recognize speech: SpeechRecognizer and SpeechRecognizerUI.

SpeechRecognizer gives you something a little more low level. It won’t show the “pretty” UI that is shown while listening to the user speak, it won’t play a beep to indicate that speech is being recorded for recognition, and it doesn’t give the user any feedback whatsoever. It does let you supply a list of words or phrases you are looking for, and it can also do the big open-ended check that listens for any word. The SpeechRecognitionResult will give you the word (or words) it thinks the user said, along with a confidence rating that tells you how accurate it estimates it was (the TextConfidence property, with a numeric score in Details.ConfidenceScore). It will also give you a list of alternates (via the GetAlternates() method) that might be what the user said, each with its own rating. You can also do your own recording and have the recognizer process the speech from that recording (but it accepts a much smaller clip than what the actual MS mechanisms do). In fact, doing a recognition for any word allows for a shorter amount of data.

SpeechRecognizerUI does a lot of work for you. It puts up the UI, plays the beep to indicate to the user to speak, handles the case where a word isn’t quite recognized, and generally lets the user know what is going on throughout the process. It returns a SpeechRecognitionUIResult object which contains a SpeechRecognitionResult (in the RecognitionResult property), but the confidence is usually high (I have yet to see any alternates come through). I think this is mainly because the UI object does the extra work of clarifying with the user when it estimates that the accuracy of the recognition isn’t quite that high.

From my work, I have found that I tend to use SpeechRecognizerUI more often than the lower level mechanism, mainly because the added UI/indicators create a very good experience for the user.

[I find it difficult not to attribute human qualities to the recognizer. It was tough not to use words like “guess” during this post.. it is really amazing how accurate the recognizer is, and when it is wrong, it’s usually pretty understandable why.]


Test post.. nothing to see here yet

18 January 2013
Jay Kimble

Just putting a test post out..


Logo For Windows8 in JavaScript: Lessons Learned

29 November 2012
Jay Kimble

So Kevin Wolf created a community challenge with an ARM Surface tablet as the prize. He presented this to the local Florida Windows8 Development Community, which I am a part of.

Here are the basics of the challenge:

  • The Windows 8 App must be 100% JavaScript/HTML/CSS; no other client technologies allowed.
  • It had to be in the Windows Store by Dec. 1st (we had approximately 2 months to do it)
  • He wanted the app to be non-trivial, so there were criteria for what was “non-trivial”:
    • App must have a market base that was not the development community (and could not be a sample app). In other words it had to have an audience, a real audience.
    • App must implement some form of capture (first name, last name, etc)
    • App must connect to a web service of some type and pull and push data to it (both reads and writes); social networking sites were disallowed, but any other public or private web service was acceptable
    • Needed to implement either the Search or the Share contract
    • Must use either:
      • the camera in a meaningful way
      • Location services AND Bing maps in a meaningful way
    • Must implement 1 animation

That was it. When I saw the list I thought, “Boy, he rigged the contest.” I could do that, but then I’ve done this kind of insane project in the past. I considered doing it and then forgot about it for a month.

A Funny Thing Happened on the way to a presentation

So along the way I had agreed to do a presentation for the Great American Teach-In at my kids’ school. I decided that I would show them some beginning programming with the Logo programming environment (aka Turtle Graphics). Since I would be using my Samsung Slate to do this, I figured I should go ahead and find a Windows 8 implementation of Logo. I couldn’t find one. After a bit of searching around I found a nifty one in JavaScript using the HTML5 Canvas. I decided to take this single page environment and make a full Windows 8 Store app with it. It was all rather trivial. I think the initial version took me 2 evenings to get working (so a couple of hours). I did have to change things. The version I had was using jqConsole, which seemed to be overkill, but it provided one nice thing: history. So I fired up TypeScript and created a nice little history component for my jqConsole replacement (essentially a Textbox with a sprinkling of JavaScript code to sniff for the enter key). I also added a few buttons to make it convenient to retrieve past history items, and a scrollable region containing that history.
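To give a feel for how small that history component can be, here is a rough sketch in TypeScript. It is not the code from the app: the names CommandHistory and handleKey are made up for illustration, and the textbox wiring is reduced to a function that is handed the pressed key and the current text.

```typescript
// Hypothetical sketch of a console-style command history (not the app's actual code).
class CommandHistory {
  private readonly entries: string[] = [];
  // Position of the entry currently shown; entries.length means "past the newest".
  private cursor = 0;

  // Record a submitted command and park the cursor just past the newest entry.
  push(command: string): void {
    const trimmed = command.trim();
    if (trimmed.length === 0) return; // ignore blank submissions
    this.entries.push(trimmed);
    this.cursor = this.entries.length;
  }

  // Step back toward older commands; stays on the oldest one once reached.
  previous(): string | undefined {
    if (this.entries.length === 0) return undefined;
    this.cursor = Math.max(0, this.cursor - 1);
    return this.entries[this.cursor];
  }

  // Step forward toward newer commands; returns "" when moving past the newest.
  next(): string | undefined {
    if (this.entries.length === 0) return undefined;
    this.cursor = Math.min(this.entries.length, this.cursor + 1);
    return this.cursor === this.entries.length ? "" : this.entries[this.cursor];
  }

  // Copy of everything entered so far, oldest first (e.g. to fill a scrollable list).
  all(): readonly string[] {
    return [...this.entries];
  }
}

// Meant to be called from the textbox's key handler. Only Enter submits;
// the return value tells the caller whether the key was Enter.
function handleKey(key: string, text: string, history: CommandHistory): boolean {
  if (key !== "Enter") return false;
  history.push(text);
  return true;
}
```

The "previous" and "next" buttons would simply call previous() and next() and drop the returned text into the textbox, while all() feeds the scrollable region.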

My presentation went off wonderfully, and now I was thinking about finishing the app and getting it into the store (really as a convenience for others). It was at this point that I remembered the challenge and realized that I had done the first couple of items. Next up was the whole Share contract. Essentially what I did was make it possible to share out your history. I also implemented (I think) the opposite mechanism, where you can share text back to the Logo app and each line will be executed by the Logo interpreter.

For the animation I created a simple movement of my turtle in CSS; he starts at the bottom of the page and moves to the center where he belongs.

Next I implemented hitting my personal App Analytics server for Error reporting (mainly). It stays off in most cases; when I am having issues with something, I turn it on for that app. Anyway, I only implemented the read component of this.

At this point I submitted to the Windows8 Store. I failed for not having a privacy policy (you have to have one when you need Internet access). That was harder than it should have been since I needed to learn a few things about doing this in HTML5 apps (which, BTW, is easier than what it takes in XAML.. HTML5 apps have a few more controls that we should have in XAML, but I digress). I got in…

I announced that I had won the challenge, only to read closer and realize that I needed GPS and WebService pushes. I decided to implement non-standard GLAT/GLONG expressions in the Logo interpreter as a nicety for teaching students how to write a “how close am I to the equator (or other landmarks)” function. This didn’t take that long.
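For the curious, the math behind a “how close am I to the equator” function is short. The sketch below is in TypeScript rather than Logo, and it is only an illustration: the function names are invented here, it assumes GLAT/GLONG hand back latitude and longitude in decimal degrees, and it treats the Earth as a sphere with a mean radius of 6371 km.

```typescript
// Mean Earth radius in kilometers; a spherical approximation, good enough for a classroom.
const EARTH_RADIUS_KM = 6371;

const toRadians = (degrees: number): number => (degrees * Math.PI) / 180;

// Distance straight north or south to the equator: arc length = radius * |latitude in radians|.
function distanceToEquatorKm(latitude: number): number {
  return EARTH_RADIUS_KM * Math.abs(toRadians(latitude));
}

// Great-circle distance between two points (haversine formula), for "other landmarks".
function distanceKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  // Clamp guards against tiny floating point overshoot before taking asin.
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(a)));
}
```

A student at latitude 28 would call distanceToEquatorKm(28); a landmark is the same idea with distanceKm and the landmark’s coordinates.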

I next implemented a simple write to my Web service and got it working (this was harder). For both of these I wrote TypeScript. I’ve learned enough about JavaScript that if I am writing something non-trivial I want a tool to help me. There are so many gotchas with JavaScript that it just makes sense to me. (Mind you, I have maintained systems that have tens of thousands of lines of JavaScript code and even worked on a system with 100k lines of JavaScript! By no means am I a rookie.) I got it working and got rejected on Tuesday due to something not working.. not sure what that would be…

BUT I realized that the requirement was to write something with GPS AND Maps.. Not just GPS. If I had chosen camera (and I could have come up with something for that, but reasons also matter, because of the words “meaningful way”), I would have more or less been there. I have resubmitted but even if this version gets in before Dec. 1, I still didn’t complete the challenge…

Surprises

I ran into a couple of surprises. In some respects the set of controls that MS has enabled with WinJS/HTML is a little richer, and pretty easy to set up. The biggest item was the flyout I used when I built a settings panel for my GPS Enabler. The code to write this was mostly HTML and magic CSS classes (I say magic because the MS environment just knows what to do with them to create the control). This actually had me slightly enamored..

Takeaways

First of all, if you have talked to me in the last week or so you may have gotten a different impression. I mentioned some positives to this. There are definitely some positives. BUT, I also need to factor in me. I was a trainer during the Web 2.0 craze. I knew my stuff. I once worked on/maintained/added features to a web project that had close to 100k lines of JavaScript code. I want you to ponder that for a moment…. you back? Good. Now realize that I can do non-trivial things with JavaScript and HTML (things you might not be able to do). I’m not as knowledgeable with the latest and greatest on the HTML5 track, but by no means am I a newbie with this.

Also, the project I took was something that already worked well with IE10, and I simply sucked it into the Windows8 Templates. If that was the end of the story then I would say everyone should be doing this, but it’s not. While my project was non-trivial, the task seemed to be trivial. Creating the share contract for instance involved me tweaking the JS code to expose things I could share. I imagine this is the case with Search as well.. you will have a fair bit of tweaking to make things happen.

Lessons Learned

So, the positives: Microsoft has thought through this, and what they have delivered is a nice environment. You can do some stuff with this. They have enabled a pretty easy mechanism for you to access the various controls via CSS styles, which makes things pretty easy on that front. If you are doing something fairly trivial (porting a single page over that already works in IE10, or a single page that sucks in RSS feeds) then have at it. You can do it with HTML5/WinJS.

If you are doing anything more complex, I would suggest skipping HTML5, putting on your “big boy” pants, and building your app with XAML, VS2012, Blend (if you need it), and a copy of John Papa’s Data-Driven Services with Silverlight 2 (which needs to be renamed “Data-Driven Services with XAML” because it still applies today). With all that in hand you can do XAML. It’s not that hard. AND, by “complex” I mean web services in any form (where you don’t already have a fully tested/stable JavaScript library to access that service), Maps, Fancy Animations, File APIs, anything data driven…

Another lesson learned: if you have to write JavaScript, write TypeScript instead. You still have to remember that you are using closures from things like events, but it’s certainly a lot better having the design environment tell you when something is wrong due to typing, or misspelling, etc.

Html/WinJS has the same net effect in my mind as most of the JavaScript apps I have written in the past.. they are quirky. My Xaml apps feel much more solid. In my mind I will keep an eye on the technology, but I am going to stick with XAML for the near future… (you can’t do everything in JavaScript as some would like us to believe).


Windows Store Rejection#1: Lessons Learned about Age Ratings

10 October 2012
Jay Kimble

Just thought I would fire up a quick post (and maybe a little advice to the powers that be at Microsoft).

I have been working on a Dropbox Client for the last several months in my spare time. I actually almost quit on the process and realized that what I had was more integrated with Windows 8 than anything else in the Windows Store, and that I really wanted to use my own app over anything else that is currently out there. Mind you I was about a week away when I decided to quit. When my work schedule broke a little I decided I NEEDED to finish this. So last Tuesday I finished it up and submitted to the Windows Store.

When I was filling out the submission, I got to the line that asks about Age Rating.. this seemed weird to me; I hadn’t really thought about it. I mean, I would expect it to be rated E (for Everyone) because you bring your own content. I could see a 3 year old being able to operate it.. so I set it to 3 and eliminated any country that wanted me to back that up (at least for now).

SIDENOTE: In other Mobile AppStores (you know the one who considers themselves the owner of the name “AppStore”), they don’t ASK you what you think your rating should be, they ask you a series of questions about your app and then they TELL you what your rating is.

So, a whole week later, I got the notification (this morning, actually) that I got rejected. I wasn’t offended. It happens; there are many things in the guidelines and it’s really easy to miss something or to have a corner of your app that is not as well tested. This was my first time through so I expected this. Of course it was for the aforementioned Age Rating. I checked the SkyDrive app and saw it was Age Rated for 12+. I assume from reading the materials it was because I have ads in the app and they don’t think anyone younger than 13 should see ads. Seems weird, but OK..

NOW, this is the rub. I made 2 clicks to change the Age Rating from 3+ to 12+ and hit resubmit.. No code has changed. Nothing has REALLY changed except that my age rating was set too low. Where am I in the process now? At the back of the line, waiting another 7 days to hear if there was anything else.

[Warning: rant proceeds from here on]
I’m assuming that most of the tough stuff is out of the way, but I don’t really know. It does seem strange that I went through the whole process already and I have to wait 7 days. I mean, I presumably already passed a number of checks. It seems like a waste of EVERYONE’s time for me to have to wait and to have EVERYTHING thus far retested for a ZERO code change (no new binaries have been uploaded). This process needs to be streamlined some more, that is for sure, especially considering they want to have 100,000 apps in the Windows Store by January. If they don’t change the process some, then that goal will never happen. Just my 2 cents.


Why MetroOAuth? or C# Async makes life easier

18 July 2012
Jay Kimble

Yesterday, I finally pushed some of the oAuth libraries I have been tinkering with. I have a really nice and simple library (and this simplicity seems to flow throughout the library). I decided to pull out some examples using my DropboxClient to show how much simpler a library can be.

Here’s how you log in to Dropbox (in C#):

// Runs inside an async method; kDropBoxAppKey and kDropBoxAppSecret are your app's Dropbox credentials.
// If the user has already auth'd, you can load the saved access key from ISO Store or local storage.
var accessKey = ""; // empty here because nothing is saved yet (the argument is optional)
var client = new DropboxClient(kDropBoxAppKey, kDropBoxAppSecret, accessKey);

bool authd;
if (client.IsAuthenticated) // do we already have an access key?
{
    authd = true;
}
else
{
    // The URI is where the user is sent once logged in
    authd = await client.Authorize("http://www.tbwindev.org");
}

if (authd)
{
    // Do something
}

We put everything into that example just so you can see the full setup. You don’t need the accessKey, for instance, and can simply leave it off that call (it’s optional). Like the comment says, after the user has authorized your app you can grab that value from the client (stored in a property) and save it off, and then the user will not need to re-authenticate.

The “if” statement where we check whether the user is already authenticated is really a convenience, but you don’t want to re-authorize a user who already has an access key.

The only lines you really need are the creation of the DropboxClient and the line to Authorize.

Now what is behind that Authorize call is what makes this amazing to me. There are 2 HTTP calls, with a call to the WebAuthenticationBroker in between to show the user a web browser window that is on the Dropbox site. All of these things happen asynchronously, but they happen in order. So, there is a call to get a temporary key, which we use in the URI that sends the user to the Dropbox site’s login. After they have logged in (or closed the window), we check to see if we got a code from the call. If so, we make a call to get our official accessKey, and we can proceed from there. Again, all of this happens asynchronously, and your UI will be responsive during this time.
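The library itself is C#, but the “asynchronous, yet in order” shape looks the same in any language with async/await. Here is a rough sketch of that flow in TypeScript. Nothing in it is the MetroOAuth API: AuthSteps, authorize and makeStubSteps are hypothetical names, and the two HTTP calls and the browser window are replaced with stubs that only record the order they ran in.

```typescript
// Hypothetical shapes for illustration only; none of these names come from MetroOAuth.
type LoginResult = { code?: string };

interface AuthSteps {
  requestTemporaryKey(): Promise<string>; // first HTTP call
  showLogin(temporaryKey: string, callbackUri: string): Promise<LoginResult>; // browser window
  exchangeForAccessKey(temporaryKey: string, code: string): Promise<string>; // second HTTP call
}

// Each await lets the caller's UI keep running, yet the steps still execute strictly in order.
async function authorize(steps: AuthSteps, callbackUri: string): Promise<string | undefined> {
  const temporaryKey = await steps.requestTemporaryKey();
  const login = await steps.showLogin(temporaryKey, callbackUri);
  if (login.code === undefined) return undefined; // user closed the window without logging in
  return steps.exchangeForAccessKey(temporaryKey, login.code);
}

// Stub steps that append to a log so the call order can be inspected.
function makeStubSteps(log: string[], userLogsIn: boolean): AuthSteps {
  return {
    requestTemporaryKey: async () => {
      log.push("temp");
      return "temp-key";
    },
    showLogin: async () => {
      log.push("login");
      return userLogsIn ? { code: "abc" } : {};
    },
    exchangeForAccessKey: async () => {
      log.push("exchange");
      return "access-key";
    },
  };
}
```

Notice that authorize contains no callbacks or event handlers; the early return for a closed window reads like ordinary synchronous code.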

Async makes it so we don’t have to set up any kind of response action or event, etc. Each of the calls returns a Task or a Task<T>. We simply tell the compiler that we want to wait for the task that was passed back to us to complete before we proceed. Inside of the task that we get back there might be other tasks that get awaited. This makes it possible to have a method have a single responsibility without having to clutter it up with what happens or a link to what will happen on return.

AND, all this works in Metro and WP7.x (as long as you install the Async CTP)

Anyway, I’ll blog some more on this soon.

BTW, my project is at http://metrooauth.codeplex.com and the current (rather cluttered) code repo is at https://bitbucket.org/DevTheo/metrodropbox