April 07, 2007

Voice-to-text experiment

As I try out the voice-to-text feature of Windows, I am discovering enough interesting aspects/problems/features that I thought I'd share them.

My original intent was to use the dictation feature of Windows Speech Recognition with Microsoft Office OneNote and basically dictate a blog as I encounter interesting and informative things. The speech recognition training explains that recognition starts at around 80-90% accuracy and ends up around 95% after you complete all of the training sessions.

I've gotten through 5 of the 8 sessions at this point, and 95% is about right, but I've found that it uses not just the sound of the words but also the context of the sentence to convert the audio to text. Here are some examples, with my spoken word(s) in quotes, followed by the text it actually recognized:

'cup' top
'cup' The
'cup' Company
'cup' Come
'cup of coffee' cup of coffee

So it won't recognize the word 'cup' by itself, but it recognizes the entire phrase 'cup of coffee'. I have even specifically recorded the word 'cup' in Speech Tools. It seems to be substituting words that would more logically occur as the first word in a sentence.
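To make that guess concrete, here is a toy sketch in Python of how a recognizer might weigh sound against context. I have no idea how Windows Speech Recognition actually works internally, so everything here is an assumption: the scores are made-up numbers, and the names `BIGRAM`, `context_score`, and `pick` are invented for illustration. The idea is just that each candidate gets a "sounds like" score multiplied by a "how likely is this word sequence" score.

```python
# Toy illustration only. All numbers are invented, and this is NOT the
# actual algorithm used by Windows Speech Recognition.

# Hypothetical likelihood of a word given the previous word.
# "<s>" marks the start of a sentence.
BIGRAM = {
    ("<s>", "the"): 0.10,    # "The" is a very common first word
    ("<s>", "cup"): 0.001,   # "Cup" rarely starts a sentence
    ("cup", "of"): 0.50,     # "cup of" is a common pairing
    ("the", "of"): 0.0005,   # "the of" almost never occurs
    ("of", "coffee"): 0.05,
}
DEFAULT = 0.001  # assumed score for word pairs not listed above


def context_score(words):
    """Multiply the pair scores along the phrase, starting from "<s>"."""
    score = 1.0
    previous = "<s>"
    for word in words:
        score *= BIGRAM.get((previous, word), DEFAULT)
        previous = word
    return score


def pick(candidates):
    """Return the phrase with the highest (sound score * context score).

    `candidates` maps each candidate phrase to a made-up score for how
    well it matches the audio.
    """
    return max(candidates, key=lambda p: candidates[p] * context_score(p.split()))


# Alone, "cup" matches the sound better but loses on context.
print(pick({"cup": 0.5, "the": 0.2}))
# Inside the phrase, the context favors "cup".
print(pick({"cup of coffee": 0.5, "the of coffee": 0.2}))
```

With these invented numbers, the lone word comes out as "the" even though "cup" is the better match for the sound, while the full phrase comes out as "cup of coffee", which mirrors what I saw above.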

The training also appears to be having an effect - when I said 'cup of coffee' above, I used my actual speaking voice that I used in training, so it was really like 'cuppa coffee'. If I use exaggerated enunciation, it doesn't work:

'cup of coffee' Couple of coffee

And even when it goes totally haywire, the gibberish does have somewhat of a logical sentence structure to it. So if this blog ever seems psychedelic, you'll know I'm trying it out again. Here is this same paragraph using unedited voice-to-text:

And even when it goes totally a long DJ version does have somewhat of a logical sentence structure to solve this ballgame ever seen psychedelic you'll know I'm trying it out and here's the same

Using unedited voice to tax

Funny that it didn't recognize the phrase 'voice-to-text'.

So anyway, I'm not sure how usable this is right now, but maybe it could be used to record notes to self, reminders, etc. I'll try more after finishing the training.

