Give Your Users a Voice
.NET 3.0 introduces several new features that make it simpler to use speech in your applications. Taken as a whole, these features enable you to build far more robust apps with far fewer lines of code.
by Jeff Certain
June 13, 2007
Technology Toolbox: VB .NET, XML, Windows XP or Windows Vista, Visual Studio 2005 with the Windows Presentation Foundation libraries, a microphone, and a sound card.
The potential of speech in applications is enormous—and almost completely untapped. One reason for this underutilization is that many developers' initial experiences with speech recognition were unsatisfying. As recently as a decade ago, the state-of-the-art in voice recognition technology was a dedicated ISA sound card that performed speech recognition. More recently, most people's experience with speech recognition has been limited to end-user technologies, such as automated telephone-menu systems and voice-to-text dictation packages. These products have a number of flaws from a developer's point of view. In general, they provide little—if any—support for developers; they require specialized and esoteric knowledge to perform the requisite phoneme mappings; they require significant training of both the user and the speech recognition engine; and they are expensive to deploy.
Contrast that with the speech API (SAPI 5.1) that shipped with Windows XP. (Download the SAPI 5.1 SDK at http://tinyurl.com/yq3en.) This version provides a solid speech-recognition engine that gives developers flexibility (albeit poorly documented flexibility), accepts English words without any mappings, can be used with minimal or no training of the user or the engine, and—best of all from my project manager's viewpoint!—is free to deploy.
The speech API has improved significantly in its latest iteration. The current offering from Microsoft not only exposes more powerful toys, such as the ability to use a WAV file as the input to the engine, but it also gives developers coarse-grained functionality to perform common tasks. It's hard to criticize more powerful toys that get easier to use.
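Using a WAV file as the engine's input takes only a few calls. Here's a minimal sketch; the file name is a placeholder, and the code assumes a project reference to System.Speech.dll:

```vb
' Sketch: transcribing a recorded WAV file instead of live microphone input.
Imports System.Speech.Recognition

Module WavTranscription
    Sub Main()
        Using engine As New SpeechRecognitionEngine()
            engine.LoadGrammar(New DictationGrammar())
            engine.SetInputToWaveFile("memo.wav") ' placeholder file name
            Dim result As RecognitionResult = engine.Recognize()
            If result IsNot Nothing Then
                Console.WriteLine(result.Text)
            End If
        End Using
    End Sub
End Module
```

The same engine object works against either input source; only the SetInputTo... call changes.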
Unfortunately, this great technology is under-publicized. One of the least mentioned features of .NET 3.0 is the inclusion of the System.Speech namespace in WPF, which uses the speech APIs (SAPI) built into Windows XP (SAPI 5.1) and Windows Vista (SAPI 5.3) to provide speech recognition and text-to-speech (TTS) functionality (Table 1).
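The text-to-speech side is equally compact. This sketch assumes a reference to System.Speech.dll and a working default audio output device:

```vb
' Minimal text-to-speech sketch using the default system voice.
Imports System.Speech.Synthesis

Module TtsDemo
    Sub Main()
        Using synth As New SpeechSynthesizer()
            synth.Speak("Welcome to System dot Speech.")
        End Using
    End Sub
End Module
```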
These recognition engines, wrapped by System.Speech, allow dictation, custom grammars, and rules-based recognition. You can use either a microphone or a WAV file as the input to the engine to simplify transcription. The dictation mode provides a dictionary of about 60,000 English words and requires only three lines of code to use. Rules-based recognition lets you create a flowchart of words or phrases, similar to the phone menus we've all grown to despise. Neither of these features is particularly compelling on its own; as mentioned earlier, applications providing them have been around for quite some time.
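Those three essential calls—load a grammar, pick an input, recognize—look something like this sketch, which assumes a default microphone and a reference to System.Speech.dll:

```vb
' The core of dictation mode in three calls.
Imports System.Speech.Recognition

Module DictationDemo
    Sub Main()
        Using engine As New SpeechRecognitionEngine()
            engine.LoadGrammar(New DictationGrammar())   ' general English dictionary
            engine.SetInputToDefaultAudioDevice()        ' live microphone input
            Dim result As RecognitionResult = engine.Recognize()
            If result IsNot Nothing Then
                Console.WriteLine(result.Text)
            End If
        End Using
    End Sub
End Module
```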
That said, the System.Speech namespace offers desktop application developers a tremendous amount of power. Specifically, the namespace enables you to generate custom grammars dynamically. You can use configuration files to generate sets of dynamic controls and the accompanying grammar; once you create a relation between the phrases in the grammar and the controls, you have a tremendously flexible interface that lets users activate your controls by voice. (Note that this functionality is distinct from Microsoft's server offerings. For Web-based and telephony applications, Microsoft has a separate offering: Speech Server. The latest version [Speech Server 2007] is currently in beta.)
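A dynamically generated grammar might look like the following sketch. The command phrases and the handler here are invented for illustration; in practice, they would come from your configuration files and map back to the controls they activate:

```vb
' Sketch: building a grammar at run time from a list of command
' phrases, then dispatching recognized phrases to a handler.
Imports System.Speech.Recognition

Module CommandDemo
    Private recognizer As SpeechRecognitionEngine

    Sub Main()
        recognizer = New SpeechRecognitionEngine()
        ' These phrases could just as easily be read from a config file.
        Dim commands As New Choices("save report", "open report", "close window")
        recognizer.LoadGrammar(New Grammar(New GrammarBuilder(commands)))
        recognizer.SetInputToDefaultAudioDevice()
        AddHandler recognizer.SpeechRecognized, AddressOf OnSpeechRecognized
        recognizer.RecognizeAsync(RecognizeMode.Multiple)
        Console.ReadLine() ' keep the app alive while listening
    End Sub

    Private Sub OnSpeechRecognized(ByVal sender As Object, _
                                   ByVal e As SpeechRecognizedEventArgs)
        ' In a real app, look up the control associated with this phrase
        ' and activate it; here we just echo what was heard.
        Console.WriteLine("Heard: " & e.Result.Text)
    End Sub
End Module
```

Because Choices and GrammarBuilder are assembled at run time, adding a new voice-activated control is just a matter of adding its phrase to the list before loading the grammar.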