Bill Gates
Chairman and Chief Software Architect, Microsoft — March 24
Kai-Fu Lee
Corporate VP of Natural Interactive Services Division, Microsoft — March 24
Chris Anderson
Windows Client Platform Team, Microsoft — March 25
Pat Helland
.NET Architecture Team, Microsoft — March 25
Bill Baker
General Manager, Business Intelligence, Microsoft SQL Server Business Group — March 26




Speech Ready Now
Speaker Interview — Kai-Fu Lee

Since Microsoft is about to start a big push for speech technology, we asked Microsoft Corporate Vice President Kai-Fu Lee why Visual Studio developers should add speech to their tool chest, and whether speech is ready for prime time. Mr. Lee will follow Bill Gates onstage for the opening address to VSLive! San Francisco, March 23-27.

How big a role will speech technology have in IT applications in the near term, and what makes you confident of that?

I see enormous potential for speech technology to add value to IT applications in the near term. Speech and IT are a natural fit, because when you’re having IT trouble, Web-based self-service often doesn’t work, and speech interaction over the telephone is the only alternative. One example is password reset. More than 30 percent of the support incidents and time spent by the IT help desk involve users requesting that their passwords be reset because they have expired or been lost or forgotten.

A speech application that lets users simply call on the phone and reset their password through the automated speech system can yield tremendous cost and productivity savings for the IT department, as well as a positive, satisfying experience for the end user.

I’m confident of the impact of speech on IT applications because speech technology has advanced to the point where it is robust and accurate enough for mainstream applications. Another reason is that speech telephony applications and Web applications are converging — meaning that an IT shop can run one integrated, unified application for both speech and Web and deploy it on its existing Web infrastructure. This fits nicely into the IT mandate to cost-effectively leverage existing IT assets and extend them with new investments, such as speech, to gain new functionality without ripping and replacing.

What advances have occurred to make speech ready for prime time?

First, there have been tremendous technological advances in the areas of speech recognition and synthesis, statistical modeling, and noise robustness. Every year, the speech-recognition error rate is reduced by 10-15 percent. At this rate, for close-talking dictation, machines will reach human performance in about seven to eight years.
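As a quick sanity check on that claim, compounding a steady 12.5 percent annual reduction (the midpoint of the quoted 10-15 percent range) shows how much of today’s error rate would remain after seven to eight years; the implied starting gap to human accuracy is an inference from these numbers, not a figure Lee cites:

```python
# Sanity check: what fraction of today's recognition error rate remains
# after n years of a steady 12.5% annual reduction (midpoint of the
# quoted 10-15% range)?
def remaining_error(years, annual_reduction=0.125):
    """Fraction of the current error rate left after `years` years."""
    return (1 - annual_reduction) ** years

for years in (7, 8):
    print(f"After {years} years: {remaining_error(years):.0%} of today's error rate")

# A 7-8 year horizon to human parity implies machines today make roughly
# 2.5-3x as many errors as humans on close-talking dictation.
```
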

Another significant advance is the emergence of open platforms and standards. Customers have demanded that speech technologies and the related interactive voice response (IVR) platforms move away from proprietary, incompatible standards and adopt open W3C specifications and standards. One example of an open standard driving speech to prime time is SALT (Speech Application Language Tags). Open standards will help drive down costs and improve interoperability and software reuse.
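To make the SALT approach concrete, here is a minimal sketch of speech tags embedded in an ordinary Web page. The element names follow the SALT 1.0 specification as published by the SALT Forum; the namespace URI, grammar file and field names are illustrative assumptions, not taken from the interview:

```html
<!-- Illustrative SALT fragment: a prompt, a listen, and a bind that
     drops the recognized value into a plain HTML form field.
     cities.grxml and the //city path are hypothetical. -->
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
<body>
  <form id="travel">
    <input name="city" type="text" />
  </form>

  <salt:prompt id="askCity">Which city are you flying to?</salt:prompt>
  <salt:listen id="recoCity">
    <salt:grammar src="cities.grxml" />
    <salt:bind targetelement="city" value="//city" />
  </salt:listen>
</body>
</html>
```

The point of the design is visible even in this sketch: speech is layered onto the same page markup and DOM the Web application already uses, rather than living in a separate IVR silo.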

This will help bring speech to the mainstream. In addition, the release of Microsoft Speech Server 2004 will provide a common and standard platform for application developers to coalesce around as opposed to the proprietary, fragmented and somewhat confusing platform market that exists today. Microsoft Speech Server will enable the creation of packaged speech applications that can be developed and delivered by the mainstream community using Visual Studio .NET.

I previously mentioned the convergence of telephony and Web technologies, which is another driver that is advancing the market and making speech ready for prime time. Finally, the cost of speech solutions is dropping to the point where not only large enterprises will find speech affordable, but medium-size businesses will find speech a cost-effective technology to implement. Microsoft Speech Server is leading the way in enabling flexible and integrated speech solutions at the lowest total cost of ownership.

How big a leap is it for a skilled Visual Studio developer to add speech to his or her résumé?

It’s not a big leap at all, but as with any new technology there is a learning curve. However, we’ve provided tools that make speech development faster and easier for Visual Studio developers than ever before. Our Microsoft Speech Application Software Development Kit (SDK) integrates into Visual Studio .NET and, with the controls and tools provided, enables programmers to add speech to their Web applications using the standard programming paradigm they use for any other Web application. So if developers know Web programming, including objects, events and a little scripting, they can easily use our tools for speech-enabled Web application development.

Speech does add another level of user interface design, however, sometimes called VUI (voice user interface). Just as developers learned GUI (graphical user interface) best practices over time, they will need to learn VUI best practices as well. We’ve made that job easier for Visual Studio .NET developers through the pre-built ASP.NET controls included in the SDK, which encapsulate the VUI design within the control. In addition, we have made available instructor-led training courses that provide the necessary VUI design knowledge.
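A speech-enabled ASP.NET page built with such pre-built controls might look roughly like the sketch below. This is a hypothetical illustration of the question-and-answer control pattern the SDK encapsulates; the `speech:` element and attribute names here are assumptions for illustration, not verbatim from the SDK documentation:

```html
<%-- Hypothetical sketch of a speech-enabled ASP.NET page.
     Control and attribute names are illustrative only. --%>
<%@ Page Language="C#" %>
<html>
<body>
  <form runat="server">
    <asp:TextBox ID="City" runat="server" />

    <%-- A QA-style control bundles the prompt, the recognition
         grammar, and the binding of the answer to the TextBox. --%>
    <speech:QA ID="AskCity" runat="server">
      <Prompt InlinePrompt="Which city are you flying to?" />
      <Reco>
        <Grammars>
          <speech:Grammar Src="cities.grxml" runat="server" />
        </Grammars>
      </Reco>
      <Answers>
        <speech:Answer TargetElement="City" runat="server" />
      </Answers>
    </speech:QA>
  </form>
</body>
</html>
```

Because the VUI logic is packaged inside the control, the developer works with the same drag-and-drop, property-driven model used for any other ASP.NET server control.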

A company called Vertigo Software is a great example of a Visual Studio .NET development shop that didn’t know anything about speech, yet within a few weeks was able to use our tools to build and deploy some excellent reference speech applications. In the January/February issue of Speech Technology Magazine, Vertigo details its experience using the Microsoft Speech Application SDK to build speech-enabled Web applications.

Where do you expect speech to have the biggest impact — the data center, mobile devices, desktop clients — and why?

Speech technologies are currently gaining tremendous adoption in the data center, primarily for customer self-service applications in telephony. The use of speech in the data center is driven primarily by telephones and cell phones. The data center approach allows a scalable, manageable and reliable server-based deployment for speech applications capable of supporting hundreds or thousands of simultaneous speech-enabled phone calls. So in the short term, the data center is the arena in which speech will have the biggest impact. In the medium to long term, mobile screen-based devices such as Pocket PCs and Smartphones will realize the value of speech technology. We call this multimodal: a mixture of speech and visual input/output.

Imagine calling your financial company and, speaking into your speech-enabled mobile device, asking for an update on your stock portfolio, or calling your travel company via its automated system for a listing of upcoming airline flights from Boston to New York. Your speech request will be answered with a GUI-based display of your stocks or the listing of flights. We believe the proliferation of mobile devices, and the limitations they naturally have for easy input, also will benefit from speech technology.

The desktop currently provides much easier input modalities, such as the keyboard and mouse, so the use of speech on the desktop is somewhat more limited. However, we see value in speech on the desktop via dictation capabilities and command and control of applications, particularly for people with disabilities and people who are slow typists. But as I mentioned earlier, with 10-15 percent improvement every year, speech on the desktop will be faster than typing in just a few years.

Finally, when human-level accuracy is possible, speech will be pervasive, and it will lead to a new kind of user interface: a delegation interface where you no longer tell the computer the steps to do something, but just the goal you want to accomplish — and the computer figures out the rest! That may be 10 years away, but it will completely revolutionize the human/machine interface and change the way we interact with every device.



 


FTPOnline  |  © 2004 Fawcette Technical Publications