FTPOnline - VSLive! SF 2004 - Microsoft Speech Server 2004

Speech Server Advances
Microsoft's Kai-Fu Lee aims to bring speech technology to the mainstream.
by Ken McNamee

VSLive! San Francisco, March 25, 2004

Kai-Fu Lee
Corporate Vice President of Natural Interactive Services Division, Microsoft

If you've ever tried to speech-enable an application—any type of application—then you know it can be daunting… until now. At VSLive! San Francisco, Microsoft officially released Speech Server 2004, which allows any company to add (nearly) natural speech capabilities to its applications.

In his keynote at VSLive! on Wednesday, Kai-Fu Lee, corporate vice president of Microsoft's Natural Interactive Services division, stressed Microsoft's goal of bringing speech technology to the mainstream through reduced costs, increased flexibility, and better integration. You can use Microsoft Speech Server to create new applications, but more importantly, you can integrate it into existing applications, even ones that were never designed with speech enablement in mind. This is one of Speech Server's most compelling features.

What Exactly Is Speech Server?
Microsoft Speech Server is difficult to describe in one sentence—it's more of a flexible collection of technologies that can be mixed and matched to meet the needs of a small company or a global enterprise. A small company might need only to allow customers to query a product catalog over the phone using speech. Speech Server can handle that. A global enterprise might have many speech-enabled applications with users in different roles or languages that require layers of security and business validation. Speech Server can also handle that. In fact, and this might seem odd at first, the simplest way to describe Speech Server is as just another way to render a presentation layer. The difference is that this presentation layer is for the users' ears instead of their eyes.

Speech Server functions similarly to ASP.NET. In fact, it requires IIS and ASP.NET in order to execute. Speech Server is almost like an add-in to the ASP.NET engine, even providing its own server controls. However, unlike the ASP.NET server controls that you are accustomed to, the Speech Server controls output verbal prompts and messages instead of standard HTML and JavaScript.

One of the most important advances in ASP.NET is the ability to cleanly separate the presentation layer from the business layer using code-behind classes and a custom inheritance model. If you have taken full advantage of this ability, you will reap the greatest rewards when you integrate Speech Server into your Web application.

The more separation you have between the presentation and business layers, the simpler the integration process will be. This is because you can merely replace the ASPX pages that render HTML with ASPX pages that contain speech tags and render their content verbally. Both types of ASPX pages can use the same middle tier business objects to perform operations such as security, validation, and database access. This is a huge leap forward in development productivity for speech applications and should significantly increase the adoption of this technology.
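The idea of one business layer serving two presentation layers can be illustrated with a minimal, language-neutral sketch. It is written in Python rather than ASP.NET, and it does not use the Speech Server API or its server controls. Every name in it (CatalogService, Product, render_html, render_prompt) is hypothetical and chosen only for illustration, borrowing the product-catalog scenario mentioned earlier.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float


class CatalogService:
    """Hypothetical business layer: knows nothing about presentation."""

    def __init__(self, products):
        self._products = {p.name.lower(): p for p in products}

    def find(self, name):
        # Validation lives here, so both front ends share it.
        key = name.strip().lower()
        if not key:
            raise ValueError("product name is required")
        return self._products.get(key)


def render_html(product):
    """Visual presentation layer: returns markup for a browser."""
    if product is None:
        return "<p>No matching product.</p>"
    return f"<p>{product.name}: ${product.price:.2f}</p>"


def render_prompt(product):
    """Spoken presentation layer: returns text to be read to a caller."""
    if product is None:
        return "Sorry, I could not find that product."
    return f"{product.name} costs {product.price:.2f} dollars."


# Both front ends call the same business object.
catalog = CatalogService([Product("Widget", 19.99)])
item = catalog.find("  widget ")
print(render_html(item))
print(render_prompt(item))
```

Because the lookup and validation sit entirely in CatalogService, adding the spoken front end means writing only a new render function. That is the kind of saving the article attributes to a well-separated ASP.NET application.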

Why Do We Need Speech Server?
Microsoft Speech Server 2004 makes developing speech-enabled applications relatively simple, but it has not always been this way. Up until a year or so ago, if you had asked the developers on your team whether they could add natural speech recognition and response to an existing application, their eyes would probably have glazed over and you would soon have found them scouring job-search sites on the Internet. The tools and APIs required to do the job were not standardized and had a steep learning curve. That's primarily why so few applications offer speech capabilities: the cost was too high, development and maintenance were overly complex, and integrating the feature into an existing application was difficult at best and practically impossible in many situations.

During Kai-Fu Lee's keynote, numerous Microsoft partners demonstrated how they are using Speech Server today, including applications for hotel reservations, insurance processes, and law enforcement across various systems and devices.

As Lee outlined, Microsoft has designed Speech Server to significantly reduce the cost, complexity, and integration barriers that were keeping many companies from pursuing speech functionality in their applications. While Speech Server is not free—$7,999 per processor for the Standard edition and $17,999 per processor for the Enterprise edition—the development tools and SDK are free of charge. In addition, third-party vendors are already creating helper tools and packaged components/solutions that work with Speech Server. Speech-enabling your applications might not be on your radar right now, but Microsoft Speech Server 2004 could change that.

About the Author
Ken McNamee is a senior software developer with Vertigo Software, a leading provider of software development and consulting services on the Microsoft platform. He previously led a team of developers in rearchitecting the Home Shopping Network's e-commerce site, HSN.com, to 100-percent ASP.NET with C#. Readers can contact him at kenm@vertigosoftware.com.