
Speech Rec: the new leader of automated voice?

Customer Interaction Solutions • August, 2008 • CALL CENTER Technology

Speech recognition applications are not so much revolutionizing IVR, defined for this article as enabling dual-tone multi-frequency (DTMF) or TouchTone[TM] interactions, as supplanting it as the key means of automated voice interaction.


That victory, which may be in sight, sets the stage for integrating voice with web, e-mail, and SMS to provide a unified, user-friendly automated solution that will reduce agent engagement time and, for a growing number of interactions though far from all, eliminate agent involvement.

The value proposition of speech rec is that it offers superior usability compared with DTMF at a much lower cost than a live agent: typically 50 cents compared with $5-$9 per transaction.

Implemented right, speech rec, along with text and web applications, can cut costs and maintain, if not increase, customer satisfaction and retention.

Yet while speech rec technology has come a long way, it still places significantly lower than live agents in customer satisfaction surveys, though higher than DTMF.

"Speech rec has a customer satisfaction ranking of about 4.5 on a 10-point scale while DTMF IVR is typically between 1 and 2," points out Bob Lyons, General Manager and Vice President, Avaya's customer service business. "In contrast, live agent interactions score a 7 on average. The real opportunity is in finding a way to get speech interactions to begin approaching scores seen by live interactions."

Achieving that goal will, however, require resources. Speech rec software and integration can cost upward of several hundred thousand dollars and can take nine to 12 months to implement, followed by one year of operation before achieving return on investment (ROI). It also requires multiple sources of rich content such as web, voice and user-specific data. These sources typically sit in application silos, and integrating them so that they present the data in the appropriate context, at the right time, will require large investments.

"The question becomes can the technology reach a level where customer satisfaction is high enough to offset the investments," says Lyons.
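The cost side of Lyons's question can be roughed out with the per-transaction figures quoted earlier (50 cents for speech rec, $5-$9 for a live agent). The sketch below is illustrative only: the $300,000 project cost is an assumed stand-in for "several hundred thousand dollars," and the function names are hypothetical. It ignores ongoing operating costs and any change in customer satisfaction.

```python
import math

# Per-transaction costs quoted in the article (US dollars).
SPEECH_COST = 0.50
AGENT_COST_LOW = 5.00
AGENT_COST_HIGH = 9.00

# Assumption for illustration only: the article says "upward of several
# hundred thousand dollars" without giving a specific figure.
ASSUMED_PROJECT_COST = 300_000.00


def savings_per_call(agent_cost, speech_cost=SPEECH_COST):
    """Amount saved each time a call is automated instead of agent-handled."""
    return agent_cost - speech_cost


def break_even_calls(project_cost, agent_cost, speech_cost=SPEECH_COST):
    """Smallest whole number of automated calls that recoups the project cost."""
    return math.ceil(project_cost / savings_per_call(agent_cost, speech_cost))


if __name__ == "__main__":
    for agent_cost in (AGENT_COST_LOW, AGENT_COST_HIGH):
        calls = break_even_calls(ASSUMED_PROJECT_COST, agent_cost)
        print(f"Agent cost ${agent_cost:.2f}: saves "
              f"${savings_per_call(agent_cost):.2f} per call, "
              f"break-even after {calls:,} automated calls")
```

Under these assumptions the project pays for itself after roughly 35,000 to 67,000 fully automated calls, which is why containment rates matter so much to the business case.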

Speech technology developments

Speech recognition technology is slowly but inexorably moving in that direction. Aaron Fisher, Director, Professional Services, West Interactive, has seen marked improvements in the overall performance of speech recognition software.

Applications are now better able to recognize callers with accents. The speech engines are more effective at screening out ambient or surrounding noise that is not generated by the callers' voices. These developments have led to increased automation rates and fewer agent opt-outs.

"In the old days, like 2003 and earlier, if a dog barked any time during a call, the speech rec application would think this noise might have been from the caller but wouldn't be able to make sense of it," recounts Fisher. "Now if you have a loud dog or child, the system has the ability to analyze the difference between spoken noise and ambient noise and callers can achieve their tasks with higher success rates."

To illustrate, Loquendo's Loquendo ASR 7.5 adds a noise compensation feature, and all of its supported languages have been retrained with additional material recorded in the presence of background noise, including mobile. It also offers support for more complex and multilingual grammars and for large vocabularies. It has differentiated timeouts to permit utterances of fixed format and length such as credit card numbers.

There is a continuing shift by users toward natural language speech rec, which enables callers to speak to the computers like they are conversing with people, away from directed dialogue speech rec, where callers speak one or two words in response to a DTMF-like menu.

Natural language permits callers to obtain what they want more quickly and easily. They can, for example, barge into the applications and have their requests understood because the speech engines parse their words and retrieve the right responses from their libraries. This functionality leads to greater automated interaction completion rates and fewer live agent zero-outs. Yet the solutions are more expensive and complex to install.

"Natural language is preferable because it more closely aligns with the users' need to have the system respond to them," explains Lyons. "The challenge is that natural language is not mature enough yet to deal with the general public. You have to build extensive libraries focused on the things that a person might ask. When you think about the many language options along with the many accents and slang options, it is easy to see why natural language is rather difficult to implement successfully in many situations."

There are many applications where directed dialogue is extremely useful. Voxeo, a provider of premise and hosted IVR and VoIP applications, points to the example of a mail order firm where 20 percent of callers dial in to find out about their order status.

The marketer uses Voxeo's Prophecy platform to ask the callers to say their order numbers, which is less restrictive than having them use DTMF. It then queries an existing Web-based order status solution and receives XML instructions to inform the customers that their orders have been shipped. Rather than ending the calls there, the platform then queries the shipper's package tracking Web application and tells the callers where the packages are.
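The two-step lookup in that example can be sketched in a few lines. This is not Voxeo's Prophecy API or the marketer's actual code; it only illustrates the logic of chaining an order-status query to a package-tracking query. The XML layout, the function names, and the stubbed back ends are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML returned by the order-status web application.
SAMPLE_ORDER_XML = """
<order number="48213">
  <status>shipped</status>
  <trackingNumber>1Z999</trackingNumber>
</order>
"""


def parse_order_status(xml_text):
    """Extract the order number, status and tracking number from the XML."""
    root = ET.fromstring(xml_text.strip())
    return {
        "number": root.get("number"),
        "status": root.findtext("status"),
        "tracking": root.findtext("trackingNumber"),
    }


def order_status_prompt(order_number, fetch_order_xml, fetch_package_location):
    """Build the sentence the voice platform would speak to the caller.

    fetch_order_xml and fetch_package_location stand in for the two web
    queries; both are hypothetical callables supplied by the caller.
    """
    order = parse_order_status(fetch_order_xml(order_number))
    if order["status"] != "shipped":
        return f"Order {order['number']} has not shipped yet."
    # Second lookup: ask the shipper where the package is now.
    location = fetch_package_location(order["tracking"])
    return f"Order {order['number']} has shipped. It is currently in {location}."


if __name__ == "__main__":
    # Stub back ends used only for this demonstration.
    print(order_status_prompt(
        "48213",
        fetch_order_xml=lambda number: SAMPLE_ORDER_XML,
        fetch_package_location=lambda tracking: "Memphis, Tennessee",
    ))
```

Passing the two queries in as callables keeps the call-flow logic separate from the web services it depends on, so the same flow could sit behind either a speech or a DTMF front end.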

Speech rec engines, especially those that use natural language, will benefit from increased chip processing power driven principally by strong demand for increasingly sophisticated computer games, reports Ian Jacobs, senior analyst, Frost & Sullivan.

"The faster and more affordable chipsets will enable speech rec applications to route calls quicker and handle more complex interactions," he explains.

Advantages of speech over DTMF

These improvements are making speech rec a more effective automated voice solution compared with DTMF-enabled IVR for most if not all interactions.

Speech rec can bolster customers' experience with automated voice methods by enabling them to complete transactions or obtain information and assistance quicker by accommodating their requests, instead of forcing them to go through long hierarchical menus as with DTMF.

Speech rec also enhances CRM by permitting customer personalization. When the system recognizes the callers it can then, based on the rules you create, address them by their first names, cut through the menus, and present customized information and offers.

One literal driver of the move from DTMF to speech rec is mobile commerce. Andrea Holko, Senior Vice President of Global Consulting Services, Intervoice, cites the growing number of jurisdictions that have hands-free cellphone laws.

"In environments where for safety reasons you cannot use your hands to use a phone, like driving a car, speech rec is a necessity," says Holko.

Also, the conversational flow in natural language speech recognition keeps older customers in the automated applications longer before contacting live agents.

Security is enhanced with speech recognition because it allows for complicated and less-readily-faked passwords. These are migrating from the common mother's maiden name to names of high schools attended and to the names of first pets.

There are places for DTMF. It can provide a high degree of accuracy for low-level security, such as through the entry of 4-digit or 6-digit PINs. It also permits customers to enter confidential information in public places without it being overheard and possibly stolen by others. It can, in addition, process vast numbers of simple calls requiring only numeric inputs highly reliably at low cost.

If you do retain DTMF, avoid upgrading the host IVR with speech rec, recommends Avaya's Lyons. Instead, have the speech applications integrated directly with the routing and switching solutions. Have the IVR connected only for those customers who wish to use the DTMF functionality.

The Avaya executive explains that the IVR's hierarchical call flow conflicts with the natural conversation flows in speech rec and live agent interactions. Firms that install speech rec on the IVR therefore risk failing to achieve ROI, such as improved customer retention and satisfaction and shorter live agent call lengths, because more callers will zero-out than projected.

"Installing speech rec on the IVR is the worst solution for your customers because it doesn't allow you to change the paradigm and permit you to create customer-friendly call flows," Lyons points out. "All what you will have is more expensive DTMF with poor satisfaction or call containment rates."

Speech rec as live agent adjunct or replacement

Banishing IVR to the periphery leaves speech recognition open to take on live agents. Already it is handling more transaction types that are at the edge of competence for DTMF IVR but which are too expensive for live agents, such as ordering movies, products, and tickets.

Speech rec can also reduce call lengths and call handling costs by obtaining basic routine information from callers that it then transmits to live agents. As speech applications become more robust they will be able to gather more data and handle more tasks, leaving less work for live agents to carry out.


COPYRIGHT 2008 Technology Marketing Corporation Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.
Copyright 2008 Gale, Cengage Learning. All rights reserved. Gale Group is a Thomson Corporation Company.
NOTE: All illustrations and photos have been removed from this article.

