In the SERA project we failed to put speech recognition technology in people's homes. The Siri marketing shows Apple's speech interface working in a kitchen, but real people in a real kitchen? No way [Wallis'10]. It turns out there are issues with "far field" speech recognition and with the way we "attend to" the noises around us. These issues are currently being looked at in the RooPi project, but in the meantime, how do you structure a conversation with a machine?
Automated phone systems are big business, with tens if not hundreds of companies in the UK willing to set up a system for you. Large companies have put massive effort into making IVR systems work, and this makes them an excellent platform for R&D on structuring a conversation. The default way to build one in industry is VoiceXML.
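For readers unfamiliar with it, a minimal switchboard dialog in VoiceXML might look something like the sketch below. This is purely illustrative; the grammar file, prompt wording and telephone number are placeholders, not part of any system described here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Illustrative switchboard dialog: ask for a name, then transfer the call. -->
  <form id="switchboard">
    <field name="person">
      <prompt>Hello. Who would you like to speak to?</prompt>
      <!-- staff-names.grxml is a placeholder SRGS grammar of recognisable names -->
      <grammar src="staff-names.grxml" type="application/srgs+xml"/>
      <nomatch>
        <prompt>Sorry, I didn't catch that name.</prompt>
        <reprompt/>
      </nomatch>
    </field>
    <!-- In a real system the destination would be looked up from the recognised name;
         the number here is a dummy value -->
    <transfer name="call" destexpr="'tel:+440000000000'" bridge="true">
      <prompt>Putting you through to <value expr="person"/>.</prompt>
    </transfer>
  </form>
</vxml>
```

The form interpretation algorithm visits the field first and, once `person` is filled, moves on to the transfer item, so the whole exchange is driven by slot-filling.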
The focus, however, is on the information contained in user utterances, with advice on how to develop IVR systems broadly following the principles of good HCI design [e.g. Balentine]. This focus on information has been criticised by those in the human sciences for many years [e.g. Reddy, "The conduit metaphor", in Ortony (ed.), Metaphor and Thought, 1993], but such criticism goes largely unheard in computational linguistics. It turns out language is used not only to transfer information, but also to manage social relations.
The why and how is a long story, but the short answer is that we have applied Michael Tomasello's principles of human communication to the classic IVR problem of telephoning a reception desk and asking to be put through to an individual. The difficulty with working on IVR systems is that the goal is to make them behave normally. In general people do not like using them, and when one makes them work better, people don't notice; they are just less annoyed. Don't expect to be able to use our test system - you wouldn't be impressed.
IVR systems are often formally evaluated on two factors: task completion rate and user satisfaction. Instead we ran an experiment in which we explicitly measured user satisfaction using a survey designed for more conventional user interfaces, pegging task completion at 20% by giving our subjects primarily tasks that could not be completed. We implemented an automated switchboard in VoiceXML following guidelines based on the information state update model of language. We then implemented another system performing the same task (again in VoiceXML) using the principle that, as a social actor, the machine needs to be intentional and cooperative, as described in detail in my CASA 2013 paper, The Intentional Interface. The user satisfaction survey used was the UEQ, which divides user satisfaction into six scales: Attractiveness, Perspicuity, Dependability, Efficiency, Novelty and Stimulation. Note the questionnaire was designed to measure user experience with conventional ICT interfaces; the reader is referred to the original paper for an explanation of these choices. The results are presented in Figure 1 and provide a nice contrast between the generally negative user satisfaction with the classic approach and the generally positive results for a system that plays by the rules of human society.
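Our actual dialog scripts are not reproduced here, but as a purely illustrative sketch (the wording below is invented for this write-up and taken from neither test system), the contrast between the two styles might look like the following pair of VoiceXML field fragments, each of which would sit inside a `<form>`:

```xml
<!-- Classic, information-oriented prompt: efficient, but socially flat -->
<field name="dept">
  <prompt>For sales, press one or say sales. For support, press two or say support.</prompt>
  <!-- menu.grxml is a placeholder SRGS grammar for the menu options -->
  <grammar src="menu.grxml" type="application/srgs+xml"/>
</field>

<!-- A more intentional, cooperative framing: the machine presents itself
     as an agent trying to help, and owns its failures -->
<field name="person">
  <prompt>Hello, this is the automated switchboard. I'll try to find the
    right person for you. Who are you after?</prompt>
  <!-- staff-names.grxml is a placeholder grammar of recognisable names -->
  <grammar src="staff-names.grxml" type="application/srgs+xml"/>
  <nomatch>
    <prompt>That's my fault, I'm not very good with names. Could you
      say it again slowly?</prompt>
    <reprompt/>
  </nomatch>
</field>
```

The point of the contrast is not the markup, which is near identical, but the stance the prompts take towards the caller.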
What next? The system used for the above tests was a demonstrator: it shows how the approach would work, but only with the set of test scenarios. The next phase is to create a prototype using the same principles that works with a real database and callers with real needs. Please get in contact if you have an application we could trial.