Engineering Natural Language Interfaces: can CA help?

The people

The University of Sheffield, Department of Computer Science: Mark Hepple (PI) and Peter Wallis
Newcastle University, Linguistics and Language Sciences: Alan Firth and Christopher Jenks

The grant

EPSRC: Digital Economy: Feasibility Studies in novel ICT developments which allow early User adoption
Start date: 01 April, 2008, Duration: 12 months

Summary

Interactive Voice Response (IVR) is used extensively by business for routine customer support, but these systems would be more useful if they were, well, less annoying. The hypothesis is that natural language interfaces fail primarily because they say the wrong thing: there are subtle but significant differences between ways of saying things, and natural language interfaces must take these into account. Current best practice for developing IVR systems is to rely on one's intuitions about language and simply write down what should be said and when. Building on successes with statistical parsers, attempts have also been made to use machine learning to decide what to say next from a corpus.

This proposal is for a study of the feasibility of using conversation analysis (CA) to provide a structured approach to the creation of IVR systems. Conversation analysis has been evolving since the early 1960s and has become an accepted part of applied linguistics [2]. A popular introduction to CA is Hutchby and Wooffitt [4], and the field has recently been revamped and revived (see ten Have [12] and Seedhouse [11]).

For the ideas of CA to contribute to current work in computational linguistics, annotated corpora must be created that capture CA's insights as applied to real data, in sufficient quantity that the resulting resources can aid researchers developing dialog systems and possibly provide data for machine-learning-based systems. This project will assess the feasibility of creating such resources, in particular whether CA-based annotation schemes can be developed that achieve reasonable levels of inter-annotator agreement and adequate annotation throughput. Natural language interfaces have been "near future" technology since the first days of computing, and CA may be the methodology to take us that critical step closer.
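
As an illustration of what measuring inter-annotator agreement could look like in practice, the sketch below computes Cohen's kappa for two annotators who have labelled the same utterances. This is only one common agreement statistic, chosen here as an assumption; the project summary does not say which measure would be used. The function name `cohen_kappa` and the label set ("question", "answer", "repair") are hypothetical and are not taken from any CA annotation scheme described above.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Proportion of items on which the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six utterances (illustrative data only).
annotator_a = ["question", "answer", "repair", "answer", "question", "answer"]
annotator_b = ["question", "answer", "answer", "answer", "question", "repair"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.2f}")
```

In this toy example the annotators agree on four of six utterances, but because some of that agreement would be expected by chance, kappa comes out well below the raw agreement rate. What counts as a "reasonable" level for a CA-based scheme is exactly the kind of question the feasibility study would need to settle.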