In addition to providing printed telephone directories, telephone companies provide telephone directory assistance services. Users of these services call predetermined telephone numbers and are connected to directory assistance operators. The operators access directory databases to locate the directory listings requested by the users, and release the telephone numbers of those listings to the users.
Because telephone companies handle billions of directory assistance calls per year, the associated labour costs are very significant. Consequently, telephone companies and telephone equipment manufacturers have devoted considerable effort to the development of systems which reduce the labour costs associated with providing directory assistance services.
In handling a typical directory assistance call, an operator may first ask a caller for the locality of the person or organization whom the caller wishes to call. If the locality named by the caller is one for which the operator has no directory listings, the operator may refer the caller to a different directory assistance telephone number which is associated with the requested locality. If the operator does have directory listings for the requested locality, the operator may ask the caller for the name of the person or organization whom the caller wishes to call. The operator searches a directory database for a listing corresponding to the requested person or organization and, upon finding an appropriate listing, releases the telephone number of that listing to the caller.
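The operator procedure described above can be pictured as a short decision routine. The following is an illustrative sketch only: the `DIRECTORY` and `REFERRALS` tables, the function name `handle_call`, and every listing and telephone number shown are hypothetical and are not taken from any cited reference.

```python
# Hypothetical listings held by this operator, keyed by locality, then by name.
DIRECTORY = {
    "Springfield": {"Acme Hardware": "555-0101", "J. Smith": "555-0102"},
}

# Hypothetical directory assistance numbers for localities served elsewhere.
REFERRALS = {"Shelbyville": "555-0199"}


def handle_call(locality, name):
    """Return an (outcome, detail) pair for one directory assistance request."""
    listings = DIRECTORY.get(locality)
    if listings is None:
        # No listings for this locality: refer the caller to another number,
        # if one is known (detail is None otherwise).
        return ("refer", REFERRALS.get(locality))
    number = listings.get(name)
    if number is None:
        # Locality is served, but no listing matches the requested name.
        return ("not_found", None)
    # Appropriate listing found: release its telephone number.
    return ("release", number)
```

For example, `handle_call("Shelbyville", "J. Smith")` takes the referral branch, while `handle_call("Springfield", "J. Smith")` releases the stored number.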
The labour cost associated with providing directory assistance services can be reduced by partially or totally automating functions previously performed by human operators. U.S. Pat. No. 4,979,206 discloses the use of an automatic speech recognition system to automate directory assistance operator functions. Directory assistance callers are automatically prompted to spell out names of localities and people or organizations associated with desired listings. The automatic speech recognition system attempts to recognize letter names in the spoken responses of the callers and, from sequences of recognized letter names, recognizes names of desired localities, people or organizations. The system then automatically searches a directory database for the desired listings and, if appropriate listings are found, automatically releases the telephone numbers of the listings to the callers. The system may also complete the desired connections for the callers. If the system is unable to recognize spoken letter names or cannot find appropriate listings, callers are connected to human operators who handle the calls in the normal manner described above. (U.S. Pat. No. 4,979,206, issued Dec. 18, 1990 in the names of F. W. Padden et al., is entitled "Directory Assistance Systems", and is hereby incorporated by reference.)
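The spelled-name flow summarized above, including the fallback to a human operator, might be sketched as follows. This is not the implementation of U.S. Pat. No. 4,979,206; it is a minimal illustration that assumes the recognizer output is already available as a list of upper-case letters, with `None` standing in for any letter name that could not be recognized. The function names and the sample directory are hypothetical.

```python
import string

# Assumption: the recognizer reports each recognized letter name as an
# upper-case character, and None for an utterance it could not recognize.
LETTER_NAMES = set(string.ascii_uppercase)


def assemble_name(recognized_letters):
    """Join recognized letter names into a name, or return None on any failure."""
    if not recognized_letters:
        return None
    if any(letter not in LETTER_NAMES for letter in recognized_letters):
        return None
    return "".join(recognized_letters)


def automated_lookup(locality_letters, name_letters, directory):
    """Return ("release", number), or ("operator", None) when automation fails."""
    locality = assemble_name(locality_letters)
    name = assemble_name(name_letters)
    if locality is None or name is None:
        # Spoken letter names were not recognized: hand off to a human operator.
        return ("operator", None)
    number = directory.get((locality, name))
    if number is None:
        # No appropriate listing found: hand off to a human operator.
        return ("operator", None)
    return ("release", number)


# Hypothetical directory keyed by (locality, name).
sample_directory = {("LAVAL", "SMITH"): "555-0102"}
```

Both failure modes named in the text (unrecognized letters and a missing listing) lead to the same operator hand-off in this sketch.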
The speech recognition system of the directory assistance system disclosed in U.S. Pat. No. 4,979,206 has a recognition vocabulary of fewer than 50 words (the names of the twenty-six letters, the names of the ten digits, "yes" and "no"). The use of such a restricted recognition vocabulary simplifies design and training of the speech recognition system. However, the restricted recognition vocabulary makes the directory assistance system cumbersome and time-consuming for callers to use. Faced with the inconvenience of spelling out the requested information, some callers may refuse to use the automated directory assistance system, forcing the system to connect them to a human operator, and this erodes the labour cost savings that automation is intended to provide.
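The size of the restricted vocabulary listed above can be checked with a few lines of arithmetic. The sketch below simply enumerates the word classes named in the text; representing each letter name by the letter itself is an assumption made for brevity, and the example name passed to `utterances_to_spell` is hypothetical.

```python
import string

# The three word classes named in the text.
letter_names = list(string.ascii_lowercase)  # 26 letter names (shown as letters)
digit_names = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]  # 10 digit names
control_words = ["yes", "no"]  # 2 control words

vocabulary = letter_names + digit_names + control_words
vocabulary_size = len(vocabulary)  # 26 + 10 + 2 = 38


def utterances_to_spell(name):
    """Count the letter names a caller must speak to spell out a name."""
    return sum(1 for ch in name if ch.isalpha())
```

The total of 38 words is consistent with the stated bound of fewer than 50. The cost to the caller is also visible: spelling a name such as "Montreal" takes eight separate letter utterances.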
Lennig et al disclose an automated directory assistance system which is based on a speech recognition system having a recognition vocabulary large enough to contain the names of most localities and several organizations that are likely to be requested by callers to a given directory assistance location ("Automated Bilingual Directory Assistance Trial in Bell Canada", Proceedings of the IEEE Workshop on Interactive Voice Technology for Telecom Applications, October 1992, Piscataway, N.J.). This speech recognition system uses Flexible Vocabulary Recognition (FVR) techniques similar to those disclosed in "Flexible Vocabulary Recognition of Speech over the Telephone", Proceedings of the IEEE Workshop on Interactive Voice Technology for Telecom Applications, October 1992, Piscataway, N.J. and in "Unleashing the Potential of Human-to-Machine Communication", Telesis Number 97, 1993, pp. 22-33 to achieve the expanded recognition vocabulary. These publications are hereby incorporated by reference.
Because the speech recognition system disclosed by Lennig et al can recognize locality and organization names as spoken naturally by callers, there is no need for the callers to spell out these names to obtain desired telephone numbers. Callers are more likely to use directory assistance systems providing this level of convenience, so the saving in labour costs is likely to be higher.
However, to implement a directory assistance system as disclosed by Lennig et al in a real telephone network, the automatic speech recognition system must be "trained" to recognize, to a high degree of accuracy, all locality names and several organization names likely to be used by directory assistance callers. Such training requires recordings of a large number of local speakers saying the locality and organization names, and each recording (or "speech token") must be labelled as corresponding to a particular locality or organization name. Approximately 20,000 labelled speech tokens are required to train an automatic speech recognition system so that it provides adequate recognition performance for locality and organization names in directory assistance applications.
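A labelled speech token, as described above, pairs one recording with the locality or organization name it contains. The following sketch shows one possible representation; the `SpeechToken` class, its field names, the file names, and the labels are all hypothetical and are not drawn from Lennig et al.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechToken:
    """One recording together with the name it has been labelled with."""
    audio_path: str  # hypothetical location of the recording
    label: str       # locality or organization name spoken in the recording


def tokens_per_label(tokens):
    """Count how many labelled recordings exist for each name."""
    return Counter(token.label for token in tokens)


# Hypothetical training sample; a real collection would hold about 20,000 tokens.
tokens = [
    SpeechToken("rec_0001.wav", "Montreal"),
    SpeechToken("rec_0002.wav", "Montreal"),
    SpeechToken("rec_0003.wav", "Laval"),
]
counts = tokens_per_label(tokens)
```

A per-label count of this kind is one simple way to see which names are thinly covered by the collected recordings.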
Typically, it takes several weeks of a skilled speech scientist's time to collect and label approximately 20,000 speech tokens. Even after training with this relatively large sample of speech tokens, the performance of the speech recognition system can be improved further by training with additional labelled speech tokens collected from local speakers.
Moreover, the speech patterns of regions served by directory assistance systems evolve over time, so that the performance of a speech recognition system which is initially well-trained to recognize locality names as spoken by local speakers may deteriorate over time if it is not periodically retrained to allow for changes in local speech patterns.
Consequently, training of speech recognition systems for use in directory assistance applications is a costly and time-consuming enterprise.