Frequently asked questions (FAQ) sections abound on the web and are a good source of information that, in speech applications, can be played back to the user automatically in real time. The industry is only beginning to explore the automatic playback of web FAQs and other online content to users over telephone-based speech applications. The prevailing approach is simply to fetch the FAQ or online content and play it back to the user, "as-is," without any real-time transformation of the information to suit the voice user interface.
It is misleading, however, to assume that web data can be transferred directly to the speech interface: the web and the telephone represent fundamentally different user interfaces, and a direct transfer creates several usability problems. First, many FAQs contain graphical content, so simply fetching them from the web raises the problem of translating graphics into text. Second, even text-only FAQs are written for a textual or visual interface, so most instructions combine multiple actions with conjunctions. For example, "hold down the action key and then page down" is written as one step but actually involves two actions: (a) hold down the action key, and (b) page down. A reader can easily parse such compound instructions in a visual interface, but the same is not true when they are played back in a speech application, because of the load on the user's short-term memory. Third, most FAQs contain lengthy lists of instructions, which are difficult to play back over a telephone interface without overtaxing short-term memory and without preventing the user from actually applying the instructions to resolve the problem. Finally, FAQs rarely follow a specific or standard format, which makes it difficult to devise a systematic way to play back FAQs across different clients or domains.
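As one illustration of the kind of transformation at issue, the sketch below splits a compound written instruction into single-action voice prompts and groups the prompts into short batches for playback. It is a minimal sketch only: the conjunction list, the "Step N" prompt labels, and the batch size of three are illustrative assumptions, not an established method from the source.

```python
import re

# Conjunctions that typically join multiple actions in one written FAQ step.
# The word-boundary anchors (\b) prevent false splits inside words like "band".
# This conjunction list is an illustrative assumption.
CONJUNCTIONS = re.compile(r"\s*,?\s*\b(?:and then|then|and)\b\s+", re.IGNORECASE)

def split_instruction(instruction: str) -> list[str]:
    """Break one written FAQ step into atomic single-action phrases."""
    return [part.strip() for part in CONJUNCTIONS.split(instruction) if part.strip()]

def to_prompts(steps: list[str]) -> list[str]:
    """Flatten written steps into numbered, one-action voice prompts."""
    prompts = []
    for step in steps:
        for action in split_instruction(step):
            prompts.append(f"Step {len(prompts) + 1}: {action}")
    return prompts

def chunk(prompts: list[str], size: int = 3) -> list[list[str]]:
    """Group prompts into short batches so playback does not overload the
    listener's short-term memory (the batch size of 3 is an assumption)."""
    return [prompts[i:i + size] for i in range(0, len(prompts), size)]
```

For the example in the text, `to_prompts(["hold down the action key and then page down"])` yields two separate prompts, one per action, which a speech application could then play back batch by batch.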
Because of these problems, FAQs have not been successfully implemented for automatic playback to users in speech applications. A systematic, automated process or algorithm is needed to transform data from the tactile, web-oriented interface into a suitable auditory or multimodal interface, so that speech applications can provide self-service to users.