1. Field of the Invention
This invention relates to the field of speech recognition interfaces of computer apparatus and the like, and in particular, to conversation management in such speech recognition interfaces.
2. Description of Related Art
One goal of a computerized interview (clinical assessments, structured interviews, and other individualized indicators) is to maintain the quality of the replaced human-to-human contact. During the interview, the interviewer plays different roles, e.g. test administrator, tester and observer, and the client must understand when the roles change. In a human face-to-face interview, the verbal, situational, and paralinguistic cues generally suffice for a smooth transition among the different roles for the interviewer and client. While the rules for conversation are known to the conversants in a face-to-face dialogue (although they are difficult to express), they are not known for face-to-interface dialogues. The "rules" or "etiquette" for a computerized interview have not been established. Two problems in particular usually occur in a computerized conversation, namely: when to talk, referred to as the turn taking problem; and, how to talk, referred to as the vocabulary problem.
Persons do not know when to talk in a computerized conversation. A computerized conversation is not like a face-to-face conversation, in which the conversants use paralinguistic cues, for example, pitch changes and tone, and nonverbal cues, for example, facial expressions, to indicate when it is appropriate for the other person to talk. Moreover, many computer systems do not understand interruptions. In a face-to-face interview, the client can interrupt the interviewer at any time to ask for clarification or to maintain the conversation. The inability to handle such interruptions will remain a problem until natural language programs can be used effectively in a conversation.
Persons do not know how to speak in a computerized conversation. Speaking to a voice recognition system is not like a face-to-face conversation, in which the language has few constraints. In a face-to-interface interview, by contrast, the speaker generally must be trained how to speak. Sometimes the speaker must speak discretely, that is, with pauses between words; but even with continuous speech, the vocabulary is limited.
Systems that administer tests are not new; however, the additional component of a conversational interview is new. Some kiosks have interactive sessions, but they do not generally use voice recognition and do not attempt to initiate a conversation. When a video environment is used in a kiosk interaction, the end user makes choices from a touch screen or other type of selection button. Additionally, kiosk interaction is typically kept as short as possible. Part of the reason for that brevity may be that people tire relatively easily of that style of interaction.
The IBM® Human Center enables conversational computing. An actor's output and recognition can be programmed through the Personality Services and Actor Services components. Even so, the IBM® Human Center does not address what should be in the dialogue or how to manage the conversation.
Finally, there is a large body of research into non-verbal communication and discourse analysis which is pertinent to this field. Reference may be made to: [1] Druckman, D., Rozelle, R. M., & Baxter, J. C. (1982). Nonverbal Communication: Survey, Theory and Research, Sage Library of Social Research (139), Beverly Hills: Sage Publications, Inc.; and, [2] Reichman, R. (1985). Getting Computers to Talk Like You and Me, Cambridge, Mass.: The MIT Press.