1. Field of the Invention
This invention relates to the field of computer-mediated group communication systems.
2. Background
Groups of people have communicated together for eons. This communication includes, among other styles, styles in which a group of people listens to a presenter and styles in which people collaborate in a social interaction such as a meeting. In the following description, the term meeting is intended to include all social interactions. Meetings often have subgroups of people who carry on separate conversations within the context of the meeting. Each of these subgroups maintains a conversational floor for that subgroup while its members maintain awareness of the primary group conversation. The primary group conversation generally continues even though separate conversational floors are established. While this conversational style works well when the number of participants is small and all of the participants are co-located (such as in a conference room), it breaks down completely when remote parties communicate with each other using existing technology (for example, teleconference technology, two-way shared radio channels, etc.).
An example of this problem is that of a “party line” telephone or teleconference call in which there is no capability to schism the conversation into separate conversational floors. This is also true of shared-channel radio systems such as police and fire-fighter radio communications. Communications between the participants are mixed together on the communication channel making it difficult for arbitrary users to communicate and often requiring complex protocols among the users to provide some order to the communications. Although some high-end teleconference systems support multiple conversational floors (for example, by “sub-conferencing” or by channel switching) the establishment and modification of these conversational floors is difficult. This difficulty lessens the spontaneity of establishing a conversational floor in a remote meeting.
Instant messaging and chat systems allow for schisming: a number of users can participate in a textual “chat room” in which each user's typed message is displayed to all the members of the room (subject to per-member controls). Each user can also explicitly create and maintain a side-channel textual conversation with a subgroup of members. This schisming is not automatic; it requires explicit user commands to the system.
U.S. Pat. No. 6,327,567 B1 to Willehadson et al., entitled Method and System for Providing Spatialized Audio in Conference Calls, and filed Feb. 10, 1999, teaches a mechanism that allows sub-conferences (conversational floors). However, each user needs to manually enter a dialog with a command unit (by DTMF, by a user interface with a control unit or by a recognized voice command) to initiate or participate in a sub-conference or to switch between sub-conferences. In addition, Willehadson uses a complicated spatialization technique to indicate which sub-conferences are available. Willehadson does not teach automatic detection of conversational floors or automatic switching between conversational floors responsive to the conversational characteristics of the conversations.
In addition, during the course of a social interaction, the suitability of the communication means currently being used by the participants may change due to one or more factors. For instance, the environment of one or more of the participants may change due to, for example, ambient noise or interference. Similarly, the communication channel itself could be affected by noise or interference, as well as by physical limitations of capacity or media. Explicit inputs by the participants could also affect the suitability of the communication channel, such as when a participant turns down the gain control on the communication device. In addition, the content of the social interaction, and the inferences that can be drawn from that content, can signal a possible need to modify the communication channel in some fashion.
Various psychological, sociological and related factors can affect the ability of a participant to effect a change to the communication channel, even if the participant is cognizant that such a change is required. For example, a pair of participants conversing using two-way radios equipped with a push-to-talk transmission mode may become so highly engaged conversationally that continuing their discussion would exceed the limits of the communication devices. Preferably, the participants would agree to a “media switch,” that is, the substitution of a communication channel exhibiting properties more suited to the current needs of the social interaction; in this case, the participants would switch to conventional telephony and resume the exchange on the new communication channel. However, agreeing to a media switch also requires investing effort in setting up a new communication channel and implies a social commitment to continue the conversation at a higher level of engagement, perhaps in a significantly longer interaction. Conversely, participants in a social interaction could be unaware of the need to make a media switch and could inefficiently muddle through their social interaction.
A communication channel has an associated set of channel properties, which substantially determine the structure of the information or content being delivered through the channel. Qualitative channel properties consist of binary or categorical parameter settings, whereas parametric properties consist of substantially continuous parameter settings. Channel properties are distinguishable from other aspects of the communication channel that might be changed, but which do not have the same kind of effect. For example, a communication system can incorporate indicators that augment the communication channel without substantially altering the structure of the information delivered in the channel.
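To make the distinction concrete, the following minimal Python sketch models a channel's properties as a record with qualitative (binary or categorical) fields and parametric (continuous) fields, and keeps indicators in a separate set because they augment the channel without altering the structure of the delivered content. The particular properties chosen (duplex mode, intelligibility, gain, playback rate) and all names are hypothetical illustrations, not a definitive taxonomy.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Duplex(Enum):
    """Hypothetical categorical (qualitative) property: how turns are taken."""
    HALF = "half-duplex"   # e.g., push-to-talk radio
    FULL = "full-duplex"   # e.g., conventional telephony


@dataclass
class ChannelProperties:
    # Qualitative properties: binary or categorical settings.
    duplex: Duplex = Duplex.FULL
    intelligible: bool = True
    # Parametric properties: substantially continuous settings.
    gain_db: float = 0.0
    playback_rate: float = 1.0


@dataclass
class Channel:
    properties: ChannelProperties = field(default_factory=ChannelProperties)
    # Indicators (e.g., a "who is speaking" marker) sit outside the
    # properties because they do not restructure the delivered content.
    indicators: Set[str] = field(default_factory=set)


def is_qualitative_change(old: ChannelProperties, new: ChannelProperties) -> bool:
    """Return True if any binary or categorical property differs."""
    return old.duplex != new.duplex or old.intelligible != new.intelligible
```

Under this model, the push-to-talk-to-telephony media switch discussed above is a qualitative change (half-duplex to full-duplex), whereas turning down the gain is only a parametric one.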
Systems that support speech-triggered automatic actions with respect to parametric properties are known. For example, automatic gain controls are widely used in audio teleconference systems. These controls typically adjust the microphone gain dynamically to normalize gain across the participants based on an assumption of low gain variability over time. As another example, so-called speaker-select mechanisms are widely used in teleconference systems. These systems implement automatic speaker-select algorithms, which attempt to track which participant or participants in a teleconference are speaking at a given moment and enable only a limited number of people to speak concurrently. Various techniques for automatic speaker-select for audio conferences are described in U.S. Pat. No. 3,508,007 to Goodall (first-to-speak); U.S. Pat. No. 3,699,264 to Pitroda (loudest speaker); and U.S. Pat. Nos. 4,475,190 and 5,631,967 to Marouf and Wagner, respectively (simple statistics from the most recent talkburst), the disclosures of which are incorporated by reference. For audio, automatic speaker-select addresses the following problems: (1) reducing costs, since compressed digital audio must be decoded before mixing and re-encoded before being put back on the network; (2) preventing numeric overflow, which may occur when multiple high-amplitude signals are mixed; (3) reducing bandwidth consumed, since savings result from sending only n streams instead of n×n streams; and (4) reducing echo caused by speaker-to-microphone feedback when speakerphones are used. However, these systems are not primarily designed to facilitate social patterns of human communication, and they do not automatically change qualitative channel properties. Similarly, automatic speaker-select for video conferences is described in E. J. Addeo et al., “An Experimental Multi-Media Bridging System,” Proc. ACM Conf. on Office Info. Systems, ACM Press, 1988, 236-242; and U.S. Pat. No. 5,768,263 to Tischler, the disclosures of which are incorporated by reference. For video, automatic speaker-select addresses the following problems: (1) reducing costs, by enabling “multi-party” video without needing either n video displays or m (<n) displays with video multiplexing hardware; and (2) reducing bandwidth consumed. Automatic speaker-select techniques can also be used to add indicators to a conferencing system, as described in R. Cutler et al., “Look Who's Talking: Speaker Detection Using Video and Audio Correlation,” Proc. IEEE Conf. on Multimedia & Expo (ICME), IEEE CS Press, 2000, 1589-1592, the disclosure of which is incorporated by reference. In the Cutler device, rectangular outlines are drawn around the current speaker's video image, which is suggested to help viewers understand which participant is currently talking. However, these systems do not automatically change qualitative channel properties.
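The following is a minimal Python sketch, not taken from any of the cited patents, of two of the speaker-select policies named above: loudest-speaker and first-to-speak, each admitting at most k concurrent speakers. It assumes each participant's audio has already been reduced to one normalized energy level per frame; the activity threshold, the function and class names, and the alphabetical tie-breaking rule are hypothetical.

```python
from typing import Dict, List

THRESHOLD = 0.1  # hypothetical activity threshold on normalized frame energy


def loudest_speakers(levels: Dict[str, float], k: int) -> List[str]:
    """Loudest-speaker policy: the k highest-energy active participants."""
    active = [p for p, e in levels.items() if e >= THRESHOLD]
    return sorted(active, key=lambda p: levels[p], reverse=True)[:k]


class FirstToSpeak:
    """First-to-speak policy: active talkers keep the floor in order of onset."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.onset_order: List[str] = []  # active talkers, earliest onset first

    def update(self, levels: Dict[str, float]) -> List[str]:
        active = {p for p, e in levels.items() if e >= THRESHOLD}
        # Drop talkers who have stopped, preserving the order of the rest.
        self.onset_order = [p for p in self.onset_order if p in active]
        # Append newly started talkers; sort for determinism within a frame.
        for p in sorted(active - set(self.onset_order)):
            self.onset_order.append(p)
        return self.onset_order[: self.k]
```

The two policies differ only in how they rank active talkers: loudest-speaker is stateless and re-ranks every frame, while first-to-speak keeps state so that a louder newcomer cannot displace a talker who started earlier.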
Systems that include manual user interfaces for controlling media streams with respect to parametric properties are also known. Manual interfaces can control various parametric properties, such as audio source select. Audio conferencing systems that allow participants to manually specify which audio streams they will hear, such as by selecting specific participants to hear, are described in E. J. Addeo et al., “An Experimental Multi-Media Bridging System,” Proc. ACM Conf. on Office Info. Systems, ACM Press, 1988, 236-242; U.S. Pat. No. 5,034,947 to Epps; U.S. Pat. No. 6,236,854 to Bradshaw; U.S. Pat. No. 5,113,431 to Horn (full manual mixing of all n participants' audio); and U.S. Pat. Nos. 5,533,112 and 6,178,237 to Danneels and Horn, respectively, the disclosures of which are incorporated by reference. In these systems, the selection is realized through a simple mixing function, which constitutes a simple parametric property. In addition, conferencing systems that enable participants to use alternative manual means to indicate the audio streams they will hear, such as explicitly selecting one of several groups in a video conference by directing eye gaze toward the image of a member of that group, are described in R. Vertegaal et al., “GAZE-2: Conveying Eye Contact in Group Video Conferencing Using Eye-Controlled Camera Direction,” Proc. ACM SIGCHI Conf., ACM Press, 2003, 521-528, the disclosure of which is incorporated by reference. The end result is still a mixing function, which constitutes a simple parametric property. Systems that provide audio speed select, allowing manual control of additional parametric properties such as time-scale compression or “speeded-up audio,” are described in P. H. Dietz et al., “Real-Time Audio Buffering for Telephone Applications,” Proc. ACM UIST Symp., ACM Press, 2001, 193-194, the disclosure of which is incorporated by reference. Systems that incorporate manual interfaces to control qualitative properties are also known.
Systems for controlling speech intelligibility, such as by partially prosody-preserving speech scrambling, are described in I. E. Smith et al., “Low Disturbance Audio for Awareness and Privacy in Media Space Applications,” Proc. ACM Multimedia Conf., ACM Press, 1995, 91-97; and C. Schmandt et al., “Mediated Voice Communication via Mobile IP,” Proc. ACM UIST Symp., ACM Press, 2002, 141-150, the disclosures of which are incorporated by reference. In these systems, a particular end-user can choose whether the audio transmitted from their microphone is intelligible or unintelligible. The algorithm used to make speech unintelligible is designed to preserve the overall tone of the speech, such as urgency and emotion, as well as a listener's ability to identify the speaker, albeit imperfectly. Initiating the channel property change requires either unilateral or sequentially negotiated end-user actions. A unilateral property change results from a unilateral action, such as when one participant pushes a button and the change occurs. A sequentially negotiated property change follows a sequence of steps involving more than one participant, such as setting up a telephone call. A system that requires an initiating step by a first participant followed by an accepting step by a second participant, such as initiating and accepting a side conference session, is described in L. Berc et al., “Pssst: Side Conversations in the Argo Telecollaboration System,” Proc. ACM UIST Symp., ACM Press, 1995, 155-156, the disclosure of which is incorporated by reference. Such steps constitute a specific request/reply negotiation, which implies both a strong causal dependence and a temporal ordering. However, these telecommunication systems fail to change channel properties in response to independent user interface gestures made by multiple participants.
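The contrast between the two kinds of end-user action can be sketched as follows. This is a hypothetical Python model, not the implementation of Argo or of any other cited system; the class, state, and function names are assumptions. The accept step is rejected unless it answers a pending request addressed to the accepting participant, which captures the causal dependence and temporal ordering of a request/reply negotiation, whereas the unilateral change takes effect from a single participant's action.

```python
from enum import Enum, auto
from typing import Optional


class SideState(Enum):
    IDLE = auto()
    REQUESTED = auto()
    ACTIVE = auto()


class SideConference:
    """Sequentially negotiated change: an initiating step, then an accepting step."""

    def __init__(self) -> None:
        self.state = SideState.IDLE
        self.initiator: Optional[str] = None
        self.invitee: Optional[str] = None

    def request(self, initiator: str, invitee: str) -> None:
        # Step 1: the first participant initiates.
        if self.state is not SideState.IDLE:
            raise RuntimeError("a request is already pending or active")
        self.state = SideState.REQUESTED
        self.initiator, self.invitee = initiator, invitee

    def accept(self, who: str) -> None:
        # Step 2: the reply depends on, and must follow, a matching request.
        if self.state is not SideState.REQUESTED or who != self.invitee:
            raise RuntimeError("accept must answer a pending request to this participant")
        self.state = SideState.ACTIVE


def unilateral_toggle(intelligible: bool) -> bool:
    """Unilateral change: one participant's single action takes effect at once."""
    return not intelligible
```

Neither pattern covers the case the passage identifies as missing: a property change triggered by independent gestures from several participants, with no request preceding a reply.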
Finally, known telecommunication systems allow alterations to the system's user interface. For example, most current software communication applications have menus and modes. Similarly, most current cellular telephone handsets with LCD displays have programmable “soft keys” whose assigned functions change depending on context, such as whether a call is in progress. Methods by which users can establish new communication channels, such as by establishing a new telephone call by directing eye gaze toward a telephone augmented with a gaze detector, are described in J. Shell et al., “Interacting with Groups of Computers,” Comm. ACM 46 (3), 2003, 40-46, the disclosure of which is incorporated by reference. By definition, establishing a new communication channel is not altering an existing communication channel. Methods by which information that is not direct communication is passed between users, such as notifying potential callers that a potential callee is likely in a face-to-face conversation based on sensor input captured by the callee's handset, are described in R. Vertegaal et al., “Designing Attentive Cell Phones Using Wearable Eyecontact Sensors,” Extended Abstracts, ACM SIGCHI Conf., ACM Press, 2002, 646-647, the disclosure of which is incorporated by reference. The information is passed through indicators rather than through a channel, and so necessarily does not relate to the properties of a channel. These systems effect user interface changes that do not alter the structure of the information that passes through the channel. Systems that change the physical output device of a channel, such as by selecting which of two speakers plays the audio (for instance, switching a telephone handset's audio from the speakerphone to the earphone depending on the handset's proximity to the user's head), are described in Ericsson R520m User's Guide (3rd Ed.), Pub. EN/LZT-108-4268-R3A, Ericsson Mobile Communication AB, 2001, the disclosure of which is incorporated by reference. The channel contents and the user's interaction with other users, that is, full-duplex audio, remain unchanged. However, these telecommunication systems fail to allow alteration of the user interface relating to the properties of the communication channel.
It would be advantageous to provide a capability that addresses the above-mentioned problems.