Conversation Analysis (CA) is a branch of linguistics that studies the way humans interact in conversation. Since the invention is based on an understanding of the interactions between participants in conversations, and of how the quality of those interactions is degraded by transmission delay, we first summarise some relevant findings from Conversation Analysis.
In a free conversation, the organisation of the conversation in terms of who speaks when is referred to as ‘turn-taking’. Turn-taking is implicitly negotiated through a multitude of verbal cues within the conversation and also through non-verbal cues such as physical motion and eye contact. This behaviour has been studied extensively in the discipline of Conversation Analysis and has led to useful concepts such as:
The Turn Constructional Unit (TCU), which is the fundamental segment of speech in a conversation: essentially a piece of speech that can constitute an entire ‘turn’.
The Transition Relevance Place (TRP), which indicates where a turn or floor exchange can take place between speakers. TCUs are separated by TRPs.
These concepts underpin the basic turn-taking process, as shown in FIG. 1, which will be discussed in more detail later. Briefly, as a TCU comes to an end, the next talker is essentially determined by whichever participant starts talking first. This can be seen for a well-ordered three-participant conversation in FIG. 8 (also discussed in more detail later). All changes of talker take place at a TRP, that is, at the end of a TCU; if no other participant starts talking, the original talker can continue past the TRP. This decision process has been observed generally to lead to the following conversational characteristics:
(i) Overwhelmingly, only one participant talks at a time.
(ii) Occurrences of more than one talker at a time are common, but brief.
(iii) Transitions from one turn to the next—with no gap or overlap—are common.
(iv) The most frequent gaps between talkers are in the region of 200 ms. Gaps of more than 1 second are rare.
(v) It takes talkers at least 600 ms to plan a one-word utterance, and somewhat longer for sentences with multiple words. Since this planning time is considerably longer than the typical 200 ms gap, the next talker must begin planning before the current turn has ended, which implies that listeners are generally very good at predicting an approaching TRP.
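The reasoning in (v) reduces to a subtraction. The small Python check below uses only the figures quoted in (iv) and (v); the function name `planning_lead_ms` is hypothetical, and the figures are typical values rather than fixed constants.

```python
# Figures quoted in (iv) and (v) above, in milliseconds.
TYPICAL_GAP_MS = 200
MIN_PLANNING_MS = 600  # one-word utterance; multi-word sentences take longer


def planning_lead_ms(planning_ms: float, gap_ms: float) -> float:
    """How long before the TRP the next talker must start planning.

    A positive result means planning has to begin while the current
    talker is still speaking, i.e. the listener must anticipate the TRP.
    """
    return planning_ms - gap_ms


lead = planning_lead_ms(MIN_PLANNING_MS, TYPICAL_GAP_MS)
print(f"Planning must start at least {lead:.0f} ms before the TRP")
```

With these figures the next talker must already be planning roughly 400 ms before the current talker reaches the TRP; for longer utterances the lead is larger still.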
Significantly, transmission delay on the communication links between conference participants can severely disrupt the turn-taking process, because the delay obscures which participant was the first to start talking after a TRP, and hence who should take the next turn.
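A minimal sketch of this effect follows. It assumes two listeners who hear the current talker's TRP at the same instant, a symmetric one-way delay between them, and that each listener holds back if it hears the other start first. These assumptions and the function name `both_start` are illustrative only and are not taken from the source.

```python
def both_start(t_a: float, t_b: float, one_way_delay: float) -> bool:
    """True if A and B both begin talking before hearing each other.

    t_a, t_b: times (s) after the TRP at which A and B would start talking.
    Each participant hears the other's onset only one_way_delay seconds
    later, and is assumed to hold back if it hears the other start first.
    """
    a_hears_b_first = t_b + one_way_delay < t_a
    b_hears_a_first = t_a + one_way_delay < t_b
    return not (a_hears_b_first or b_hears_a_first)
```

With start times of 200 ms and 300 ms, zero delay lets the earlier starter take the turn alone, but a 250 ms one-way delay means both start before either can hear the other, so the first-starter rule no longer yields a single next talker.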
Referring to prior art documents, U.S. Pat. No. 7,436,822 (Lee et al) relates to methods and apparatus for estimating transmission delay across a telecommunications network by performing a statistical analysis of conversational behaviour in the network. Certain characteristic events associated with conversational behaviour (such as, for example, alternative silence events, double-talk events, talk-spurt events and pause in isolation events) are identified and measured. Then, based on the proportion of time that these events occur, an estimate of the delay is calculated using a predetermined equation. Illustratively, the equation is a linear regression equation which has been determined experimentally.
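To make the general approach concrete, the sketch below derives event proportions from two per-frame voice-activity sequences and feeds them into a linear equation. It is not the method of U.S. Pat. No. 7,436,822: the frame labels, the simplified event definitions (double talk, alternating silence, pause) and the values of `INTERCEPT` and `COEFFS` are hypothetical placeholders. In practice the coefficients would come from an experimentally fitted regression.

```python
def frame_labels(a: list[bool], b: list[bool]) -> list[str]:
    """Label each frame: "A"/"B" = only that party talking,
    "AB" = double talk, "-" = mutual silence (hypothetical scheme)."""
    return ["AB" if x and y else "A" if x else "B" if y else "-"
            for x, y in zip(a, b)]


def event_proportions(a: list[bool], b: list[bool]) -> dict[str, float]:
    """Fraction of frames spent in each (simplified) conversational event."""
    labels = frame_labels(a, b)
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one frame")
    counts = {"double_talk": labels.count("AB"),
              "alternating_silence": 0, "pause": 0}
    i = 0
    while i < n:
        if labels[i] != "-":
            i += 1
            continue
        # Find the end of this run of mutual silence.
        j = i
        while j < n and labels[j] == "-":
            j += 1
        before = labels[i - 1] if i > 0 else None
        after = labels[j] if j < n else None
        # Only classify silences bounded by single-talker frames on both sides:
        # same talker on both sides -> pause, different talkers -> alternating.
        if before in ("A", "B") and after in ("A", "B"):
            key = "pause" if before == after else "alternating_silence"
            counts[key] += j - i
        i = j
    return {k: v / n for k, v in counts.items()}


# Hypothetical coefficients for illustration only; not from the patent.
INTERCEPT = 0.05
COEFFS = {"double_talk": 0.5, "alternating_silence": 0.8, "pause": -0.2}


def estimate_delay(props: dict[str, float]) -> float:
    """Linear estimate of transmission delay (seconds) from event proportions."""
    return INTERCEPT + sum(c * props[k] for k, c in COEFFS.items())
```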
United States patent application US2012/0265524 (McGowan) relates to methods and apparatus for visual feedback for latency in communication media, in particular for visualising the latency in a conversation between a local speaker and at least one remote speaker separated from the local speaker by a communication medium.
U.S. Pat. No. 8,031,857 (Singh) relates to methods and systems for changing communication quality of a communication session based on a meaning of speech data. Speech data exchanged between clients participating in a communication session is parsed. A meaning of the parsed speech data is determined for identifying a service quality indicator for the communication session. An action is performed to change a communication quality of the communication session based on the identified service quality indicator.
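The parse, determine-meaning, act pipeline can be illustrated with a deliberately crude stand-in. In the sketch below, keyword matching on a transcript substitutes for "determining a meaning", and doubling a codec bitrate substitutes for "an action to change communication quality". The phrase list, bitrate values and function names are hypothetical and are not taken from U.S. Pat. No. 8,031,857.

```python
# Hypothetical phrases suggesting poor quality; a real system would use
# proper speech recognition and language understanding.
POOR_QUALITY_PHRASES = ("can you repeat", "you're breaking up", "i can't hear you")


def quality_indicator(transcript: str) -> str:
    """Map a transcript to a coarse service quality indicator."""
    text = transcript.lower()
    return "poor" if any(p in text for p in POOR_QUALITY_PHRASES) else "ok"


def adjust_bitrate(current_kbps: int, indicator: str, max_kbps: int = 64) -> int:
    """Raise the codec bitrate (a hypothetical action) when quality is poor."""
    if indicator == "poor":
        return min(current_kbps * 2, max_kbps)
    return current_kbps
```

For example, the utterance "Sorry, can you repeat that?" yields the indicator `"poor"`, which would raise a 24 kbit/s stream to 48 kbit/s.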
European patent application EP1526706 (Xerox Corporation) relates to methods of communication between users that include receiving communications from communication sources, mixing communications for a plurality of outputs associated with the communication sources, analysing conversational characteristics of two or more users, and automatically adjusting floor controls in response to the analysis. In some versions it uses turn-taking analysis to identify, within a “primary meeting” containing active subgroups that each maintain their own conversational ‘floor’, which subgroup a particular talker belongs to and who is talking with whom.
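The summary above does not say how the turn-taking analysis is performed. One simple possibility, sketched below purely as an assumption, is to treat talkers who repeatedly take turns after one another as members of the same subgroup. The function `infer_subgroups` and the threshold `min_transitions` are hypothetical.

```python
from collections import Counter


def infer_subgroups(turns: list[str], min_transitions: int = 2) -> list[set[str]]:
    """Group talkers who frequently take turns after one another.

    turns: talker IDs in the order their turns occurred.
    Two talkers are linked if turns pass between them at least
    min_transitions times (in either direction); linked talkers are merged.
    """
    pair_counts = Counter(
        frozenset((a, b)) for a, b in zip(turns, turns[1:]) if a != b
    )
    # Start with each talker in its own group, in order of first appearance.
    groups = [{t} for t in dict.fromkeys(turns)]
    for pair, count in pair_counts.items():
        if count < min_transitions:
            continue
        a, b = tuple(pair)
        ga = next(g for g in groups if a in g)
        gb = next(g for g in groups if b in g)
        if ga is not gb:
            ga |= gb  # merge b's group into a's group
            groups = [g for g in groups if g is not gb]
    return groups
```

For a turn sequence A, B, A, B, C, D, C, D, the single B-to-C hand-over falls below the threshold, so A and B form one subgroup and C and D another.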
United States patent application US2014/078938 (Lachapelle et al) relates to techniques for handling concurrent speech in a session in which some speech is delayed in order to alleviate speech overlap in the session. A system receives speech data from first and second participants, and outputs the speech of the first participant. The system outputs the speech of the second participant in accordance with an adjustment of the speech of a participant of the session when the speech of the second participant temporally overlaps less than a first predetermined threshold amount of a terminal portion of the speech of the first participant. The system drops the speech of the second participant when the speech of the second participant temporally overlaps more than the first predetermined threshold amount of the terminal portion of the speech of the first participant. The system may adjust the speech of a participant of the session by delaying output of the speech of the second participant.
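The overlap test summarised above can be sketched as follows. The sketch assumes that speech is represented as (start, end) intervals in seconds, that the "terminal portion" is the last `terminal_s` seconds of the first participant's speech, and that the "adjustment" is a delay that starts the second participant's speech once the first has finished. All names are hypothetical, and the equal-to-threshold case, which the summary does not address, is treated here as a drop.

```python
from typing import Optional


def handle_second_speech(first: tuple[float, float],
                         second: tuple[float, float],
                         terminal_s: float,
                         threshold_s: float) -> Optional[float]:
    """Return the playout delay (s) for the second speech, or None to drop it.

    first, second: (start, end) times in seconds.
    terminal_s: length of the terminal portion of the first speech.
    threshold_s: overlap with that terminal portion at or above which
    the second speech is dropped.
    """
    start1, end1 = first
    start2, end2 = second
    term_start = max(start1, end1 - terminal_s)
    # Overlap between the second speech and the terminal portion [term_start, end1].
    overlap = max(0.0, min(end1, end2) - max(term_start, start2))
    if overlap >= threshold_s:
        return None  # too much overlap: drop the second speech
    # Small (or no) overlap: delay the second speech until the first has ended.
    return max(0.0, end1 - start2)
```

With a first utterance from 0 s to 5 s and a 1 s terminal portion, a second utterance starting at 4.8 s overlaps by 0.2 s and is delayed by about 0.2 s, whereas one starting at 3 s overlaps the whole terminal portion and is dropped.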
Japanese patent application JP2000049948 relates to a speech communication technique that aims to improve the operability of communication systems such as telephone conference systems and speech devices by making it easier to recognise the voice of the other party who is at the centre of the conversation.