In voice conferencing systems, where the transport of audio or voice is mediated other than by a direct proximate acoustic coupling, the participants will experience an increased delay in round trip communication. Typically, in telecommunication systems, this can be of the order of 200-500 ms in each direction, and is known as ‘mouth to ear’ delay. This delay is known to affect communications and the functional use of such systems. ITU (ITU-T G.114 2003) sets out details of the observed impact of increased link latency under different functional activities. Even in simple one-to-one mediated conversations, the latency can have a substantial impact. In some cases, where long distance or adverse network conditions are experienced, typical latencies can exceed the critical threshold of 400 ms set out in ITU-T G.114 2003. For example, when using an IP network, typical transatlantic network latencies may be 200 ms, and in addition there will be unavoidable system delays associated with buffering, central servers, jitter buffers, software systems at the end points, and hardware or low-level audio subsystems. Even for a well-designed system, these additional latencies can add up to 100 ms, plus whatever time is required for the desired robustness to network jitter.
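The delay budget described above can be illustrated with a short calculation. This is a minimal sketch only: the 200 ms network figure, the 100 ms system figure, and the 400 ms threshold come from the text, while the function names, the treatment of the 200 ms as a one-way figure, and the 120 ms jitter allowance are assumptions chosen purely for illustration.

```python
# All figures are one-way delays in milliseconds.
G114_THRESHOLD_MS = 400  # critical threshold cited in the text from ITU-T G.114


def mouth_to_ear_delay_ms(network_ms, system_ms, jitter_buffer_ms):
    """Sum the one-way delay contributions into a mouth-to-ear estimate."""
    return network_ms + system_ms + jitter_buffer_ms


def exceeds_threshold(total_ms, threshold_ms=G114_THRESHOLD_MS):
    """Return True when the estimated delay is above the threshold."""
    return total_ms > threshold_ms


# Hypothetical jitter allowance of 120 ms (not a figure from the text).
total = mouth_to_ear_delay_ms(network_ms=200, system_ms=100, jitter_buffer_ms=120)
print(total, exceeds_threshold(total))
```

Under these assumed figures, any jitter allowance above 100 ms pushes the one-way delay past the 400 ms threshold.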
One of the main problems arising from this latency is the increased probability that both parties will commence speaking within the one-way delay time, compounded by the time taken for the parties to realize this and for one or both of them to back off. This problem has an impact on ‘natural turn-taking’ and causes delays, stutter, and inefficiency in the communications flow. This problem can be understood with reference to FIG. 1, which diagrammatically illustrates the negative consequences of latency on conversational flow in a three-party situation.
As seen from the upper portion of FIG. 1, parties A, B, and C are participants in a video or telephone conference that is managed by a server 102. Each of the parties A, B, and C has a two-way communication (video and/or voice) channel open with server 102 during the conference, with communications between all the parties thus passing by way of the server. The server 102 thus sends all incoming audio out as would be expected of a single acoustic space.
The lower portion of FIG. 1 depicts a timeline in which an example of transmissions (TX) and receptions (RX) by the parties A, B, and C, and the server S are indicated. We see that A begins a transmission at time 104 which, due to the latency, is not received by B until a time 106. In the meantime, not yet aware of A's transmission, B begins a conflicting transmission at time 108. B discovers the conflict when B first receives A's transmission, at time 106, and ceases transmitting at 110 in response. Similarly, because of the latency, A only discovers the conflict at 112, when B's transmission first reaches A. At that time, 112, A also ceases transmitting. Both parties pause, at 114a and 114b, and then, unhappily, begin re-transmission at substantially the same time, 116a and 116b, starting another collision cycle. For completeness, the reception of A's and B's transmissions by third party C is shown in the timeline, at 118, as are A's and B's transmissions, as emanating from the server S, at 120.
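The timing relationships in this timeline can be sketched numerically. The following is an illustrative model only, not taken from FIG. 1: it assumes a single end-to-end one-way delay between A and B (combining both legs through the server), assumes each party stops the instant it hears the other, and uses hypothetical function and field names.

```python
def collision_timeline(a_start_ms, b_start_ms, one_way_delay_ms):
    """Compute when each party discovers the other's transmission.

    Assumes one end-to-end delay in each direction and zero reaction time.
    """
    b_hears_a_ms = a_start_ms + one_way_delay_ms  # corresponds to time 106
    a_hears_b_ms = b_start_ms + one_way_delay_ms  # corresponds to time 112
    # A collision occurs when each party starts before hearing the other,
    # i.e. the two start times differ by less than the one-way delay.
    collided = abs(a_start_ms - b_start_ms) < one_way_delay_ms
    return {
        "collided": collided,
        "b_hears_a_ms": b_hears_a_ms,
        "a_hears_b_ms": a_hears_b_ms,
        # Time each party spends talking before discovering the conflict.
        "a_talk_before_stop_ms": a_hears_b_ms - a_start_ms if collided else 0,
        "b_talk_before_stop_ms": b_hears_a_ms - b_start_ms if collided else 0,
    }


# A starts at 0 ms, B starts 100 ms later, with an assumed 300 ms one-way delay.
print(collision_timeline(a_start_ms=0, b_start_ms=100, one_way_delay_ms=300))
```

In this model the party that spoke first is the last to learn of the conflict, which is consistent with A discovering the collision at 112, after B has already discovered it at 106.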
This collision-pause-re-collision problem also extends, in a more technical sense, to the use of a single medium by multiple packetized data communications networks separated by some reasonable physical delay. Whilst the associated delays are much lower with small packets and moderate-sized electrical or RF networks, the principle is the same. When a collision occurs, both parties must back off and attempt a retransmit in order to achieve reliable communications. A problem arises when the time one endpoint waits before trying again is highly correlated with the time the other endpoint waits. This causes repeated collisions. A solution for this is known as the ALOHA protocol, in which the endpoints wait a random interval before attempting to send again. This lowers the chance of a subsequent collision. If the endpoints share the same random distribution of waiting times (typically a uniform distribution, to minimize the chance of repeated collisions), then this system is fair and moderately efficient without requiring any arbitration.
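The effect of correlated versus randomized waiting times can be illustrated with a small simulation. This is a simplified sketch in the spirit of random back-off, not an implementation of the ALOHA protocol itself: it assumes two endpoints, treats a retry round as a collision whenever the two waiting times differ by no more than the one-way delay, and uses hypothetical names and parameter values throughout.

```python
import random


def rounds_until_resolved(draw_wait_a, draw_wait_b, one_way_delay_s, max_rounds=100):
    """Return the retry round in which the endpoints stop colliding.

    Each argument draw_wait_* is a zero-argument function returning a
    waiting time in seconds. Returns None if every round collides.
    """
    for round_number in range(1, max_rounds + 1):
        # Retries closer together than the one-way delay collide again,
        # because neither endpoint can hear the other in time.
        if abs(draw_wait_a() - draw_wait_b()) > one_way_delay_s:
            return round_number
    return None


rng = random.Random(0)  # seeded only to make the illustration repeatable


def fixed_wait():
    """Perfectly correlated back-off: both ends always wait 0.5 s."""
    return 0.5


def uniform_wait():
    """Random back-off drawn uniformly from an assumed 0 to 2 s range."""
    return rng.uniform(0.0, 2.0)


print(rounds_until_resolved(fixed_wait, fixed_wait, one_way_delay_s=0.2))
print(rounds_until_resolved(uniform_wait, uniform_wait, one_way_delay_s=0.2))
```

With identical fixed waits the endpoints never separate, so the first call reports no resolution. With independent uniform waits, a given round collides only when the two draws happen to fall within the delay of each other, so resolution typically takes few rounds.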
In both the communications networking and voice communications fields, this problem is exacerbated by a larger number of parties to the conference. The probability of collision scales with the number of participants wishing to communicate. With a voice conference, a collision becomes almost a certainty at some point as the latency and conference size increase, especially since most situations inviting a wider response are precipitated by a request or closing from an active endpoint. Attempts to secure the single combined voice conference channel are therefore highly correlated in time among the parties. While a protocol such as ALOHA could be adopted by users, it is human nature for some parties to abuse this by attempting to transmit again sooner. An alternative is an analog of the structured turn-taking approach. This can be seen in emergency services radio communications, where a brief request including a priority code is always transmitted first in order to obtain the channel. A central point mediates access to the channel. While this is fair and practical, it does lower overall communications bandwidth.
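The scaling of collision probability with conference size can be illustrated under a simple model. The sketch below assumes that each party independently begins speaking within a given one-way delay window with the same probability, and counts a collision whenever two or more parties do so. The independence assumption and all names are illustrative only. Since the text notes that real attempts are highly correlated in time, this model likely understates the problem.

```python
def collision_probability(n_parties, p_start):
    """Probability that two or more of n_parties start in the same window.

    p_start is the assumed, independent probability that any one party
    begins speaking within the one-way delay window.
    """
    p_none = (1 - p_start) ** n_parties
    p_exactly_one = n_parties * p_start * (1 - p_start) ** (n_parties - 1)
    return 1.0 - p_none - p_exactly_one


# Hypothetical per-party start probability of 0.2 for a range of sizes.
for n in (2, 3, 5, 10, 20):
    print(n, round(collision_probability(n, 0.2), 3))
```

Even in this optimistic independent model, the collision probability rises steadily with the number of parties and approaches one for large conferences.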
Given these problems associated with communications channel latency, it is desirable to ameliorate collisions and to assist efficient and fair turn-taking. It is also desirable to improve the time to resolve collisions and achieve improved fairness without requiring an a priori agreed back-off strategy, or a token mechanism. It is further desirable to reduce the impact of collisions without permitting abuse by one or more parties, or encouraging race escalation or forced conversation entry, or otherwise negatively impacting the flow of conversation and channel efficiency when there is only a small subset of parties wishing to contribute.