1. Field of the Invention
The present invention relates to a method and apparatus for audio communication over a data network.
Conventionally, voice signals have been transmitted over standard telephone lines. However, with the increase in locations provided with local area networks (LANs) and the growing importance of multimedia communications, there has been considerable interest in the use of LANs to carry voice signals. This work is described, for example, in "Using Local Area Networks for Carrying Online Voice" by D Cohen, pages 13-21, and "Voice Transmission over an Ethernet Backbone" by P Ravasio, R Marcogliese, and R Novarese, pages 39-65, both in "Local Computer Networks" (edited by P Ravasio, G Hopkins, and N Naffah; North Holland, 1982). The basic principle of such a scheme is that a first terminal or workstation digitally samples a voice input signal at a regular rate (e.g. 8 kHz). A number of samples are then assembled into a data packet for transmission over the network to a second terminal, which then feeds the samples to a loudspeaker or equivalent device for playout, again at a constant 8 kHz rate.
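By way of illustration only, the assembly of samples into packets described above might be sketched as follows. The function names and the number of samples per packet are hypothetical and are not taken from the cited works; only the 8 kHz sampling rate comes from the description above.

```python
from typing import Iterable, Iterator, List

SAMPLE_RATE_HZ = 8000  # regular sampling rate given in the description (8 kHz)


def packetise(samples: Iterable[int], samples_per_packet: int) -> Iterator[List[int]]:
    """Group a stream of samples into packets of a fixed size.

    The packet size is an assumed parameter; the final packet may be
    shorter if the stream does not divide evenly.
    """
    packet: List[int] = []
    for sample in samples:
        packet.append(sample)
        if len(packet) == samples_per_packet:
            yield packet
            packet = []
    if packet:
        yield packet


def packet_duration_ms(samples_per_packet: int) -> float:
    """Playout time represented by one full packet at the constant sampling rate."""
    return 1000.0 * samples_per_packet / SAMPLE_RATE_HZ
```

Under these assumptions, a packet of 160 samples would represent 20 ms of audio, so the receiving terminal would need a new packet every 20 ms to sustain playout at the constant rate.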
2. Description of the Prior Art
Older, conventional audio communication systems comprise a central mixing hub and audio conferencing terminals connected thereto in a star network. The central hub receives audio signals from each terminal and produces a composite signal therefrom. The composite signal is then transmitted back to each terminal, less that terminal's own audio signal. This is to be contrasted with LAN audio conferencing systems, which have a more distributed architecture: each terminal must receive, via a data network, the audio signals of all of the other terminals connected to the data network. Receiving the audio signals in parallel requires a large amount of bandwidth.
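The operation of the central mixing hub described above, in which each terminal receives the composite signal less its own contribution, might be sketched as follows. This is a minimal illustration only: the function name is hypothetical, all terminals are assumed to supply frames of equal length, and no clipping or scaling of the summed samples is performed.

```python
from typing import Dict, List


def mix_minus(inputs: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return, for each terminal, the composite signal less its own signal.

    `inputs` maps a terminal name to one frame of audio samples.
    Assumes every frame has the same length (an illustrative simplification).
    """
    frame_length = len(next(iter(inputs.values())))
    # Composite signal: sample-by-sample sum over all terminals.
    composite = [
        sum(frame[i] for frame in inputs.values()) for i in range(frame_length)
    ]
    # Each terminal receives the composite with its own samples subtracted.
    return {
        name: [composite[i] - frame[i] for i in range(frame_length)]
        for name, frame in inputs.items()
    }
```

In the distributed LAN arrangement there is no such hub, so the equivalent summation would have to be carried out at each terminal on the streams it receives.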
The bandwidth requirement of an audio communication system using a scheme as described above varies according to the number of users of the system. For example, in an audio communication system which encodes audio as 8 bit pulse code modulation sampled at 8 kHz: a two way audio conference requires a bandwidth of 2 × 64 kbps; a five way audio conference using individual addressing requires a bandwidth of 20 × 64 kbps; and a five way audio conference using group addressing requires a bandwidth of 5 × 64 kbps.
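The figures above can be reproduced with a short calculation. The sketch below assumes that, with individual addressing, each of the n parties sends a separate copy of its stream to each of the other n - 1 parties, and that, with group addressing, each party sends a single stream to the group. That interpretation is an assumption which is consistent with the stated figures; the function name is hypothetical.

```python
# 8 bit samples at 8 kHz give 64 kbps per audio stream.
PCM_STREAM_KBPS = 8 * 8000 // 1000


def conference_bandwidth_kbps(parties: int, group_addressing: bool) -> int:
    """Aggregate bandwidth when every party transmits continuously."""
    if group_addressing:
        # Assumed: one stream per party, delivered to the whole group.
        streams = parties
    else:
        # Assumed: one copy of each party's stream per other party.
        streams = parties * (parties - 1)
    return streams * PCM_STREAM_KBPS


if __name__ == "__main__":
    print(conference_bandwidth_kbps(2, group_addressing=False))  # 2 x 64 kbps
    print(conference_bandwidth_kbps(5, group_addressing=False))  # 20 x 64 kbps
    print(conference_bandwidth_kbps(5, group_addressing=True))   # 5 x 64 kbps
```

With individual addressing the number of streams grows as n(n - 1), whereas with group addressing it grows only linearly in n.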
Consequently, it can be seen that the greater the number of parties to a conference, the greater the bandwidth required to implement the audio communication system. Allowing an audio communication system to utilise the available bandwidth of a LAN without restraint will have an adverse effect on the overall performance of the LAN.
In a typical two-party conversation, each party speaks for less than approximately forty percent of the total time for which the parties are connected (see "The Voice Activity Detector for Pan-European Digital Cellular Mobile Telephone Service", by Freeman et al, IEEE 1989). The audio communication apparatus used by each party conventionally picks up acoustic waves in the vicinity of a microphone associated therewith. The acoustic waves include the voice of a party to the conversation and any background office noise. An electrical signal representing the acoustic waves is produced by the microphone. The signal is digitised to produce digital audio samples of the output of the microphone. The samples are then placed in, for example, packets and transmitted over the local area network to a receiving apparatus for output to the other party to the communication. As only forty percent of the samples produced by one of the parties contain voice data, it follows that only forty percent of the traffic attributable to the two-way conversation comprises voice data; the remaining packets produced and transmitted contain silence or, in an office environment, very low level background noise.
GB 2 172 475 A discloses a packet switching system in which speech is packetised and voice activity detectors are used to monitor speech in the Go and Return paths. In the Go path, the voice activity detector compares the current level of packets with (a) the current background noise value, and (b) the computed value of the expected echo due to speech packets in the Return path. If the Go path packet is larger than the parameters of (a) and (b) by a preset arrangement, the packet is sent; otherwise it is not. If the "send" decision persists for a number of speech packets, that send condition has a hangover period attached to it. If the parameters are properly chosen, then the speech heard by a subscriber is not unduly affected.
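A send decision with a hangover period of the general kind described above might be sketched as follows. This is an illustrative simplification and not the method of GB 2 172 475 A: the function name and all parameters are hypothetical, the background noise value is held fixed rather than tracked, the expected echo level is supplied per packet rather than computed, and the "preset arrangement" is assumed to be a fixed margin above the larger of the two values.

```python
from typing import List, Sequence


def send_decisions(
    levels: Sequence[float],       # level of each Go path packet
    echo_levels: Sequence[float],  # assumed expected echo level per packet
    noise_level: float,            # assumed fixed background noise value
    margin: float,                 # assumed preset margin
    persist: int,                  # packets of speech needed to arm the hangover
    hangover: int,                 # packets still sent after speech ends
) -> List[bool]:
    """Return True for each packet that would be sent."""
    decisions: List[bool] = []
    run = 0        # consecutive packets judged to contain speech
    remaining = 0  # hangover packets still available
    for level, echo in zip(levels, echo_levels):
        if level > max(noise_level, echo) + margin:
            run += 1
            if run >= persist:
                remaining = hangover  # speech has persisted: arm the hangover
            decisions.append(True)
        else:
            run = 0
            if remaining > 0:
                remaining -= 1  # keep sending during the hangover period
                decisions.append(True)
            else:
                decisions.append(False)
    return decisions
```

The hangover is intended to avoid clipping the low-level tail of an utterance, at the cost of sending a few packets that contain no speech.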
A further problem with using a LAN to carry voice data is that the transmission time across the network is variable. Thus the arrival of packets at a destination node is both delayed and irregular. If the packets were played out in an irregular fashion, this would have an extremely adverse effect on the intelligibility of the voice signal.