Speech may be communicated between two parties over a packet-based network 104 as shown in FIG. 1. Coupled to the network 104 are a sender machine 106 and a recipient machine 108. To communicate voice through the network 104, the voice of a first party 109 is first digitized, at a given sampling rate, into a voice stream by a voice capture device 112. The voice capture device may be part of a digital TDM link (e.g. Digital Services levels, DSx) or may be part of an analog telephone interface, e.g. a subscriber line interface circuit (SLIC). The stream is then arranged into network packets by a voice processing device 114. These packets are then sent, through the network 104, to the recipient machine 108 of a second party 116 where they are reassembled into the original stream and played back. This series of operations may also occur in reverse, with the sender and recipient roles of the machines exchanged, so that the parties can hold a two-way conversation.
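The packetize/reassemble path described above can be sketched as follows. This is a minimal illustration only; the 8 kHz sampling rate, the 20 ms frame size, and the header fields (`channel`, `seq`, `timestamp`) are assumptions chosen for the example, not details taken from FIG. 1.

```python
SAMPLE_RATE_HZ = 8000        # typical telephony sampling rate (assumed)
FRAME_MS = 20                # samples grouped per packet (assumed)
SAMPLES_PER_PACKET = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples

def packetize(stream, channel_id=0):
    """Split a list of voice samples into (header, payload) packets."""
    packets = []
    seq = 0
    for start in range(0, len(stream), SAMPLES_PER_PACKET):
        payload = stream[start:start + SAMPLES_PER_PACKET]
        header = {"channel": channel_id, "seq": seq, "timestamp": start}
        packets.append((header, payload))
        seq += 1
    return packets

def reassemble(packets):
    """Re-create the original stream by ordering packets by sequence number."""
    ordered = sorted(packets, key=lambda p: p[0]["seq"])
    stream = []
    for _, payload in ordered:
        stream.extend(payload)
    return stream
```

Because the network may reorder packets, `reassemble` sorts on the sequence number before concatenating payloads, so the recipient recovers the stream even when packets arrive out of order.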
Though the packets may be sent into the network 104 at a fixed rate, the packets often are not received by the recipient at that fixed rate, because the packets encounter varying delays while traveling through the network. This creates the undesirable situation at the recipient machine 108 of running out of packets while reassembling the voice stream, which introduces breaks during playback. This variation in delay, known as "jitter," worsens as the network delays become longer and more unpredictable with more complex networks, i.e. those having a large number of nodes that connect the source and recipient machines, such as in a single wide area network (WAN), a collection of WANs, or in the Internet.
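As an illustrative sketch of how jitter might be quantified, the function below computes a smoothed estimate of the variation in one-way transit time, in the style of the RTP interarrival-jitter estimator; the 1/16 smoothing gain and the use of paired send/receive timestamps are assumptions for this example, not part of the system described above.

```python
def interarrival_jitter(send_times, recv_times):
    """Smoothed estimate of packet delay variation ("jitter").

    send_times/recv_times: parallel lists of per-packet timestamps,
    in any common time unit (e.g. milliseconds).
    """
    jitter = 0.0
    prev_transit = None
    for s, r in zip(send_times, recv_times):
        transit = r - s                      # one-way delay for this packet
        if prev_transit is not None:
            d = abs(transit - prev_transit)  # delay variation vs. last packet
            jitter += (d - jitter) / 16.0    # exponential smoothing
        prev_transit = transit
    return jitter
```

With a perfectly constant network delay the estimate stays at zero; any variation in delay drives it upward, matching the intuition that jitter is the *variation* in delay rather than the delay itself.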
To alleviate jitter, buffers may be used in various nodes in the network 104 and in the end nodes, i.e. the recipient and sender machines 108, 106. The buffers store some of the incoming voice data before this data is forwarded or reassembled into the voice stream, to avoid "under-runs," or running out of data in the event packet arrival is delayed. However, buffering hampers real-time, two-way conversations if it introduces excessive delay into the conversation. In addition, too little buffering, relative to the rate at which incoming data arrives, creates "over-runs," in which incoming data must be discarded because the buffer is full.
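A minimal jitter buffer exhibiting both failure modes described above can be sketched as follows. The fixed capacity, the drop-on-full policy, and the `None` return on under-run are assumptions for illustration; a real implementation would typically play silence or conceal the gap on under-run.

```python
from collections import deque

class JitterBuffer:
    """Illustrative jitter buffer with over-run and under-run counters."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.overruns = 0
        self.underruns = 0

    def push(self, packet):
        """Accept an incoming packet, discarding it if the buffer is full."""
        if len(self.queue) >= self.capacity:
            self.overruns += 1      # over-run: buffer full, data discarded
        else:
            self.queue.append(packet)

    def pull(self):
        """Remove the next packet for playback, or report an under-run."""
        if not self.queue:
            self.underruns += 1     # under-run: playback would break here
            return None
        return self.queue.popleft()
```

The tension the text describes is visible in the choice of `capacity`: a large buffer avoids over-runs but adds end-to-end delay, while a small one keeps the conversation responsive but risks discarding data.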
One way to optimize (here, minimize) buffering is to ensure that the reassembling of the voice stream at the recipient machine 108 occurs at the sampling rate used to create the stream at the sender machine 106. This may be done by giving the sender and recipient machines access to a stratum traceable clock (STC) reference 118, so that the creation and re-creation of the data bytes that constitute the voice stream occur at the same rate. As shown in FIG. 1, the STC reference may be derived at the sender and at the recipient machines, which might be located in different states or countries, based upon a received radio frequency (RF) reference signal that is generated and broadcast over the airwaves. Note that this is just one of several ways to supply the STC reference. In some systems, the stratum clock is recovered from an uplink cable.
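A back-of-the-envelope calculation shows why matching the creation and re-creation rates matters. The 8 kHz sampling rate and the 50 ppm clock offset below are hypothetical figures chosen for illustration, not values from the description above.

```python
# Hypothetical figures: 8 kHz telephony sampling, 50 ppm clock mismatch
sample_rate_hz = 8000
offset_ppm = 50

# Samples of drift accumulated per second between sender and recipient
drift_samples_per_sec = sample_rate_hz * offset_ppm / 1_000_000  # 0.4

# Over a minute of conversation
drift_per_minute = drift_samples_per_sec * 60                    # 24 samples
```

Even this modest mismatch accumulates roughly 24 samples (3 ms of audio at 8 kHz) every minute, so without a common reference such as the STC 118, any fixed-size buffer eventually under-runs or over-runs no matter how it is tuned.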
Each machine may be designed to handle two or more conversations, also referred to as voice channels, simultaneously. The architecture in FIG. 1 shows that each machine is equipped with a time division multiplexed (TDM) bus 120 that can transport multiple voice streams (multiple channels) between the voice processing device 114 and one or more voice capture devices 112 that are connected to the bus 120. Bus timing, and therefore the rate at which portions of each voice stream are placed on the bus or retrieved therefrom, is based upon a TDM bus clock signal that is derived from the STC reference 118. Matching the TDM bus clocks in the sender and recipient machines 106, 108 helps minimize any rate mismatch between the supply and consumption of digitized voice data in the sender and recipient machines, respectively, so that the voice stream is transported over the TDM bus 120 in the recipient machine 108 at the same rate as the original voice stream was transported over the TDM bus in the sender machine 106. This helps avoid the poor voice quality, i.e. glitches and/or drops heard in the conversation, that a rate mismatch may cause.
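The time-division multiplexing performed over the TDM bus 120 can be sketched as follows. The slot layout (one sample per channel per frame) is an assumption for illustration; actual TDM framing on the bus may differ.

```python
def tdm_mux(channels):
    """Interleave equal-length per-channel streams into TDM frames.

    channels: list of per-channel sample lists.
    Returns a list of frames, each holding one time slot per channel.
    """
    return [list(slot_values) for slot_values in zip(*channels)]

def tdm_demux(frames, num_channels):
    """Recover the per-channel streams from a sequence of TDM frames."""
    channels = [[] for _ in range(num_channels)]
    for frame in frames:
        for ch, sample in enumerate(frame):
            channels[ch].append(sample)
    return channels
```

Because every channel contributes exactly one slot per frame, the frame rate directly fixes each channel's sample rate, which is why deriving the bus clock from the common STC reference 118 keeps all channels rate-matched at both ends.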
The architecture in FIG. 1 also illustrates a block diagram of the voice processing device 114, which may essentially be duplicated in the sender and recipient machines. The voice processing is performed by (1) host hardware/software 128, which may be based upon a processor-memory combination that runs a real time operating system (RTOS) and receives voice packets from and delivers them to the network 104, (2) a digital signal processor (DSP) system 132 for efficiently performing compression/decompression and channel processing, such as echo cancellation, on voice packets, and (3) a TDM controller chip 136 for conducting direct memory access (DMA) between host memory and the DSP system 132. Examples of such a controller are the QUICC multi-channel controller for the MC68360 controller by Motorola, Inc. and the 8474 MUSYCC chip by Brooktree Corp. The TDM controller chip 136 adjusts the timing for transporting voice data between the host memory and a buffered serial port of the DSP system 132, so that the delivery and pick-up of the voice stream at the TDM bus 120 occur at essentially the same rate as their counterparts in the sender machine 106. Note that the TDM bus clock, which is used to transfer voice data for a number of channels in a time-multiplexed manner, is also derived from the STC reference, for instance using a clock recovery/phase locked loop (PLL) 138.