Not applicable.
This invention relates to transmission of the four CAS signaling bits (ABCD) of the extended frame of a time division multiplex signaling format over packet networks. In particular, the present invention relates to effective embedding of the ABCD bits in the RTP header of a VoIP packet. The present invention can also be applied to transport AB bits.
Voice over packet networks or VoIP requires that the voice or audio signal be packetized and then transmitted. The transmission path will typically take the packets through both packet switched and circuit switched networks between each termination of the transmission. The analog voice signal is first converted to a digital signal and compressed. The conversion and compression can be accomplished at a single gateway connected between a terminal equipment and the packet network or can be performed separately. A pulse code modulated (PCM) digital stream from the analog voice can be produced at a gateway or elsewhere.
The PCM stream is analyzed in the gateway and processed according to the parameters of the gateway, such as echo suppression, silence detection and DTMF tone detection. Detected tones are passed separately without encoding. The voice PCM samples are passed to a codec for processing prior to packet assembly.
The codec creates voice frames from the PCM stream according to the parameters of the codec used. The creation of frames from the PCM stream typically includes compression. The frames are of a set time duration and contain a set number of bits of the PCM stream.
The frames are then assembled into packets by a packet assembler, which combines a set number of sequential frames into a single packet. A Real-time Transport Protocol (RTP) header is attached to each packet to provide a sequence number for identification of the packet and a time stamp for the packet. The gateway then determines the IP address corresponding to the designated destination of the voice signal to which the packet belongs. A UDP header containing source and destination "sockets" is added to the packet. An IP header is also added to the packet to designate the origination and destination IP addresses for the packet.
The packet is routed through the packet network based upon the IP address information. The packet may pass through several switches and routers and may pass through packet switches. The packet may travel through more than one PSTN and may experience robbed bit signaling. The packet will also accumulate delay as it passes between the near and far end terminal equipment, through the near and far end gateways, and through the packet and PSTN networks and switches. The packet can travel along any of a large variety of routes from source to destination.
Because this accumulated delay is erratic and unpredictable and further because each packet may take a different path through the networks, delay can cause the packets to arrive out of sequence and/or with gaps or overlaps. Gapping and overlapping of packets is referred to as jitter. Conditions in the packet network can also result in packet loss.
Voice packets are generated at a constant rate at the gateway from a continuous audio signal such as speech, and represent continuous and ordered speech. The packets should be played out at the receiving end in the same order and at the same rate to accurately reproduce the original analog speech. Because of some inherent loss and delay in a packet network, the packets are reassembled and played out as close to the original order and sequence as possible to achieve acceptable reproduction.
The receiving gateway will first remove the IP and UDP headers from the packets. Next, the RTP information is read and the voice frames are extracted from the packet. The RTP information is used to ensure that the frames are in the proper order. If a packet is missing or out of order, the gateway must compensate for the missing frames in that packet in order to avoid undesirable distortion of the voice signal after frame reassembly. If one or more frames in a sequence are missing, the previous frame is repeated at a decreased volume to fill in the gap(s) left by the missing frame(s). If the packet carrying a missing frame subsequently arrives too late for inclusion in the reassembled sequence of frames, the packet is discarded.
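By way of illustration only, the gap-filling rule described above can be sketched as follows. This is a minimal sketch and not a definitive implementation: the function name conceal, the representation of a frame as a list of PCM sample integers, and the attenuation factor of 0.5 are all assumptions introduced here, since the text states only that the previous frame is repeated "at a decreased volume". Sequence number wraparound is ignored for brevity.

```python
from typing import Dict, List

ATTENUATION = 0.5  # assumed value; the text only says "decreased volume"


def conceal(frames: Dict[int, List[int]], first_seq: int, count: int,
            frame_len: int) -> List[List[int]]:
    """Return `count` frames starting at `first_seq`.

    A missing frame is replaced by the previously played frame at reduced
    amplitude. Consecutive losses therefore fade further with each repeat.
    """
    out: List[List[int]] = []
    previous = [0] * frame_len  # silence if the very first frame is missing
    for seq in range(first_seq, first_seq + count):
        frame = frames.get(seq)
        if frame is None:
            # Fill the gap with an attenuated copy of the previous frame.
            frame = [int(s * ATTENUATION) for s in previous]
        out.append(frame)
        previous = frame
    return out


received = {0: [100, -100], 2: [40, 40]}  # frame 1 was lost
played = conceal(received, first_seq=0, count=3, frame_len=2)
```

In this example the lost frame 1 is played as a half-amplitude copy of frame 0, after which normal playout resumes with frame 2.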
To compensate for jitter and packet loss, the receiving gateway uses the sequence number and time stamp of the RTP header to smooth the playout by removing gaps and overlaps in the frame sequence.
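The reordering behavior described above can be illustrated with a small playout buffer keyed on the RTP sequence number. This is a sketch under stated assumptions: the class name JitterBuffer and its push/pop interface are hypothetical, one call to pop is assumed per playout interval, and time stamp handling and sequence number wraparound are omitted.

```python
import heapq
from typing import Any, List, Optional, Tuple


class JitterBuffer:
    """Hypothetical playout buffer that releases frames in sequence order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, Any]] = []
        self._next_seq: Optional[int] = None  # next sequence number to play

    def push(self, seq: int, frame: Any) -> bool:
        """Queue a frame; return False if it arrived too late and was discarded."""
        if self._next_seq is not None and seq < self._next_seq:
            return False
        heapq.heappush(self._heap, (seq, frame))
        return True

    def pop(self) -> Optional[Any]:
        """Return the frame due for playout, or None if it is missing."""
        if self._next_seq is None:
            if not self._heap:
                return None
            self._next_seq = self._heap[0][0]
        # Drop duplicates of frames that were already played.
        while self._heap and self._heap[0][0] < self._next_seq:
            heapq.heappop(self._heap)
        frame = None
        if self._heap and self._heap[0][0] == self._next_seq:
            frame = heapq.heappop(self._heap)[1]
        self._next_seq += 1
        return frame


jb = JitterBuffer()
for seq, frame in [(3, "c"), (1, "a"), (2, "b")]:  # arrival order is scrambled
    jb.push(seq, frame)
in_order = [jb.pop(), jb.pop(), jb.pop()]
```

A None returned from pop marks a gap that the receiver would then fill, for example by repeating the previous frame at a decreased volume as described earlier.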
The reassembled sequence of frames is processed in a codec to return the PCM stream for playout.
The present invention teaches a method for transporting signaling bits by including the bits in the header of a voice stream that is being encapsulated by the Real-time Transport Protocol (RTP). This method has several advantages over alternative techniques, especially when voice is being transmitted over a cable data network. The present invention is also advantageous in a network where the bandwidth on the packet network is limited and is pre-allocated.
The present invention has the advantage that the packet requirement is predictable, allowing for network management and pre-allocation of packets. With the present invention, ABCD bits are generated only as often as a packet is sent. The present invention eliminates the generation of extra packets solely for the transmission of ABCD signaling and eliminates the displacement of voice data by packets dedicated to ABCD bits. Because the bits are transmitted with each voice packet, they are transmitted at a regular rate. If ABCD bits require separate packet generation, repeated activity that generates ABCD signaling, such as on-hook/off-hook activity, will generate extra bits, changing bandwidth requirements. With the present invention, such repeated activity does not change bandwidth requirements.
RTP is the protocol of choice for encapsulating packet voice. With RTP, digital voice samples are collected, possibly compressed (if a compression codec is used), and packetized. Typically 10 milliseconds or more of voice is collected into one packet. An RTP header is added to the voice payload. This header is typically 12 bytes as illustrated in FIG. 2, and includes the following fields: protocol version, source identifier, timestamp of the voice samples in the packet, the type of voice payload in the packet (i.e., the codec being used), the packet sequence number, and CC, the contributing sources field (used in the case of a voice conference).
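For illustration, the 12-byte header described above can be built and parsed as sketched below. The bit layout used is the standard RTP fixed header (2-bit version, padding and extension bits, 4-bit CC, marker bit, 7-bit payload type, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC). The function names build_rtp_header and parse_rtp_header are hypothetical and are introduced only for this example.

```python
import struct

RTP_FORMAT = "!BBHII"  # network byte order: 1 + 1 + 2 + 4 + 4 = 12 bytes


def build_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                     cc: int = 0, version: int = 2, marker: int = 0) -> bytes:
    """Pack the fixed RTP header; padding and extension bits are left at 0."""
    byte0 = ((version & 0x3) << 6) | (cc & 0x0F)
    byte1 = ((marker & 0x1) << 7) | (payload_type & 0x7F)
    return struct.pack(RTP_FORMAT, byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(header: bytes) -> dict:
    """Unpack the first 12 bytes of a packet into named header fields."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack(RTP_FORMAT, header[:12])
    return {
        "version": byte0 >> 6,
        "cc": byte0 & 0x0F,
        "marker": byte1 >> 7,
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Payload type 0 is the static assignment for G.711 mu-law (PCMU).
header = build_rtp_header(payload_type=0, seq=1, timestamp=80, ssrc=0x12345678)
fields = parse_rtp_header(header)
```

In a two-party call the CC field parsed here is 0, which is what makes its four bits available for other use.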
The present invention recognizes several alternative schemes to transport ABCD signaling bits in parallel with the RTP voice stream:
The first approach is to use a separate RTP payload type, whereby the ABCD bits are placed into a packet and an RTP header is attached with a payload type indicating that the packet contains ABCD signaling bits and not voice. This packet can then be inserted into the RTP voice stream and transported to the other side. These ABCD relay packets need only be transmitted when the ABCD state changes.
A second approach is to embed the ABCD bits in the user-defined areas of the RTCP packets. This approach is similar to the first approach, except that it would use the RTCP channel to send and receive the ABCD bits whenever an ABCD transition occurs.
A third approach is to use the signaling protocol to transport these bits.
Each of these approaches is feasible, but each has some disadvantages in many transmission environments, such as direct PSTN connectivity in a cable data network environment.
The present invention teaches a preferred embodiment for transporting ABCD bits using RTP in a packet network. The preferred exemplary method taught herein is to borrow bits from unused or inapplicable fields or to borrow bits by restricting portions of header fields in the RTP header of each voice packet. In transparent or non-switched mode, RTP voice streams will only have two participants, i.e., the called and calling parties. No conferencing will be present, so the CC field in the RTP header (see FIG. 2) will be 0, i.e., there will never be any additional contributing sources.
The CC field is 4 bits long and thus can be used to hold a snapshot of the current ABCD state of the endpoint. With this approach, ABCD bits will be transmitted at the voice packet rate, e.g., 10 ms voice will generate ABCD samples at a 10 ms rate as well. In a T1 line, ABCD bits for a voice channel are generated every 3 ms. If this resolution is required, additional bits can be squeezed out of the RTP header by restricting the SSRC field (normally 32 bits) to fewer bits. For 3 ms resolution, the SSRC field would be restricted to 32 - 8 = 24 bits. The borrowed 8 bits of the SSRC field are then used to hold two additional ABCD samples, giving a total of 3 ABCD samples in one 10 ms voice packet, thus approximating the T1 resolution. This approach again takes advantage of the point-to-point calling environment, since the 32-bit space of the SSRC field is intended to support large conference applications. This approach can be extended to E1 applications, where ABCD bits are sampled every 2 ms. In this case the SSRC field would be restricted to 16 bits, and each 10 ms voice packet would hold 5 ABCD samples.
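The embedding described above can be sketched as follows. This is an illustrative sketch only, and several details are assumptions not fixed by the text: the first ABCD sample is placed in the CC field, the remaining samples occupy the most significant bits of the SSRC field in order, the restricted SSRC keeps its low-order bits, and the receiver already knows how many samples each packet carries. The function names embed_abcd and extract_abcd are hypothetical.

```python
import struct
from typing import List

RTP_FORMAT = "!BBHII"  # standard 12-byte RTP fixed header


def embed_abcd(header: bytes, samples: List[int]) -> bytes:
    """Write 1 to 5 four-bit ABCD samples into the CC field and top SSRC bits."""
    n = len(samples)
    if not 1 <= n <= 5:  # 5 samples matches the E1 case (SSRC cut to 16 bits)
        raise ValueError("expected between 1 and 5 ABCD samples")
    if any(not 0 <= s <= 0xF for s in samples):
        raise ValueError("each ABCD sample must fit in 4 bits")
    byte0, byte1, seq, ts, ssrc = struct.unpack(RTP_FORMAT, header[:12])
    byte0 = (byte0 & 0xF0) | samples[0]  # CC field carries the first sample
    kept = 32 - 4 * (n - 1)              # SSRC bits left after borrowing
    ssrc &= (1 << kept) - 1              # restrict the SSRC to its low bits
    extra = 0
    for s in samples[1:]:
        extra = (extra << 4) | s
    ssrc |= extra << kept                # borrowed bits hold the other samples
    return struct.pack(RTP_FORMAT, byte0, byte1, seq, ts, ssrc) + header[12:]


def extract_abcd(header: bytes, n: int) -> List[int]:
    """Recover n ABCD samples written by embed_abcd."""
    byte0, _, _, _, ssrc = struct.unpack(RTP_FORMAT, header[:12])
    samples = [byte0 & 0x0F]
    extra = ssrc >> (32 - 4 * (n - 1))
    for i in range(n - 2, -1, -1):
        samples.append((extra >> (4 * i)) & 0xF)
    return samples


plain = struct.pack(RTP_FORMAT, 0x80, 0, 1, 80, 0x00ABCDEF)
t1_header = embed_abcd(plain, [0b1010, 0b0101, 0b1111])  # 3 samples, T1 case
recovered = extract_abcd(t1_header, 3)
```

With three samples the SSRC is restricted to 24 bits, matching the T1 example above, and the header length is unchanged at 12 bytes.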
As a practical matter, ITU-T Recommendation I.366.2, Annex L, recommends 5 ms ABCD samples (during a period of ABCD transition). With 10 ms voice packets, this sample rate requires two ABCD samples per packet and can be handled by restricting the SSRC field to 28 bits.
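The bit budget behind the 24-bit, 16-bit, and 28-bit figures given above can be checked with a short calculation. The sketch below assumes that the first ABCD sample always occupies the 4-bit CC field and that the number of samples per packet is the packet duration divided by the sampling interval, rounded down (which is why 3 ms sampling yields 3 samples per 10 ms packet). The function name ssrc_bits_remaining is hypothetical.

```python
SSRC_BITS = 32
BITS_PER_SAMPLE = 4  # one ABCD sample


def ssrc_bits_remaining(packet_ms: int, sample_ms: int) -> int:
    """SSRC bits left once the extra ABCD samples have been borrowed."""
    samples = max(1, packet_ms // sample_ms)   # whole samples per packet
    borrowed = BITS_PER_SAMPLE * (samples - 1) # first sample lives in CC
    return SSRC_BITS - borrowed


t1_bits = ssrc_bits_remaining(10, 3)   # T1: 3 samples per packet
e1_bits = ssrc_bits_remaining(10, 2)   # E1: 5 samples per packet
itu_bits = ssrc_bits_remaining(10, 5)  # 5 ms sampling: 2 samples per packet
```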
This approach has several advantages over the others in voice over cable networks. The major advantage is that the ABCD bits, because they are transmitted as part of the voice stream, can make use of the same quality of service mechanism as the voice. In addition, there is a bandwidth savings because the ABCD bits are included within packets already allocated. This is of particular importance in a voice over cable application, where the cable data network is using the DOCSIS unsolicited grant service for upstream data transmission. With this service, upstream bandwidth in a DOCSIS network is committed at call setup time. For example, a call that is set up to use 10 ms G.711 voice will request sufficient upstream transmission slots to transmit 92 bytes (80 bytes of voice samples plus 12 bytes of RTP header) every 10 ms (assuming DOCSIS payload header compression is enabled). These slots will then be granted to the endpoint every 10 ms with minimal jitter to ensure voice quality. The issue with the ABCD RTP relay approach, the first approach discussed above, is that the inserted ABCD packets will occur at a variable rate, thus forcing some kind of scheme to acquire the additional upstream bandwidth. This will be difficult because these ABCD packets will only be generated when ABCD transitions are observed, making any polled or unsolicited upstream service impractical. Thus, as with the RTCP approach, the second approach described above, a best effort DOCSIS upstream service would be required.
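The 92-byte figure in the example above follows from simple arithmetic, sketched below. The sketch assumes G.711 at 8000 samples per second and one byte per sample, and counts only the voice payload and the 12-byte RTP header, as the text does. The function name grant_size is hypothetical, and the resulting bit rate is an illustrative derived value that does not account for any DOCSIS framing overhead.

```python
def grant_size(voice_ms: int, sample_rate_hz: int = 8000,
               bytes_per_sample: int = 1, rtp_header_bytes: int = 12):
    """Return (bytes per grant, bits per second) for a fixed-rate voice call."""
    payload = sample_rate_hz * voice_ms // 1000 * bytes_per_sample
    per_grant = payload + rtp_header_bytes
    bits_per_second = per_grant * 8 * 1000 / voice_ms
    return per_grant, bits_per_second


per_grant, bps = grant_size(10)  # 10 ms G.711 voice
```

Because the ABCD bits ride in header bits that are already counted in this grant, embedding them leaves both values unchanged.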
This approach also has the advantage that no additional resources are required to transmit the ABCD bits as they replace unused bits in the RTP header.
The implementation of the approach taught in the present invention must accommodate conditions that would limit the transmission of ABCD signaling in the manner taught herein. In voice over IP, silence (a period without voice) is often suppressed, i.e., no packets are transmitted, in order to save bandwidth and processing. Suppression of silence is referred to as voice activity detection (VAD). When silence is detected for a predesignated threshold period of time, no packets are transmitted until voice is again detected. If silence suppression is enabled, the approach of the present invention, embedding ABCD signaling in the RTP header, has the disadvantage that no ABCD bits are generated during silence periods. The present invention teaches that if an ABCD transition is detected during a silence period, a silence packet is generated and the ABCD bits are inserted into its RTP header. The silence packet is then transmitted in lieu of a voice packet. The method of the present invention also teaches that if the ABCD bits have not changed, no silence packet is sent.
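The transmit decision described above can be sketched as a small state machine evaluated once per packet interval. This is a minimal sketch under stated assumptions: the class name AbcdSender and the string results are hypothetical, and the choice to send a silence packet when no ABCD state has yet been transmitted is an assumption made here so that the far end always learns the initial state.

```python
from typing import Optional


class AbcdSender:
    """Hypothetical per-call transmit decision with silence suppression."""

    def __init__(self) -> None:
        self.last_sent: Optional[int] = None  # last ABCD value put on the wire

    def next_packet(self, voice_active: bool, abcd: int) -> Optional[str]:
        """Return "voice", "silence", or None when nothing is transmitted."""
        if voice_active:
            # Every voice packet carries the current ABCD snapshot.
            self.last_sent = abcd
            return "voice"
        if abcd != self.last_sent:
            # ABCD transition during silence: send a silence packet
            # in lieu of a voice packet so the change is not lost.
            self.last_sent = abcd
            return "silence"
        return None  # silence and no ABCD change: stay quiet


sender = AbcdSender()
decisions = [
    sender.next_packet(True, 0b1111),   # talking
    sender.next_packet(False, 0b1111),  # silent, ABCD unchanged
    sender.next_packet(False, 0b0000),  # silent, ABCD transition
    sender.next_packet(False, 0b0000),  # silent, ABCD unchanged again
]
```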
If large voice packet sizes are used, e.g., greater than 10 ms of voice, then ABCD samples will be forwarded less frequently. For example, with 40 ms voice, and assuming one ABCD sample per packet, only 25 ABCD samples will be transmitted per second. In certain applications, this might not be acceptable. Borrowing SSRC bits will offset this problem up to a point, but in general this approach should be restricted to cases where the packet size is less than 60 ms. The borrowed SSRC bits can also be used for redundancy in the event of lost packets.