The present invention is related to network communications. More particularly, the present invention is related to methods and program products for achieving reduced effects due to packet losses.
Network communications are often performed using a packet switched protocol, in which data is packaged into discrete packets that are then transmitted individually across the network. Each packet is individually addressed to one or more recipient addresses, and may also include routing instructions for traversing the network. A widely used example of a packet based network communications protocol is the Internet Protocol (“IP”), which may be used for real-time interactive communications, for instance, over the Internet. Because of the widespread and relatively low cost access to the Internet, IP has become a popular protocol for use in a variety of applications.
By way of example, real-time communication of voice, video, and music data over networks is often accomplished in a packet-switched protocol such as IP. Generally, these applications involve coding a temporal stream of real-time input data such as spoken words, music, or a video clip into digital format, packaging into discrete packets, transmission over the network, reception at a receiver site, unpacking, and decoding back to its original voice, video, or music format. Many particular algorithms for accomplishing these tasks are known.
One type of algorithm for coding and decoding speech data is known generally as linear predictive coding. Linear predictive coding analyzes the input speech data to build a model of the speech. It is advantageous in that it combines relatively high quality performance with low bit rate requirements. Since speech is simply the acoustic wave that is radiated from the lips when air is expelled from the lungs, adjusted at the glottis, and passed through the vocal tract, a linear predictor speech coding method decouples vocal-cord vibration or turbulence in the glottis from the frequency shaping effect of the vocal tract. The vocal-cord vibration or turbulence in the glottis is usually referred to as excitation, or residue, whereas the vocal tract is modeled as a linear predictor. Because speech signals vary with time, modeling is done on short chunks of the signal, called frames.
A numerical model is built to model the vocal tract and the residue within each frame. The model itself in linear prediction coding uses a linear prediction analysis to predict each sample as a linear combination of previous samples. Line spectral pairs (“LSPs”), often coded as a vector, are an equivalent representation of the coefficients of the model's linear equation. Speech signals are also characterized by additional parameters that may generally be referred to as excitations. Excitations may be coded in any of a number of formats, with an example being stochastic and adaptive codewords used in Federal Standard 1016 CELP (Code Excited Linear Predictor).
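By way of illustration only, the prediction of each sample as a linear combination of previous samples may be sketched as follows. The function name `lp_residual`, the treatment of samples before the start of the frame as zero, and the coefficient values in the usage note are assumptions made for this sketch and are not taken from any particular coder.

```python
def lp_residual(samples, coeffs):
    """Return the prediction error (residue) for each sample in a frame.

    The prediction for samples[n] is sum(coeffs[k] * samples[n - 1 - k]).
    Samples before the start of the frame are treated as zero (assumption).
    """
    residual = []
    for n, s in enumerate(samples):
        pred = 0.0
        for k, a in enumerate(coeffs):
            idx = n - 1 - k
            if idx >= 0:
                pred += a * samples[idx]
        # The residue is whatever the linear predictor fails to explain.
        residual.append(s - pred)
    return residual
```

For example, with the single hypothetical coefficient `[2.0]`, the frame `[1, 2, 4, 8]` is predicted exactly after its first sample, so only the first residue value is non-zero.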
Regardless of the particular algorithm used in packet-based communication to code speech, music, or video data, the decoded data is subject to errors and low quality due to packet loss or delay. Packet loss or delay occurs when packets of data are lost between the transmission point and the reception point, or are delayed such that they arrive beyond the time they are supposed to be used. As used herein, it will be understood that the term “packet loss” refers to both packet loss and delay. Packet loss may be a particularly acute problem on networks with no quality of service guarantee such as the Internet. Packets may be lost in a “random” pattern where individual packets tend to turn up missing, or in a “bursty” pattern where sequences of packets are missing.
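The sense in which “packet loss” covers both lost and late packets may be sketched as follows. The function name `effective_losses` and the per-packet playout deadlines are hypothetical and are introduced only for illustration.

```python
def effective_losses(sent_seqs, arrivals, deadlines):
    """Return the sequence numbers that count as lost at the receiver.

    arrivals:  dict mapping sequence number -> arrival time; a missing key
               means the packet never arrived.
    deadlines: dict mapping sequence number -> latest usable arrival time
               (hypothetical playout deadline).
    """
    lost = []
    for seq in sent_seqs:
        t = arrivals.get(seq)
        # A packet that arrives after its deadline is as useless as one
        # that never arrives, so both cases are reported as lost.
        if t is None or t > deadlines[seq]:
            lost.append(seq)
    return lost
```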
In the case of voice, video, and music transmission, packet loss problems are particularly disadvantageous. They can result in what is perceived as a “choppy” or “jumpy” decoded stream to the end user recipient, with pauses, frozen frames, missing frames, and the like encountered. In addition, many coding algorithms utilize temporal differences to achieve coding efficiency, thereby introducing sequential packet-to-packet dependencies into the packets transmitted. Packet losses under such circumstances can be particularly troublesome.
Methods for preventing packet loss and otherwise alleviating packet loss effects have been proposed. For example, highly reliable networks can be developed with relatively low rates of packet loss. Such networks, however, tend to require expensive resources to build and maintain to the extent that they are often not practical, especially in a wireless environment. Other proposed solutions involve introducing redundancies to transmission, wherein data is transmitted multiple times so that if one packet is lost the data may be recovered from a second. Again, however, these solutions may require substantially increased resources in the form of costly additional bandwidth. Priority based methods assign different priorities to different packets, with the highest priority packets given more resources in the network so that they have a higher chance of reaching the destination. Even these high-priority packets are subject to loss and delay, however. Further, these methods require an underlying network protocol having priority level support.
Other proposed methods include those that require some real time cooperation between sender and receiver. If a packet is lost, the receiver notifies the sender so that the packet may be resent. Such schemes may not be useful, however, with real time interactive communications because of the time limitations on data delivery. Also, reliable communications between the receiver and sender are often not available, making such feedback based methods difficult.
Still additional methods for alleviating packet loss/delay problems focus on “concealing” loss by re-constructing missing packets at the receiver. Some of these algorithms use information based on preceding and/or succeeding packets to estimate the contents of the missing packet. Other algorithms fill missing packets with “background” data, with an example being inserting white noise or static, or replaying previously received packets in the case of voice data. Clearly, such practices can be disadvantageous for speech, music, or video as the data in an individual packet does not always flow “smoothly” from the preceding to the succeeding, and is therefore often difficult to accurately reconstruct.
One type of packet loss concealment method is known as multi-description coding (“MDC”). MDC divides data into equally important streams so that any one stream may be decoded for acceptable replay of the input data, and so that decoding more streams yields higher quality than decoding one stream. A straightforward way to implement MDC is sample-based MDC, which relies on strong inter-sample correlations. Interleaving is used to distribute adjacent samples (e.g., sequential odd and even) of voice, video, or music data to different packets, as opposed to a sequence of adjacent samples being packaged in the same packet. MDC accordingly results in much smaller “gaps” being left to fill when a packet is lost, so that interpolation from the samples in the packets received is much more effective. Examples of MDC can be found in U.S. Pat. No. 6,215,787 to Kovacevic et al. and U.S. Pat. No. 6,163,868 to Kondo et al., both of which are hereby incorporated by reference.
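A minimal sketch of sample-based MDC with two descriptions follows, assuming even-indexed and odd-indexed samples are sent in separate packets and that a lost description is concealed by averaging the received neighbouring samples. The function names and the choice of simple averaging are assumptions for illustration and are not drawn from the patents cited above.

```python
def split_descriptions(samples):
    """Interleave: even-indexed samples go to one description, odd to the other."""
    return samples[0::2], samples[1::2]


def merge_descriptions(even, odd, total):
    """Rebuild the sample stream; either description may be None (lost).

    Missing samples are estimated by averaging the received neighbours
    (simple interpolation, an assumption made for this sketch).
    """
    out = [None] * total
    if even is not None:
        out[0::2] = even
    if odd is not None:
        out[1::2] = odd
    for i in range(total):
        if out[i] is None:
            neighbours = [out[j] for j in (i - 1, i + 1)
                          if 0 <= j < total and out[j] is not None]
            # With both descriptions lost there is nothing to interpolate
            # from, so the sketch falls back to silence (0.0).
            out[i] = sum(neighbours) / len(neighbours) if neighbours else 0.0
    return out
```

Because the missing samples all share one parity, each gap is a single sample wide and is bounded by received samples on one or both sides, which is what makes the interpolation effective.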
The use of MDC for linear predictive (“LP”) coding of data, however, has to date been limited and not well refined. Many problems remain unresolved. Experiments performed to investigate the use of traditional sample-based MDC with LP coders for voice, for instance, yielded poor results. In order to provide multiple descriptions using many MDC methods of the prior art, increased bandwidth was disadvantageously required. As a proposed solution, the sampling rate was decreased so that the bit rate remained the same as that of the pre-MDC coding scheme. These proposed solutions, however, met with limited success in that resulting speech quality was degraded due to the reduced sampling rate.
Unresolved needs in the art therefore exist.
The present invention is directed to a method and a program product for organizing data into packets. An embodiment of the present invention comprises the steps of dividing the data into a plurality of frames, with each frame being described by at least a first and a second parameter. The second parameter has a high correlation from frame to frame. The first parameter for each of the frames is placed in a first and a second data packet, while the second parameter is interleaved into the first and second data packets.
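One possible reading of the packet organization just summarized may be sketched as follows. The dictionary layout, the function name `pack_frames`, and the even/odd assignment of second parameters to the two packets are assumptions made for illustration; the summary does not prescribe a particular data format.

```python
def pack_frames(frames):
    """Organize frames into two packets.

    frames: list of (first_param, second_param) tuples, one per frame.
    Each packet carries the first parameter of every frame, while the
    second parameters are interleaved: even-numbered frames in the first
    packet, odd-numbered frames in the second (assumed assignment).
    """
    first = [f[0] for f in frames]
    second = [f[1] for f in frames]
    packet_a = {"first": list(first), "second": second[0::2]}
    packet_b = {"first": list(first), "second": second[1::2]}
    return packet_a, packet_b
```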
An additional embodiment of the invention comprises additional steps of communicating the data packets over a packet switched network, receiving the communicated packets, and extracting the first parameter from only one of the first or second packets, and extracting the interleaved second parameters from both of the first and second packets. If one of the first or second packets is discovered to be lost, the first parameters may be replaced using the first parameters from the other of the first or second packets in the sequence. The second parameter may be reconstructed using at least the second parameter from the other packet in the sequence, as the second parameter has a high correlation.
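The receiving steps may be sketched in the same hedged spirit. The sketch assumes each packet is a dictionary with a `"first"` list holding the first parameter of every frame and a `"second"` list holding the second parameters of alternating frames, that the second parameter is a scalar, and that a lost second parameter is estimated by averaging its received neighbours. All of these, and the name `unpack_frames`, are illustrative assumptions.

```python
def unpack_frames(packet_a, packet_b, n_frames):
    """Rebuild (first, second) per frame from two packets; one may be None.

    Assumes n_frames >= 2 so that a lost second parameter always has at
    least one received neighbour to be estimated from.
    """
    survivor = packet_a if packet_a is not None else packet_b
    if survivor is None:
        raise ValueError("both packets lost; nothing to reconstruct from")
    # First parameters are read from one packet only.
    first = survivor["first"]
    # Second parameters are de-interleaved from whichever packets arrived.
    second = [None] * n_frames
    if packet_a is not None:
        second[0::2] = packet_a["second"]
    if packet_b is not None:
        second[1::2] = packet_b["second"]
    for i in range(n_frames):
        if second[i] is None:
            near = [second[j] for j in (i - 1, i + 1)
                    if 0 <= j < n_frames and second[j] is not None]
            # High frame-to-frame correlation is what justifies estimating
            # a missing value from its neighbours.
            second[i] = sum(near) / len(near)
    return list(zip(first, second))
```

When both packets arrive, the frames are recovered exactly; when one is lost, the first parameters survive intact and only the interleaved second parameters are approximated.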
Through embodiments of the invention, then, data such as voice, video, music or the like are communicated in a manner that substantially increases the quality of the data under conditions of lossy networks. Otherwise unresolved problems in the art are thereby solved.