The present invention relates to multimedia data processing. More particularly, it relates to the compression and network delivery of scalably formatted multimedia data, for example, still and video images, speech, and music. A major objective of the present invention is to enhance streaming multimedia applications over heterogeneous networks. In a streaming multimedia application, multimedia data is packetized, delivered over a network, and played as the packets are being received at the receiving end, as opposed to being played only after all packets have been downloaded.
As computers are becoming vehicles of human interaction, the demand is rising for the interaction to be more immediate and complete. The effort is now on to provide such data intensive services as multicast video, on-demand video, and video collaboration, e.g., video conferencing and interactive video. These services are provided across networks.
The computer networks of today and of the foreseeable future are heterogeneous. This means that the computers on the network possess varying computational power, e.g., a 40 MHz Intel 486 CPU or a 150 MHz Intel Pentium CPU, with on-chip media processing or none. It also means that the connections of the network can be of varying types, e.g., ATM, Ethernet, ISDN, POTS ("plain old telephone system"), or wireless, possessing varying bandwidth capacities.
Multimedia data consists of different kinds of data, including video images and audio signals. For each kind of multimedia data, a certain number of characteristics can be used to describe that data. For example, resolution (the amount of detail in the image) and quality (the fidelity of the image being displayed to the original image) can be used to describe still images; resolution, quality, and frame rate (the rate at which images change) can be used to describe video; and resolution (audio samples per second) and quality (the fidelity of the sample being played to the original sample) can be used to describe audio. These are not the only sets of characteristics which can be used to describe these different multimedia data types.
Multimedia is experienced by playing it. The enjoyability of multimedia playback, and therefore the usefulness, depends, in large part, upon the particular characteristics of the multimedia data. The more of a positive characteristic that the multimedia data possesses, the greater the enjoyment in playback of that data. With video, for example, playback is generally superior the higher the resolution, the quality, and the frame rate.
Multimedia data consumes space. The amount of space that data consumes depends upon the degree to which the multimedia possesses certain characteristics. With video, for example, the higher the resolution, the quality, and the frame rate, the more data is required to describe the video data. Thus, greater enjoyment of multimedia comes at the cost of greater data requirements.
Networks introduce a temporal element to data. Networks transmit data across the network connections over time. The amount of data that a network connection can transmit in a certain amount of time is the bandwidth of that connection.
The bandwidth required to transmit multimedia data over a network is a function of the characteristics of that data. With video, for example, the higher the resolution, the quality, and the frame rate, the higher the bandwidth required to transmit that video. Once the level of resolution, quality, and frame rate of video content is known, the bandwidth required to transmit that content can be calculated.
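By way of illustration only, the calculation described above can be sketched as follows. The sketch assumes that resolution is given as a width and height in pixels, and that quality is approximated by a single compression ratio; the function name, the 24 bits per pixel default, and the compression ratio parameter are hypothetical and are not taken from any particular compression system.

```python
def required_bandwidth_bps(width, height, frame_rate,
                           bits_per_pixel=24, compression_ratio=1.0):
    """Estimate the bandwidth, in bits per second, needed to stream video.

    Assumption: quality is modeled as one overall compression ratio
    (raw size divided by compressed size). Real codecs do not behave
    this simply; this is an approximation for illustration.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    # Bits needed per second for uncompressed video.
    raw_bits_per_second = width * height * bits_per_pixel * frame_rate
    # Compression reduces the required bandwidth proportionally.
    return raw_bits_per_second / compression_ratio


# Example: 320x240 video at 15 frames per second, compressed 50:1.
print(required_bandwidth_bps(320, 240, 15, compression_ratio=50))
```

Raising any of the resolution, the frame rate, or the quality (here, lowering the compression ratio) raises the estimated bandwidth, consistent with the relationship described above.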
Often, bandwidth is the initial constraining factor in transmitting multimedia data. That is, the available bandwidth of a particular network connection is known. With bandwidth known, the level of the characteristics of multimedia data can, in theory, be adjusted to ensure that the data can be transmitted over the network. With video, for example, if bandwidth is known, the frame rate, resolution, and quality of that video can each, in theory, be raised or lowered to ensure the video can be transmitted over the bandwidth.
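As a hedged sketch of the adjustment just described, the following example selects, from a set of candidate video levels, the most demanding level that still fits a known bandwidth. The `VideoLevel` type, the candidate values, and the rule that a higher bit rate is preferable are assumptions made for illustration only.

```python
from typing import NamedTuple, Optional, Sequence


class VideoLevel(NamedTuple):
    """A hypothetical combination of video characteristics."""
    width: int
    height: int
    frame_rate: float
    bits_per_second: float  # bandwidth this level requires


def best_level_for_bandwidth(levels: Sequence[VideoLevel],
                             available_bps: float) -> Optional[VideoLevel]:
    """Return the highest bit rate level that fits, or None if none fits.

    Assumption: among levels that fit, the one using the most
    bandwidth gives the best playback.
    """
    feasible = [lv for lv in levels if lv.bits_per_second <= available_bps]
    if not feasible:
        return None
    return max(feasible, key=lambda lv: lv.bits_per_second)


# Hypothetical candidate levels for one video presentation.
candidates = [
    VideoLevel(160, 120, 10, 56_000),
    VideoLevel(320, 240, 15, 300_000),
    VideoLevel(640, 480, 30, 1_500_000),
]
# Illustrative 128 kbit/s link.
print(best_level_for_bandwidth(candidates, 128_000))
```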
Networks transmit data across network connections to computers and other devices on the network. After multimedia data reaches a computer of the network, that computer can attempt to play back the data. In playing back that data, demands are placed upon the computational power of the computer. In general, the higher the level of characteristics of certain multimedia data, the more computational power is required to play back that data. With video, for example, the higher the resolution, the higher the quality, and the higher the frame rate, the greater the computational power required to play back the video.
Often, computational power is the initial constraining factor in playing back multimedia data. That is, the available computational power of a particular computer is known. With computational power known, the level of the characteristics of multimedia data can, in theory, be adjusted to ensure that the data can be played back by that computer. With video, for example, if available computational power is known, the frame rate, resolution, and quality of that video can each, in theory, be raised or lowered to ensure the video can be played back on that computer.
In a heterogeneous network, differing bandwidth and computational power constraints make it impossible for every network participant to experience the best possible multimedia data playback. In a "lowest common denominator" approach, multimedia data which can be processed by the network participant with the lowest bandwidth and computational power capabilities would be generated and delivered not only to that participant, but to all network participants. This is undesirable, however, because the network participants with greater bandwidth and computational power capabilities will receive sub-optimal data.
Alternative approaches, e.g., MPEG-1, generate separate data files, with different characteristic levels (e.g., resolution, frame rate, quality) targeted for different bandwidth/computational power capabilities. Each network participant receives near optimal multimedia data given that participant's bandwidth and computational power. This is undesirable, however, because multiple data files for each multimedia presentation consume a great deal of storage space. For systems which store many multimedia presentations, this approach quickly becomes infeasible.
The drawback of the multiple file approach is particularly apparent in the multicast case. With standard multicast, one data file or stream is transmitted by the server down a particular channel, and participants who subscribe to that channel receive that data. This minimizes the use of bandwidth because only one copy of the data traverses the network. If multiple redundant files or streams are used to transmit multimedia data down multiple multicast channels, bandwidth will be wasted, contrary to the very purpose of multicast.
Still other approaches provide limited scalability in a single file or stream approach. For example, in the case of video, quality scalability may be provided, but not frame rate or resolution scalability. These approaches are sub-optimal in that network participants do not have full flexibility in tailoring the video presentation to their needs and desires.
In addition, where a compression system provides for scalability, complexity is often introduced which compels modifications to existing compression techniques. In particular, introducing frame rate scalability compels modifications to inter-frame compression techniques. One such inter-frame compression technique is conditional replenishment ("CR"). Another is motion compensation ("MC").
CR is an inter-frame video compression technique well known in the art. CR, like all inter-frame compression techniques, is used to achieve higher compression ratios. CR operates upon blocks of adjacent frames. A block is a contiguous subset of a frame. CR determines whether a block in the current frame should be encoded. "Forward" CR makes this determination by comparing the current block against the similarly positioned block in a previous frame. On the "condition" that the difference between the two blocks is less than some predetermined threshold value, the current block is not encoded. Instead it is "replenished" from the previous block. "Reverse" CR compares the current block against the corresponding block in a subsequent frame.
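For illustration only, forward CR as described above can be sketched as follows. The sketch assumes that frames are two-dimensional lists of pixel intensities, that blocks are square, and that the difference between two blocks is measured as the mean absolute pixel difference; the function names and the choice of difference measure are hypothetical.

```python
def block_difference(cur, prev, top, left, size):
    """Mean absolute difference between co-located blocks of two frames."""
    total = 0
    count = 0
    for r in range(top, min(top + size, len(cur))):
        for c in range(left, min(left + size, len(cur[0]))):
            total += abs(cur[r][c] - prev[r][c])
            count += 1
    return total / count


def forward_cr(cur, prev, size, threshold):
    """Return the (top, left) corners of blocks that must be encoded.

    A block whose difference from the similarly positioned block in
    the previous frame is below the threshold is skipped; a decoder
    would replenish it from the previous frame.
    """
    to_encode = []
    for top in range(0, len(cur), size):
        for left in range(0, len(cur[0]), size):
            if block_difference(cur, prev, top, left, size) >= threshold:
                to_encode.append((top, left))
    return to_encode


# A 4x4 previous frame of zeros; only the top-right 2x2 block changes.
prev_frame = [[0] * 4 for _ in range(4)]
cur_frame = [row[:] for row in prev_frame]
for r in range(0, 2):
    for c in range(2, 4):
        cur_frame[r][c] = 10

print(forward_cr(cur_frame, prev_frame, size=2, threshold=5))
```

Reverse CR would apply the same comparison against the corresponding block in a subsequent frame rather than a previous one.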
MC is another inter-frame compression technique well known in the art. MC can be considered a more general case of CR. Whereas forward CR compares the current block against only the corresponding block in a previous frame, forward MC compares the current block against multiple comparably sized blocks in that previous frame. If a matching block is found in the previous frame, MC generates a vector indicating the direction and distance describing the motion of the previous block, and error data describing changes in the previous block. MC can also operate at the frame level, as opposed to block level. As with reverse CR, reverse MC involves analyzing the current frame against a subsequent frame.
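As a minimal sketch, block-level forward MC can be illustrated with an exhaustive search over a small window of the previous frame. The sketch assumes frames are two-dimensional lists of pixel intensities, that the best match is the candidate with the lowest sum of absolute differences, and that the current block lies wholly within the frame; the function name, the search window, and the matching criterion are hypothetical.

```python
def forward_mc(cur, prev, top, left, size, search):
    """Find the best matching block in the previous frame.

    Returns (motion, residual), where motion is the (rows, cols)
    displacement of the matched previous block to the current block
    position, and residual is the per-pixel error data
    (current minus matched previous block).
    """
    height, width = len(prev), len(prev[0])
    best = None  # (cost, candidate_top, candidate_left)
    for r0 in range(top - search, top + search + 1):
        for c0 in range(left - search, left + search + 1):
            # Skip candidate blocks that fall outside the frame.
            if r0 < 0 or c0 < 0 or r0 + size > height or c0 + size > width:
                continue
            cost = sum(
                abs(cur[top + r][left + c] - prev[r0 + r][c0 + c])
                for r in range(size) for c in range(size)
            )
            if best is None or cost < best[0]:
                best = (cost, r0, c0)
    _, r0, c0 = best
    motion = (top - r0, left - c0)
    residual = [
        [cur[top + r][left + c] - prev[r0 + r][c0 + c] for c in range(size)]
        for r in range(size)
    ]
    return motion, residual


# A 2x2 patch moves from (1, 1) in the previous frame to (2, 3).
patch = [[9, 8], [7, 6]]
prev_frame = [[0] * 6 for _ in range(6)]
cur_frame = [[0] * 6 for _ in range(6)]
for r in range(2):
    for c in range(2):
        prev_frame[1 + r][1 + c] = patch[r][c]
        cur_frame[2 + r][3 + c] = patch[r][c]

print(forward_mc(cur_frame, prev_frame, top=2, left=3, size=2, search=2))
```

With a search window of zero, only the similarly positioned block is examined, which corresponds to the comparison made by forward CR.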