1. Field of the Invention
The present invention relates to the encoding of multimedia data and, in particular, to the formation of MPEG-4 multimedia data packets.
2. Discussion of Related Art
There is great interest in developing techniques and standards for efficient transmission of multimedia data. One popular standard is MPEG-4. MPEG-4 is an ISO/IEC standard developed by the Moving Picture Experts Group (MPEG) and is used by both fixed and mobile users. MPEG-4 is designed to facilitate fast and efficient transmission of multimedia data. The MPEG-4 standard, also known as ISO/IEC 14496, supports object-based encoding of audio, text, image, and synthetic or natural video data, and includes algorithms for efficient transmission over non-ideal links. In general, the MPEG-4 multimedia standard applies well-known video compression techniques developed in its predecessor standards, MPEG-1 and MPEG-2. A key feature of the standard is error resilience, which makes MPEG-4 suitable for applications that utilize error-prone channels such as wireless links and the Internet infrastructure.
As shown in FIG. 1, a transceiver system 100 using MPEG-4 encoded data includes a transmitter 103 and a receiver 106. Transmitter 103 includes an MPEG-4 encoder 104 that encodes data from one or more source devices (source devices 101 and 102 are shown) into the MPEG-4 format. The encoded data passes through a network 105 to receiver 106. Receiver 106 includes an MPEG-4 decoder 107 that decodes the received data and passes the appropriate data to targeted destination devices (destination devices 108 and 109 are shown).
In accordance with the MPEG-4 standard, an object-based scene is built from individual objects having spatial and temporal relationships. Each of the individual objects can be natural (e.g., recorded video) or artificial (e.g., computer-generated objects). The objects may be created in any number of ways, including from a user's video camera or an audio-visual recording device, or may be generated by a computer. Advantages of this approach include the ability to build morphed scenes, for example, with animated characters shown in natural scenes or natural characters in animated scenes. Further, splitting the scenes into individual objects can significantly reduce the number of bits required to transmit a completed audio-visual presentation.
With the current demand for access to complete audio-visual information over various network environments, particular attention is paid to methods of reducing the actual amount of digital data required to represent that information. It is expected that future demand for audio-visual data will match or exceed the current demand for networked textual and graphical data.
FIG. 2 shows the division and manipulation of frame data 200 from a stream of video images consisting of a sequence of video frames. A video frame 200 may be divided into a sequence of Macro Blocks (MBs) 201-1, 201-2, . . . , 201-P, where each MB represents a group of, for example, 16 by 16 pixels. A sequence of MBs forms a Group of Blocks (GoB). In known implementations of MPEG-4 decoders, each GoB consists of a fixed number of MBs. During encoding, each GoB is compressed into a compressed video packet of data. For typical streams of video images, the resulting video packets, each packet representing a GoB, will have a variable number of bits. If compression is highly successful, the video packet will be very short in bit-length. On the other hand, if the data within the GoB is dynamic and complex, the video packet will be very long in bit-length. Thus, two GoBs, each containing the same number of MBs and therefore an identical number of unencoded bits, might result in two respective video packets having very different bit-lengths. In other words, equally sized regions of the video image may be represented by video packets of data having very different lengths, some very short and others very long.
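This effect can be demonstrated with a brief sketch. The following Python example (illustrative only; the GoB size is hypothetical, and `zlib` stands in for the MPEG-4 entropy coder) compresses two GoBs of identical unencoded size, one flat and one complex, and shows that the resulting packets differ greatly in length:

```python
import hashlib
import zlib

MB_PIXELS = 16 * 16            # one macroblock: 16x16 pixels
MBS_PER_GOB = 4                # hypothetical fixed number of MBs per GoB
GOB_BYTES = MB_PIXELS * MBS_PER_GOB

# Two GoBs with identical unencoded size:
# a flat (static) region, and a complex (dynamic) region
# built from pseudo-random SHA-256 output.
flat_gob = bytes([128]) * GOB_BYTES
complex_gob = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(32))

flat_packet = zlib.compress(flat_gob)
complex_packet = zlib.compress(complex_gob)

assert len(flat_gob) == len(complex_gob)       # same number of unencoded bits
print(len(flat_packet), len(complex_packet))   # very different packet lengths
```

The flat region compresses to a few bytes, while the complex region barely compresses at all, mirroring the disparity in video packet bit-lengths described above.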
Once compressed into video packets, the GoB data is transmitted through channel 105 to receiver 106 for eventual decoding. Channel 105, for example a wireless network or the Internet, may be a noisy or error-prone channel. Often, errors or bursts of errors are uniformly distributed; each bit then has an equal probability of being received erroneously by receiver 106 because of impairments in channel 105.
By definition, longer video packets contain more bits than shorter video packets. Thus, longer video packets, on average, have a higher probability of being received with errors than shorter video packets. If a video packet contains too many errors, the errors cannot be corrected; the receiver will then discard the entire video packet and seek the beginning of the next transmitted packet.
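Under the uniform bit-error assumption above, a packet of n bits arrives error-free with probability (1 - p)^n, where p is the channel bit-error rate, so the probability of at least one bit error grows with packet length. A short sketch (the bit-error rate and packet lengths are hypothetical values chosen for illustration):

```python
def packet_error_probability(bit_error_rate: float, packet_bits: int) -> float:
    """Probability that at least one bit in the packet is corrupted,
    assuming independent, uniformly distributed bit errors."""
    return 1.0 - (1.0 - bit_error_rate) ** packet_bits

BER = 1e-4                                         # hypothetical bit-error rate
short_pkt = packet_error_probability(BER, 500)     # short video packet
long_pkt = packet_error_probability(BER, 8000)     # long video packet
print(f"short: {short_pkt:.3f}  long: {long_pkt:.3f}")
```

At this hypothetical error rate, the 8000-bit packet is corrupted more than half the time, while the 500-bit packet is corrupted only a few percent of the time, illustrating why long packets are far more likely to be discarded.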
The shorter video packets, on the other hand, have a lower probability of being received with errors. Each packet, however, carries a fixed overhead. Dividing video data into a large number of short video packets therefore uses the available channel bandwidth inefficiently: the bandwidth consumed by the overhead bits (e.g., resync and header bits) of each video packet reduces the bandwidth available for video information and reduces the efficiency of channel 105.
As video messaging, video telephony, and video conferencing become more prevalent with the expansion of the Internet and wireless-based networks, there will be a need for more efficient techniques for encoding video data in a way that reduces the impact of channel-induced errors on video packets of data and that optimizes the use of available channel bandwidth.