Image compression reduces the amount of data necessary to represent a digital image by eliminating spatial and/or temporal redundancies in the image information. Compression is necessary in order to efficiently store and transmit video image information, e.g., over the internet. Without compression, most applications in which image information is stored and/or transmitted would be rendered impractical or impossible. A few seconds' worth of raw video data could easily fill up and overwhelm an average PC's hard drive.
In the art, single channel video compression can be accomplished by taking an initial frame of a video in uncompressed form, and using that frame as a reference frame for encoding video information pertaining to subsequent frames. Rather than encoding and transmitting the entire video content of each frame, the frames are compressed by initially determining the differences between predetermined frames in a sequence, including an initial frame and a predicted frame, and then transmitting only these differences to a decoder. The decoder then reconstructs at least some of the frames of the video based on these differences. Such motion estimation systems also "skip" a number of frames (intermediate frames) which can be readily estimated because they typically include relatively few motion changes from the previous frame. As a result, the actual video content of only a certain number of frames, e.g., every fourth frame, is analyzed. To accommodate the resultant gaps, the intermediate frames are predicted based on the relationships between the predetermined frames and the differences between them. By utilizing such motion estimation methods, an entire video can be transmitted and reconstructed with high image quality and relatively low transmission bandwidth, which are critical features for transmitting video data. For instance, assuming 512.sup.2 pixels per frame, 8-bit gray level, and a 30 Hz full-motion video rate, a bandwidth of approximately 60 Mbps is required. To compress the video data from the full uncompressed bandwidth of 60 Mbps into the required data rate of 128 kbps, an image compression ratio of approximately 468:1 is required. And, for VGA full motion video, this compression rate requirement is quadrupled.
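The bandwidth figures above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only; the function names are hypothetical, and the only inputs are the values stated in the text (512 x 512 pixels, 8 bits per pixel, 30 frames per second, a 128 kbps channel).

```python
def raw_bitrate_bps(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


def required_compression_ratio(raw_bps, channel_bps):
    """How many times the raw stream must shrink to fit the channel."""
    return raw_bps / channel_bps


# Values taken from the text: 512 x 512 pixels, 8-bit gray level, 30 Hz.
raw = raw_bitrate_bps(512, 512, 8, 30)

# Expressed with 1 Mbit = 2**20 bits, the raw rate is exactly 60.
raw_mbit_binary = raw / 2**20

# Ratio obtained from the rounded figures quoted in the text
# (60 Mbps down to 128 kbps, both read as decimal units).
ratio_nominal = required_compression_ratio(60_000_000, 128_000)

print(raw, raw_mbit_binary, ratio_nominal)
```

With these inputs the raw rate is 62,914,560 bits per second, which is 60 Mbit when a megabit is counted as 2.sup.20 bits, and the ratio computed from the rounded figures is 468.75, consistent with the approximately 468:1 stated above.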
Another necessary feature of video compression concerns accounting for large motion changes in the sequence of frames. For example, MPEG video compression is capable of accounting for such changes, in a single channel of video, by asynchronously sending an "I frame" (essentially a new reference frame). However, the I-frames are inserted every 15 frames regardless of video content. By introducing I-frames asynchronously into the encoded video bit stream, such systems inefficiently increase signal bandwidth. For example, when an I-frame is inserted into a series of encoded frames not containing significant motion, bandwidth is unnecessarily used because transmission of an entire new frame is unnecessary. Conversely, when no I-frame is inserted into the video bitstream although the sequence of frames includes substantial motion, significant errors and artifacts are created. A significant improvement on the above-described MPEG video compression method, also using motion estimation, is realized by the novel "smart" system shown and described in pending U.S. patent application Ser. No. 08/901,832, which inserts I-frames only when warranted by video content (described below). Pending U.S. patent application Ser. No. 08/901,832 is expressly incorporated herein by reference.
The single channel compression method discussed in general terms immediately above will hereinafter be more specifically described in conjunction with the appended figures. As shown in FIG. 1, a method 10 for forming a compressed video data stream for a single channel of video includes taking an initial frame S.sub.1 from a single source, e.g., a video camera, and typically compressing frame S.sub.1 with standard compression techniques. Next, method 10 skips or ignores a predetermined number of subsequent frames, e.g., the two frames shown in phantom in FIG. 1, and predicts the next frame, P.sub.1.sup.1. Then, the error or difference .increment..sub.1.sup.1 between frame S.sub.1 and the predicted frame (and in this case third subsequent frame) P.sub.1.sup.1 is determined.
Method 10 next computes and encodes "filler" frames, B.sub.1.sup.1 and B.sub.2.sup.1, which are predicted forms of the skipped frames, the second and third frames in FIG. 1. The predicted B frames are derived based on the S.sub.1 frame and the P.sub.1.sup.1 frame (as shown by the two-headed arrows in phantom in FIG. 1) and the differences, .increment..sub.1.sup.1, between them, using known techniques. By compressing the frames in this fashion, the encoder encodes only the differences .increment..sub.1.sup.1, along with the full video content of initial frame S.sub.1 (in compressed form), thus providing sufficient information to reconstruct S.sub.1 and P.sub.1.sup.1. A highly encoded skeletal portion of the intermediate predicted frames B.sub.1.sup.1 and B.sub.2.sup.1 may also be encoded; notably, however, when transmitted, this information does not significantly affect the signal bandwidth.
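The encoding of one group of frames described above may be sketched as follows. This is a minimal illustration under stated assumptions, not the method of FIG. 1 itself: frames are modeled as flat lists of pixel values, the P frame is taken to be the captured frame at that position, the difference signal is a simple per-pixel subtraction, and the B frames are produced by linear interpolation as a stand-in for the "known techniques" the text refers to. All function names are hypothetical.

```python
def frame_difference(reference, target):
    """Per-pixel difference (target minus reference); stands in for the delta signal."""
    return [t - r for r, t in zip(reference, target)]


def encode_group(frames, skip=2):
    """Encode one group: keep the reference frame and the delta to the P frame.

    frames[0] is the reference (S) frame; the next `skip` frames are skipped;
    frames[skip + 1] is treated as the P frame (an assumption of this sketch).
    """
    if len(frames) < skip + 2:
        raise ValueError("need a reference frame, the skipped frames and a P frame")
    reference = frames[0]
    p_frame = frames[skip + 1]
    return {"reference": reference, "delta": frame_difference(reference, p_frame)}


def interpolate_b_frames(reference, delta, skip=2):
    """Estimate the skipped frames by linear interpolation (assumed technique)."""
    b_frames = []
    for k in range(1, skip + 1):
        b_frames.append(
            [r + round(k * d / (skip + 1)) for r, d in zip(reference, delta)]
        )
    return b_frames


# Four tiny two-pixel frames: S, two skipped frames, and the P frame.
frames = [[0, 0], [9, 21], [19, 41], [30, 60]]
group = encode_group(frames)
b_frames = interpolate_b_frames(group["reference"], group["delta"])
print(group["delta"], b_frames)
```

Only the reference frame and the difference signal are retained by `encode_group`; the skipped frames themselves are never stored, which is the source of the bandwidth saving described above.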
As mentioned above, an additional encoding step involves accounting for significant changes in motion between successive frames by either asynchronously inserting an I-frame into the video data bitstream or, as shown and described in the above-referenced application Ser. No. 08/901,832, determining, with specially designed "smart" software, whether inserting an I-frame is warranted by motion changes in the sequence of frames. The latter approach typically involves segmenting the frames into search blocks and accumulating the error between corresponding blocks of the initial S-frames and their corresponding P-frames, i.e., the predicted third subsequent frames.
When the accumulated error exceeds a predetermined threshold value, a new I-frame is substituted as the next subsequent frame, and the encoding begins as described above by computing the differences between the new reference initial frame, I.sub.1, and the next predicted P frame, the third subsequent frame thereafter. With the I-frame as the new reference initial frame, the encoder determines the motion differences, .increment..sub.2.sup.1, between frame I.sub.1 and the next predicted P frame, P.sub.2.sup.1, while the intermediate frames B.sub.1.sup.1 ' and B.sub.2.sup.1 ' are computed in accordance with known techniques, as discussed previously. If the accumulated error does not exceed the threshold, meaning that the image correlation between the subject frames remains sufficiently high, the existing P frame signal is retained for the beginning of what is the second successive group of frames to be encoded. In this case, the P.sub.1.sup.1 frame becomes the reference initial frame for the next group of frames. The system continues to encode the frames in such successive groupings to provide a compressed data bitstream representing the entire video.
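The threshold decision described above may be illustrated by the following Python sketch. It is offered only as an assumption-laden example: the text does not specify the error measure, so a sum of absolute pixel differences per search block is assumed here, frames are modeled as two-dimensional lists, and the block size, threshold and function names are all hypothetical.

```python
def block_error(ref, pred, top, left, size):
    """Sum of absolute differences over one search block (assumed error measure)."""
    total = 0
    for y in range(top, min(top + size, len(ref))):
        for x in range(left, min(left + size, len(ref[0]))):
            total += abs(ref[y][x] - pred[y][x])
    return total


def accumulated_error(ref, pred, block_size=8):
    """Accumulate the block errors over every search block of the frame."""
    height, width = len(ref), len(ref[0])
    return sum(
        block_error(ref, pred, top, left, block_size)
        for top in range(0, height, block_size)
        for left in range(0, width, block_size)
    )


def next_reference(ref, pred, candidate_i_frame, threshold, block_size=8):
    """Pick the reference for the next group.

    Returns ("I", candidate_i_frame) when the accumulated error exceeds the
    threshold, otherwise ("P", pred), so the existing P frame is retained.
    """
    if accumulated_error(ref, pred, block_size) > threshold:
        return ("I", candidate_i_frame)
    return ("P", pred)


ref = [[0] * 4 for _ in range(4)]
pred = [[1] * 4 for _ in range(4)]
fresh = [[7] * 4 for _ in range(4)]
print(accumulated_error(ref, pred, block_size=2))
print(next_reference(ref, pred, fresh, threshold=10, block_size=2)[0])
```

In this toy case every pixel differs by one, so the accumulated error equals the number of pixels; a threshold below that value triggers an I-frame, while a threshold at or above it leaves the P frame as the next reference.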
Referring next to FIG. 2, the encoded signals can be transmitted in a sequence 12 (representing only a portion of the frames of the video) to a decoder. Note that the series of signals of the transmission sequence shown in FIG. 2 does not reflect any occurrence of high motion in the sequence of frames, i.e., no I-frames have been encoded.
Initially, in Step 14, the encoder transmits the first group of encoded signals, Group I, to the decoder. These signals include the encoded initial reference frame S.sub.1 and the .increment..sub.1.sup.1 signal, and typically the B-frames, transmitted in that order. Next, the encoder in Step 16 transmits the second group of signals (Group II) which, when an I-frame is not encoded in the sequence, includes only the .increment..sub.2.sup.1 signal followed by, when appropriate, the intermediate B.sub.1.sup.1 ' and B.sub.2.sup.1 ' frames. Unlike the Group I signals which included a compressed form of reference initial frame S.sub.1, the reference initial frame corresponding to Group II, which is P.sub.1.sup.1, does not need to be sent because the decoder already decoded and stored the P.sub.1.sup.1 frame in memory when it decoded the Group I signals. In sum, when an I-frame is not encoded, the new reference frame for the next group of signals will not have to be sent because the new reference frame will be the decoded and stored P frame from the previous Group.
The encoder then proceeds to send the Group III signals which include the .increment..sub.3.sup.1 signal followed by signals indicative of the compressed skeletal forms of the B.sub.1.sup.1 " frame and the B.sub.2.sup.1 " frame in Step 18. Similarly, method 10 then transmits subsequent groups of signals until all encoded frames of the video are transmitted. Note that sequence 12 is presented for illustrative purposes only and that, in reality, video usually exhibits at least some significant motion changes which will be reflected by the insertion of encoded I-frames in the compressed video data stream.
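The ordering of signals in sequence 12 may be summarized by a short Python sketch. The labels are hypothetical placeholders for the encoded signals, and the sketch covers only the case described above in which no I-frame is encoded.

```python
def transmission_sequence(num_groups):
    """Ordered signal labels for each group when no I-frame is encoded.

    Only the first group carries the compressed reference frame S1; every
    later group reuses the P frame already held by the decoder.
    """
    sequence = []
    for g in range(1, num_groups + 1):
        signals = []
        if g == 1:
            signals.append("S1")
        signals.append(f"delta{g}")
        signals += [f"B1(group {g})", f"B2(group {g})"]
        sequence.append(signals)
    return sequence


for group_signals in transmission_sequence(3):
    print(group_signals)
```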
The receiver/decoder then executes a method 24, as depicted in FIG. 3, to re-create the sequence of frames of the single channel of video. (Note that the symbolic representation {x, y}.fwdarw.w indicates that signals "x" and "y" are combined in a preprogrammed manner by specially designed algorithms to produce the signal "w".) After receiving the encoded signals of Group I in Step 26, method 24, in Step 28, decompresses reference initial frame S.sub.1. Thereafter, method 24 performs Step 30 by decompressing .increment..sub.1.sup.1, the signal which represents the motion difference between the reference initial frame, S.sub.1, and the predicted P frame, P.sub.1.sup.1, for Group I. The decompression performed by the method in Steps 28 and 30 is standard decompression of the corresponding signals that were compressed using standard compression techniques in the encoding process; for example, as mentioned previously, MPEG compression/decompression can be used.
Then, in Step 32, method 24 combines Group I signals S.sub.1 and .increment..sub.1.sup.1 to re-create frame P.sub.1.sup.1, which is thereafter used in combination with S.sub.1 and .increment..sub.1.sup.1 in Step 34 to re-create predicted frames B.sub.1.sup.1 and B.sub.2.sup.1. With the S.sub.1 frame restored in this manner, and the P.sub.1.sup.1 frame re-created, the decoder can transmit the signals indicative of these frames to a display unit (not shown), as depicted in Step 36, in the sequence indicated, i.e., S.sub.1, B.sub.1.sup.1, B.sub.2.sup.1 and finally P.sub.1.sup.1.
Next, method 24 executes Step 38 to determine whether an I-frame was encoded into the video bitstream of the second group of frames (Group II). Again, in standard video compression an I-frame is inserted into the video bitstream every 15 frames regardless of the video content, while the method in the above-referenced pending application, incorporated herein, only inserts an I-frame when high motion content is present. The latter method is preferred.
In the event that an I-frame is not encoded (as in the examples described in connection with FIGS. 1 and 2), method 24 executes Step 40 by decompressing .increment..sub.2.sup.1 using, for example, MPEG decompression algorithms. Using .increment..sub.2.sup.1, method 24 re-creates the next P frame, P.sub.2.sup.1, in Step 42 by combining .increment..sub.2.sup.1 with the reference initial frame for Group II, P.sub.1.sup.1 (stored in memory when the Group I signals were decoded in Step 32).
After creating the P frame P.sub.2.sup.1, method 24 combines the new reference initial frame, the P frame of Group I, P.sub.1.sup.1, along with the just created P frame (the reference predicted frame), P.sub.2.sup.1, and the motion difference signal .increment..sub.2.sup.1 to create B frames B.sub.1.sup.1 ' and B.sub.2.sup.1 ' in Step 44. Once the aforementioned signals have been created, the decoder transmits the decoded signals to the display unit in the sequence shown in Step 46, i.e., B.sub.1.sup.1 ', followed by B.sub.2.sup.1 ' and finally followed by P.sub.2.sup.1 (representing, in this example, the seventh frame of the video). This re-creation, transmission and display of the groups of signals is re-executed until the entire video is displayed (Step 48).
In the event that an I-frame is encoded into the video bitstream as the next subsequent frame, the decoder substitutes I.sub.1 for S.sub.1 and executes the Steps of method 24 from the beginning. In particular, the method 24 decompresses I.sub.1 in Step 50, using standard decompression algorithms, and then, in Step 52, decompresses .increment..sub.1(new).sup.1, the encoded signal relating to the differences between I.sub.1 and the P frame associated with I.sub.1, P.sub.1(new).sup.1. Then, P.sub.1(new).sup.1 is reconstructed by combining I.sub.1 with .increment..sub.1(new).sup.1 (Step 54). Then, the highly encoded B frames, B.sub.1(new).sup.1 and B.sub.2(new).sup.1, are reconstructed in accordance with algorithms that combine I.sub.1, .increment..sub.1(new).sup.1 and P.sub.1(new).sup.1 in a pre-programmed fashion (Step 56). In Step 58, the decoder transmits the decoded signals I.sub.1, B.sub.1(new).sup.1, B.sub.2(new).sup.1 and P.sub.1(new).sup.1 to the display unit for display in that order.
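The decoding flow of method 24 may be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the decoder of FIG. 3: frames are flat lists of pixel values that have already been decompressed, a P frame is re-created by adding the difference signal to the reference, and the B frames are re-created by linear interpolation as a stand-in for the pre-programmed combination the text mentions. The dictionary keys and function names are hypothetical.

```python
def reconstruct_p(reference, delta):
    """Re-create the P frame by adding the difference signal to the reference."""
    return [r + d for r, d in zip(reference, delta)]


def reconstruct_b(reference, delta, skip=2):
    """Re-create the skipped frames by linear interpolation (assumed technique)."""
    return [
        [r + round(k * d / (skip + 1)) for r, d in zip(reference, delta)]
        for k in range(1, skip + 1)
    ]


def decode_stream(groups):
    """Return the frames in display order.

    Each group is a dict with a "delta" entry and, for the first group or
    whenever an I-frame was encoded, an "intra" entry holding that frame.
    """
    displayed = []
    reference = None
    for group in groups:
        if group.get("intra") is not None:
            # S frame or I-frame: becomes the reference and is displayed first.
            reference = group["intra"]
            displayed.append(reference)
        elif reference is None:
            raise ValueError("first group must carry an intra-coded reference frame")
        p_frame = reconstruct_p(reference, group["delta"])
        displayed.extend(reconstruct_b(reference, group["delta"]))
        displayed.append(p_frame)
        # The stored P frame is the reference for the next group.
        reference = p_frame
    return displayed


# Group I carries S; Group II reuses the stored P frame as its reference.
no_i_frame = decode_stream([{"intra": [0], "delta": [3]}, {"delta": [6]}])
# An I-frame in the second group restarts decoding from that frame.
with_i_frame = decode_stream([{"intra": [0], "delta": [3]}, {"intra": [100], "delta": [3]}])
print(no_i_frame)
print(with_i_frame)
```

In the first call the second P frame is the seventh displayed frame, matching the sequence described above; in the second call the I-frame is displayed ahead of its own B and P frames, as in Step 58.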
Although the above-described system provides advantages relating to efficient use of bandwidth without sacrificing video quality in a single channel system, the art of video signal compression is in need of a system that can provide data compression of multiple sources of video. A system is desired which not only can determine differences among frames within a channel, but can also cross correlate frames from multiple sources. Such a system would be particularly beneficial with regard to applications in which there is a high degree of similarity between the information obtained by each source, such that the entire video content does not need to be transmitted to re-create the video from each source.
There are a variety of such multiple-channel applications. For instance, stereoscopic data consists of multiple channels of input, typically two sources looking at an object from two points of view, for forming a 3-D image. Clearly, there is a significant amount of redundant information between the two sources. Another quite common application is capturing video in a "look-around" environment where multiple cameras are utilized to look at a range of scenery or a designated object, with each camera accounting for one channel of data representing a particular view of the designated object or scene, e.g., from a variety of angles. In either of these situations, it would be desirable to coordinate the multiple sources such that the redundant information between the sources would not have to be encoded and transmitted to re-create the entire video of each source, thus tending to maximize the throughput of data and conserve signal bandwidth.
In yet another application, a single camera may be used to look at a spectral image whereby the signal obtained is divided into separate channels based upon narrow bandwidth windows using filters. When looking at such images, hundreds of channels can be realized within a few nanometers. Notably, the image data in each such channel contains a tremendous amount of correlated data vis-a-vis adjacent channels, each channel corresponding to a slightly different bandwidth. It is very inefficient to transmit full video content of each of these channels.
In still another application, data captured by a single source at different times may have a significant amount of correlated data, as may be the case when using a video phone from a particular environment to send information over the internet. For example, if the user transmits a video phone message over the internet on a subsequent day from the same place as on a previous day, much of the surrounding information will stay the same, and only certain aspects of the transmission will change, e.g., the facial expressions of the user. Due to the amount of similar data from each of the transmissions, it is inefficient to encode and transmit all the information contained in each message.
In each of these applications, not all of the captured information needs to be encoded, because there is a relatively high degree of similar information gathered by each of the sources. Therefore, a system is desired that, in conjunction with standard compression techniques, takes advantage of this redundant information.