The acceptance of digital video compression standards, for example, the Moving Picture Experts Group (MPEG) standard, combined with the availability of a high-bandwidth communication infrastructure, has poised the telecommunications market for an explosion of video-based services. Services such as video-on-demand, multi-party interactive video games, and video teleconferencing are actively being developed. These and other future video services will require a cost-effective video composition and display technique.
An efficient multiple window display is desirable for displaying the multiple video sequences produced by these applications to a video user or consumer. The implementation of such a windows environment would permit a user to simultaneously view several video sequences or images from several sources. The realization of a commercial multiple window video display is hampered by technological limitations on available data compression equipment.
In digital television and other digital image transmission and storage applications, image signals must be compressed or coded to reduce the amount of bandwidth required for transmission or storage. A full-screen video frame typically comprises an array of at least 640×480 picture elements, or pixels, each pixel having data for luminance and chrominance. A video sequence is composed of a series of such discrete video frames, similar to the frames in a motion picture film. True entertainment-quality video requires a frame rate of at least thirty frames per second. Transmitted uncompressed, thirty such frames per second would require far more bandwidth than is presently practical.
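To put the bandwidth claim in numbers, the sketch below computes the raw bit rate of a 640×480, 30 frame/s sequence. The bits-per-pixel figures are assumptions (8-bit samples in the common 4:4:4, 4:2:2 and 4:2:0 chroma formats), since the text does not fix a sampling format, and the function name `raw_bit_rate` is illustrative.

```python
# Uncompressed bit rate of a 640x480, 30 frame/s video sequence.
# Assumption: 8-bit samples; bits per pixel depends on the chroma format.
WIDTH, HEIGHT, FPS = 640, 480, 30


def raw_bit_rate(width: int, height: int, fps: int, bits_per_pixel: int) -> int:
    """Return the uncompressed bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


for label, bpp in [("4:4:4 (24 bpp)", 24), ("4:2:2 (16 bpp)", 16), ("4:2:0 (12 bpp)", 12)]:
    rate = raw_bit_rate(WIDTH, HEIGHT, FPS, bpp)
    print(f"{label}: {rate / 1e6:.1f} Mbit/s")
```

Even the most economical of these formats needs roughly 110 Mbit/s before compression, whereas MPEG-1, for instance, was designed for bit rates around 1.5 Mbit/s.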
Image coding techniques compress the video data to reduce the number of bits transmitted per frame. There are several standard image coding techniques, each of which takes advantage of repetition in pixel image data, also called spatial correlation.
Spatial correlation occurs when several adjacent pixels have the same or similar luminance (brightness) and chrominance (color) values. Consider, for example, a frame of video containing the image of a blue sky. The many pixels comprising the blue sky image will likely have identical or near-identical image data. Data compression techniques can exploit such repetition, for example, by transmitting or storing the luminance and chrominance data for one pixel together with the number of following pixels for which the data is identical, or by transmitting or storing only the difference between adjacent pixels. Presently, spatial correlation is exploited by compression techniques that use the discrete cosine transform (DCT) and quantization. Where such data compression or coding is employed, each video source must be equipped with data compression equipment and each video receiver must likewise be equipped with decoding equipment. Several video coding protocols are well known in the art, including the JPEG, MPEG1, MPEG2 and P×64 standards.
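The two repetition-exploiting schemes described above, sending one value plus the count of identical followers (run-length coding) and sending only differences between neighbouring pixels (differential coding), can be sketched on a single row of luminance samples. This is a minimal, lossless illustration with hypothetical helper names; it is not the DCT-and-quantization pipeline used by JPEG or MPEG.

```python
from typing import List, Tuple


def run_length_encode(samples: List[int]) -> List[Tuple[int, int]]:
    """Send one value plus the number of consecutive samples that share it."""
    runs: List[Tuple[int, int]] = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((s, 1))              # start a new run
    return runs


def run_length_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, count) pairs back into the original samples."""
    return [value for value, count in runs for _ in range(count)]


def delta_encode(samples: List[int]) -> List[int]:
    """Keep the first sample, then only each sample's difference from its left neighbour."""
    return samples[:1] + [b - a for a, b in zip(samples, samples[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the samples by accumulating the differences."""
    out: List[int] = []
    for d in deltas:
        out.append(out[-1] + d if out else d)
    return out


# One row of a "blue sky": mostly identical luminance with a slight gradient.
row = [120] * 6 + [121, 122, 122]
print(run_length_encode(row))  # [(120, 6), (121, 1), (122, 2)]
print(delta_encode(row))       # [120, 0, 0, 0, 0, 0, 1, 1, 0]
```

The runs of zeros in the difference signal are exactly what a run-length stage can then collapse, which is why the two ideas combine well.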
In a multipoint video application, such as a video teleconference, a plurality of video sequences from a plurality of sources are displayed simultaneously on a video screen at a receiving terminal. To display multiple windows, the prior art generally required multiple decoding devices to decode the multiple video signals from the multiple sources. Because decoders are presently expensive, using several of them is an impractical way to create multiple video windows.
A further difficulty encountered in multiple window video is that many sources provide video in only one screen display size. In fact, many sources transmit only full-screen images, which typically comprise 640×480 pixels per frame. To provide truly flexible windowing capabilities, different users should have the option of invoking and viewing differently sized windows of the same video. Windows that occupy a fraction of the entire display require the image data to be filtered and subsampled, producing frame signals with fewer pixels. It is therefore advantageous to make video data available at a plurality of window sizes or resolution levels. For example, the video of a participant in a teleconference may be made available at full-screen resolution, 1/4 screen, 1/16 screen or 1/64 screen, so that the other participants can each choose the window size in which to view the transmitting participant. Picture-in-picture for digital TV is another application that would benefit from multiple resolution video signals: a user would receive signals from plural sources at only the resolutions necessary to fill the selected image sizes. Similarly, a video server might output multiple resolution streams to enable a user to display images from multiple sources in different windows, none of which requires full-resolution quality. Thus, by transmitting to the user only the bitstream associated with the size of the image to be displayed, rather than a full-resolution bitstream, substantial bandwidth can be saved, as can the processing power needed to decode the full-resolution bitstream and scale the resulting video down to the desired image size.
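Producing the 1/4, 1/16 and 1/64 screen versions amounts to repeatedly halving each image dimension. The sketch below assumes a simple 2×2 box (averaging) filter followed by subsampling by two, applied to a single plane of 8-bit samples. The text only says the data is "filtered and subsampled", so the filter choice and the names `halve` and `resolution_pyramid` are illustrative; practical systems typically use longer low-pass filters.

```python
from typing import List

Frame = List[List[int]]  # rows of 8-bit samples from one image plane


def halve(frame: Frame) -> Frame:
    """2x2 box filter, then subsample by two in each dimension.

    The output holds one quarter as many samples as the input; an odd
    trailing row or column is dropped.
    """
    height, width = len(frame), len(frame[0])
    return [
        [
            # Average the 2x2 neighbourhood, rounding halves up.
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1] + 2) // 4
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


def resolution_pyramid(frame: Frame, levels: int = 3) -> List[Frame]:
    """Return [full, 1/4 screen, 1/16 screen, 1/64 screen] when levels=3."""
    pyramid = [frame]
    for _ in range(levels):
        pyramid.append(halve(pyramid[-1]))
    return pyramid


# Synthetic 640x480 frame (a diagonal ramp) standing in for real video.
full = [[(x + y) % 256 for x in range(640)] for y in range(480)]
for level in resolution_pyramid(full):
    print(f"{len(level[0])}x{len(level)}")
```

Each level carries a quarter of the pixels of the level above (307,200, 76,800, 19,200 and 4,800 pixels), which is why sending only the stream that matches the requested window size saves both bandwidth and decoding effort.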
Under one technique of providing multiple resolution levels, each video transmitter provides a plurality of video sequences, each independently containing the data signal for a particular resolution level of the same video image. One method of generating multiple resolution video sequences would be to employ several encoders, one for each resolution level. The requirement of multiple encoders, however, as in the case of decoders, increases system cost since encoders comprise costly components in digital video transmission systems.
The inventors of the present invention are co-inventors, together with G. L. Cash and D. B. Swicker, of co-pending patent application Ser. No. 08/201,871, filed Feb. 25, 1994, now U.S. Pat. No. 5,481,297, issued Jan. 2, 1996. That patent describes a multipoint digital video communication system that employs a single standard encoder, such as JPEG or MPEG, to encode multiple resolution video signals derived from a full resolution video signal, and a single standard decoder, such as JPEG or MPEG, to decode and display multiple resolution video signals. In that system, macroblocks of a sampled full resolution video signal and macroblocks of a subsampled input video signal at multiple different fractional resolutions are multiplexed into a single stream before being fed to the single standard video encoder, which encodes or compresses each macroblock individually. Because MPEG-based standard compression systems employ interframe coding, in which the encoder relies on information from previous (and in some cases future) frames, a reference frame store must provide separate reference frame information to the encoder for each resolution. Control logic is therefore needed to change the reference frame buffer, as well as the resolution-related information, in accordance with each macroblock's resolution as the encoder processes it. Similarly, before decoding macroblocks from different resolution sources, the decoder must be context switched, and information from a previous (and in some cases a future) frame must be provided at the resolution associated with the macroblock. The standard encoder and decoder must therefore operate cooperatively with complex circuitry to provide the necessary context switching functionality. Furthermore, since context switching must be performed on a macroblock-by-macroblock basis, substantial computational overhead is required to enable individual macroblocks to be processed separately.
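The cost of that prior-art arrangement can be illustrated with a toy model of the control logic around a single encoder: macroblocks from all resolution levels arrive in one multiplexed stream, and whenever a macroblock's level differs from the previous one, the reference frame store and resolution parameters must be swapped. The cited patent does not prescribe this exact model; the round-robin interleaving, the padding of each picture to whole 16×16 macroblocks, and the names `ContextSwitchingEncoder` and `interleave` are assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field
from itertools import zip_longest
from typing import Dict, List, Tuple

MB = 16  # macroblock size in luminance pixels


@dataclass
class Macroblock:
    level: int  # 0 = full, 1 = 1/4 screen, 2 = 1/16 screen, 3 = 1/64 screen
    index: int  # position of the macroblock within its own picture


def macroblock_count(width: int, height: int) -> int:
    """Macroblocks per picture, padding each dimension up to a multiple of 16."""
    return math.ceil(width / MB) * math.ceil(height / MB)


@dataclass
class ContextSwitchingEncoder:
    """Hypothetical control logic around one standard encoder: one reference
    frame store per resolution level, swapped in whenever the level changes."""
    reference_stores: Dict[int, str] = field(default_factory=dict)
    current_level: int = -1  # no context loaded yet
    switches: int = 0

    def encode(self, mb: Macroblock) -> None:
        if mb.level != self.current_level:
            # Swap in the reference frame and resolution parameters for this
            # level; the very first macroblock also counts as a switch.
            self.reference_stores.setdefault(mb.level, f"reference store {mb.level}")
            self.current_level = mb.level
            self.switches += 1
        # A real encoder would now code the macroblock against the data held
        # in self.reference_stores[mb.level]; that step is omitted here.


def interleave(sizes: List[Tuple[int, int]]) -> List[Macroblock]:
    """Round-robin multiplexing of every level's macroblocks (an assumed policy)."""
    per_level = [[Macroblock(lvl, i) for i in range(macroblock_count(w, h))]
                 for lvl, (w, h) in enumerate(sizes)]
    return [mb for group in zip_longest(*per_level) for mb in group if mb is not None]


sizes = [(640, 480), (320, 240), (160, 120), (80, 60)]
stream = interleave(sizes)
encoder = ContextSwitchingEncoder()
for mb in stream:
    encoder.encode(mb)
print(f"{len(stream)} macroblocks, {encoder.switches} context switches")
```

Under these assumptions a single set of four pictures yields 1,600 macroblocks and 701 context switches, so the reference store is swapped for almost half of all macroblocks coded.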
An object of the present invention is to simultaneously create plural independent multiple resolution video data streams from a single video source using a single standard coder without the complexity of context switching.