The acceptance of digital video compression standards, for example, the Motion Picture Experts Group (MPEG) standard, combined with the availability of a high-bandwidth communication infrastructure, has poised the telecommunications market for an explosion of video-based services. Services such as video-on-demand, multi-party interactive video games, and video teleconferencing are actively being developed. These and other future video services will require a cost-effective video composition and display technique.
An efficient multiple-window display is desirable for presenting the multiple video sequences produced by these applications to a video user or consumer. Such a windowing environment would let a user simultaneously view several video sequences or images from several sources. The realization of a commercial multiple-window video display is hampered, however, by technological limitations of available data compression equipment.
In digital television and other digital image transmission applications, image signals must be compressed or coded to reduce the amount of bandwidth required for transmission. Typically, a full-screen frame of video may be composed of an array of at least 640×480 picture elements, or pixels, each pixel having luminance and chrominance data. Under one standard, for example, frames are composed of 720×480 pixel arrays. A video sequence is composed of a series of such discrete video frames, similar to the frames of motion picture film. True entertainment-quality video requires a frame rate of at least thirty frames per second. Transmitting thirty uncompressed frames per second would require far more bandwidth than is presently practical.
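To put "far more bandwidth than is presently practical" in numbers, the short Python sketch below multiplies frame size, bits per pixel and frame rate. The 8-bit sample depth and the chroma sampling formats are assumptions, since the text does not specify a chroma format; the names `BITS_PER_PIXEL` and `uncompressed_bit_rate` are illustrative only.

```python
# Rough uncompressed bit-rate estimate for the frame sizes mentioned above.
# Assumption: 8 bits per sample; the chroma subsampling format is a parameter
# because the text does not specify one.
BITS_PER_PIXEL = {
    "4:4:4": 24,  # full-resolution Y, Cb and Cr
    "4:2:2": 16,  # chroma halved horizontally
    "4:2:0": 12,  # chroma halved horizontally and vertically
}


def uncompressed_bit_rate(width, height, fps, sampling):
    """Return the raw (uncoded) video bit rate in bits per second."""
    return width * height * BITS_PER_PIXEL[sampling] * fps


if __name__ == "__main__":
    for w, h in [(640, 480), (720, 480)]:
        for s in ("4:2:2", "4:2:0"):
            rate = uncompressed_bit_rate(w, h, 30, s)
            print(f"{w}x{h} @ 30 fps, {s}: {rate / 1e6:.1f} Mbit/s")
```

Under these assumptions a 720×480 sequence at thirty frames per second needs roughly 124 to 166 Mbit/s before any coding is applied.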
Image coding techniques compress the video data to reduce the number of bits transmitted per frame. There are several standard image coding techniques, each of which takes advantage of repetition in pixel image data, also called spatial correlation.
Spatial correlation occurs when several adjacent pixels have the same or similar luminance (brightness) and chrominance (color) values. Consider, for example, a frame of video containing the image of a blue sky. The many pixels making up the blue sky will likely have identical or nearly identical image data. Data compression techniques can exploit such repetition by, for example, transmitting the luminance and chrominance data for one pixel together with the number of following pixels whose data is identical, or by transmitting only the differences between adjacent pixels. Presently, spatial correlation is exploited by compression techniques based on the discrete cosine transform and quantization. Where such data compression or coding is employed, each video source or transmission node must be equipped with data encoding equipment, and each receiving node must likewise be equipped with decoding equipment. Several video coding protocols are well known in the art, including the JPEG, MPEG-1, MPEG-2 and Px64 standards.
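The repetition-based ideas above can be sketched on a single row of luminance samples from a hypothetical patch of sky. The Python below shows run-length coding (a value plus a repeat count), difference coding between adjacent pixels, and a simplified transform step: a 1-D orthonormal DCT-II followed by uniform quantization. Real coders such as JPEG and MPEG apply a 2-D 8×8 DCT with per-coefficient quantization tables, so the 1-D transform, the step size of 16, the sample values and all function names are assumptions chosen for illustration.

```python
import math

# Hypothetical row of 8-bit luminance samples from a patch of blue sky.
SKY_ROW = [118, 118, 118, 118, 119, 119, 120, 120]


def run_length_encode(samples):
    """Collapse runs of identical values into (value, count) pairs."""
    runs = []
    for value in samples:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return [(value, count) for value, count in runs]


def run_length_decode(runs):
    """Expand (value, count) pairs back into samples."""
    samples = []
    for value, count in runs:
        samples.extend([value] * count)
    return samples


def delta_encode(samples):
    """Keep the first sample, then only the difference from the previous one."""
    return samples[:1] + [b - a for a, b in zip(samples, samples[1:])]


def delta_decode(deltas):
    """Rebuild samples by accumulating the differences."""
    samples = []
    for d in deltas:
        samples.append(d if not samples else samples[-1] + d)
    return samples


def dct_1d(samples):
    """Orthonormal 1-D DCT-II (a simplification of the 2-D 8x8 codec transform)."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(samples)
        )
        coeffs.append(scale * total)
    return coeffs


def quantize(coeffs, step):
    """Uniform quantization: coefficients smaller than step / 2 round to zero."""
    return [round(c / step) for c in coeffs]


if __name__ == "__main__":
    print("run-length:", run_length_encode(SKY_ROW))
    print("differences:", delta_encode(SKY_ROW))
    print("quantized DCT:", quantize(dct_1d(SKY_ROW), step=16))
```

Because the row is nearly uniform, the run-length form needs three pairs instead of eight samples, the difference form is mostly zeros, and after quantization only the DC coefficient survives.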
In a multipoint video application, such as a video teleconference, a plurality of video sequences from a plurality of sources are displayed simultaneously on a video screen at a receiving node. Multiple-window video display currently requires multiple decoding devices; otherwise, the video data arriving from multiple sources would often overload the capacity of a single decoding device. Furthermore, currently available decoding devices are not equipped to handle video sequences from disparate sources simultaneously. Decoding circuitry relies on video sequence context information that emanates from the source of the video, and current decoding devices cannot store, access and switch between the several video sequence contexts that would be needed to decode video from several sources contemporaneously.
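A minimal sketch of why sequence context matters, assuming a JPEG-style scheme in which each block's DC coefficient is coded as a difference from the previous block's DC value. If blocks from two sources arrive interleaved at one decoder, a single shared predictor mixes the two sequences, whereas a decoder that stores one context per source and switches between them recovers both. The `DecodeContext` and `MultiSourceDecoder` names and the one-field context are hypothetical simplifications; a real decoding context would also include items such as quantization and entropy-coding tables and reference frames.

```python
from dataclasses import dataclass


@dataclass
class DecodeContext:
    """Hypothetical per-sequence decoder state, reduced here to one field."""
    dc_predictor: int = 0  # last reconstructed DC value for this sequence


def decode_dc(diff, ctx):
    """Reconstruct a DC value from its coded difference and update the context."""
    ctx.dc_predictor += diff
    return ctx.dc_predictor


class MultiSourceDecoder:
    """One decoder that stores a context per source and switches on each block."""

    def __init__(self):
        self.contexts = {}  # source id -> DecodeContext

    def decode(self, source_id, diff):
        ctx = self.contexts.setdefault(source_id, DecodeContext())
        return decode_dc(diff, ctx)


def decode_with_shared_context(stream):
    """Decode an interleaved stream as if it came from a single source."""
    ctx = DecodeContext()
    return [decode_dc(diff, ctx) for _, diff in stream]


def decode_with_per_source_contexts(stream):
    """Decode an interleaved stream, switching contexts by source id."""
    decoder = MultiSourceDecoder()
    decoded = {}
    for source_id, diff in stream:
        decoded.setdefault(source_id, []).append(decoder.decode(source_id, diff))
    return decoded


# Source A sends DC values 50, 52, 55; source B sends 200, 198, 199.
# Each value is coded as a difference from that source's previous DC value.
INTERLEAVED = [("A", 50), ("B", 200), ("A", 2), ("B", -2), ("A", 3), ("B", 1)]

if __name__ == "__main__":
    print("shared context:", decode_with_shared_context(INTERLEAVED))
    print("per-source contexts:", decode_with_per_source_contexts(INTERLEAVED))
```

Decoding the interleaved stream with one shared predictor yields 50, 250, 252, and so on, rather than either original sequence.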
The disadvantage of the prior art, therefore, is its requirement for multiple decoding devices. At present, decoder chips and chip sets, even relatively simple ones such as those compatible with the JPEG and Px64 technologies, are expensive. As a consequence, using multiple decoding devices is an impractical windowing solution.
A further difficulty encountered in multiple-window video is that many sources provide video at only one screen display size. In fact, many sources transmit only full-screen images, which typically comprise 640×480 pixels per frame. To provide truly flexible windowing, different users should have the option of invoking and viewing differently sized windows of the same video. Windows that occupy a fraction of the entire display require the image data to be filtered and subsampled, resulting in frame signals comprising fewer pixels. For example, a 1/4-screen window requires frame data comprising only 320×240 pixels. It is therefore advantageous to make video data available at a plurality of window sizes or resolution levels. For example, the video of a participant in a teleconference may be made available at full-screen resolution, 1/4 screen, 1/16 screen or 1/64 screen, so that the other participants can choose the window size in which to view the transmitting participant.
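The filtering and subsampling step can be sketched as a 2×2 averaging filter that halves each dimension, applied repeatedly to produce the 1/4, 1/16 and 1/64 screen levels. The box filter, the integer rounding and the function names are assumptions for illustration; the text does not say which filter is used, and a practical system would likely use a better anti-aliasing filter.

```python
def downsample_2x2(frame):
    """Average each 2x2 block of a frame (list of pixel rows) into one pixel.

    Assumes even width and height. The box filter and round-half-up integer
    arithmetic are illustrative choices, not a specified filter.
    """
    height, width = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1] + 2) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def resolution_pyramid(frame, levels):
    """Return [full, 1/4, 1/16, ...] screen versions of one luminance frame."""
    pyramid = [frame]
    for _ in range(levels - 1):
        pyramid.append(downsample_2x2(pyramid[-1]))
    return pyramid


if __name__ == "__main__":
    # Synthetic 640x480 luminance frame standing in for a full-screen source.
    full = [[(x + y) % 256 for x in range(640)] for y in range(480)]
    for level in resolution_pyramid(full, levels=4):
        print(f"{len(level[0])}x{len(level)}")
```

Each level holds a quarter of the pixels of the one before it, so a 640×480 source yields 320×240, 160×120 and 80×60 windows.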
Under one technique for providing multiple resolution levels, each video transmitter provides a plurality of video sequences, each independently containing the data signal for a particular resolution level of the same video image. One method of generating such multiple-resolution video sequences would be to employ several encoders, one for each resolution level. As with decoders, however, the requirement of multiple encoders increases system cost; encoders are extremely costly components of digital video transmission systems.