A motion picture such as broadcast television is made of individual pictures that are rapidly displayed to give the illusion of continuous motion. Each individual picture in the sequence is a picture frame. A digitally encoded picture frame is made of many discrete picture elements, or pixels, that are arranged in a two-dimensional array. Each pixel represents the color (chrominance) and brightness (luminance) at its particular point in the picture. The pixels may be grouped for purposes of subsequent digital processing (such as digital compression). For example, the picture frame may be segmented into a rectangular array of contiguous macroblocks, as defined by the ITU-T H series coding structure. Each macroblock typically represents a 16×16 square of pixels.
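The segmentation of a picture frame into 16×16 macroblocks can be illustrated with a minimal sketch. The function names and the raster-order numbering are assumptions made for illustration only; they do not come from any standard or library.

```python
MACROBLOCK_SIZE = 16  # pixels per macroblock side, per the 16x16 figure above


def macroblock_grid(width, height, mb_size=MACROBLOCK_SIZE):
    """Return (columns, rows) of macroblocks needed to cover the frame."""
    cols = (width + mb_size - 1) // mb_size   # round up for partial blocks
    rows = (height + mb_size - 1) // mb_size
    return cols, rows


def macroblock_index(x, y, width, mb_size=MACROBLOCK_SIZE):
    """Return the raster-order index of the macroblock containing pixel (x, y).

    Raster order (left to right, top to bottom) is an assumed numbering.
    """
    cols, _ = macroblock_grid(width, mb_size, mb_size)
    return (y // mb_size) * cols + (x // mb_size)


if __name__ == "__main__":
    print(macroblock_grid(352, 288))        # CIF frame: (22, 18)
    print(macroblock_index(16, 16, 352))    # second row, second column: 23
```

For a 352×288 (CIF) frame this gives 22 columns and 18 rows, or 396 macroblocks in total.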
Macroblocks may in turn be grouped into picture frame components such as slices or groups of blocks, as defined under the ITU-T H.263 video coding structure. Under H.263, a group of blocks is rectangular and always spans the full horizontal width of the picture, but the number of macroblock rows in each group of blocks depends on the number of lines in the picture. For example, each group of blocks contains one macroblock row for pictures having 4 to 400 lines, two macroblock rows for pictures having 404 to 800 lines, and four macroblock rows for pictures having 804 to 1152 lines. A slice, on the other hand, is a flexible grouping of macroblocks that is not necessarily rectangular. Headers within the encoded video picture bit stream identify and provide important information about the various subcomponents that make up the encoded video picture. The picture frame itself has a header, which contains information about how the picture frame was processed. Each group of blocks or slice within a video picture frame has a header that identifies the component as a slice or a group of blocks and provides information regarding the placement of the component within the picture frame. A decoder interprets each header when decoding the data making up the picture frame in preparation for displaying it.
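The group-of-blocks sizing rule described above can be sketched as follows. This sketch reads the rule as giving the number of 16-line macroblock rows contained in each group of blocks, which is an interpretation of the line ranges quoted above; the function names are hypothetical.

```python
MACROBLOCK_LINES = 16  # lines covered by one macroblock row


def macroblock_rows_per_gob(lines):
    """Return the number of macroblock rows in each group of blocks.

    Thresholds follow the ranges quoted in the text: up to 400 lines,
    up to 800 lines, and up to 1152 lines.
    """
    if not 4 <= lines <= 1152:
        raise ValueError("picture height must be between 4 and 1152 lines")
    if lines <= 400:
        return 1
    if lines <= 800:
        return 2
    return 4


def gobs_per_picture(lines):
    """Return how many groups of blocks are needed to cover the picture."""
    gob_lines = MACROBLOCK_LINES * macroblock_rows_per_gob(lines)
    return (lines + gob_lines - 1) // gob_lines  # round up


if __name__ == "__main__":
    for lines in (144, 288, 576, 1152):
        print(lines, macroblock_rows_per_gob(lines), gobs_per_picture(lines))
```

Under this reading, a 288-line picture uses 18 groups of blocks of one macroblock row each, while a 576-line picture also uses 18 groups of blocks, each two macroblock rows tall.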
In certain applications, displaying multiple picture frames within a single display is desirable. For example, in videoconferencing situations it is useful for each participant to have a video display showing each of the other participants at remote locations. Visual cues are generally an important part of a discussion among a group of participants, and it is beneficial for each participant's display to present the visual cues of all participants simultaneously. Any method of simultaneously displaying all the conference participants is called a continuous presence display. This can be accomplished by using multiple decoders and multiple video displays at each site, or by combining the individual video pictures into a single video picture in a mosaic arrangement of the several individual pictures (called a spatial multiplex).
Multiplexing picture frames into a single composite picture frame requires some form of processing of each picture frame's encoded data. Conventionally, a spatial multiplex video picture frame could be created by completely decoding each picture frame to be multiplexed to a baseband level, multiplexing at the baseband level, and then re-encoding for transmission to the various locations for display. However, decoding and re-encoding a complete picture frame is computationally intensive and generally consumes a significant amount of time.
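The baseband multiplexing step of this conventional approach can be sketched as below. The sketch assumes four frames that have already been fully decoded into equally sized two-dimensional arrays of pixel values, arranged as a 2×2 mosaic; the function name and the list-of-rows frame representation are illustrative assumptions, and the costly decode and re-encode stages on either side of this step are deliberately omitted.

```python
def compose_quad(top_left, top_right, bottom_left, bottom_right):
    """Tile four equally sized decoded frames into one 2x2 mosaic.

    Each frame is a list of rows, and each row is a list of pixel values.
    """
    frames = (top_left, top_right, bottom_left, bottom_right)
    height, width = len(top_left), len(top_left[0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must share the same dimensions")
    # Join rows side by side, then stack the two halves vertically.
    top = [left + right for left, right in zip(top_left, top_right)]
    bottom = [left + right for left, right in zip(bottom_left, bottom_right)]
    return top + bottom


if __name__ == "__main__":
    a = [[1, 1], [1, 1]]
    b = [[2, 2], [2, 2]]
    c = [[3, 3], [3, 3]]
    d = [[4, 4], [4, 4]]
    for row in compose_quad(a, b, c, d):
        print(row)
```

The tiling itself is simple; the expense of the conventional method lies in having to decode every input frame to pixels before this step and re-encode the composite afterwards.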
The H.263 standard provides a continuous presence multipoint and video multiplex mode that allows up to four individual picture frames to be included in a single bitstream, but each picture frame must be individually decoded by individual decoders or by one very fast decoder. No means of simultaneously displaying the pictures is specified in the standard. Additionally, time-consuming processing must be applied to the picture frames after they have been individually decoded to multiplex them together into a composite image for display.
Therefore, there is a need in the art for a method and system that can spatially multiplex multiple picture frames into a single picture frame without requiring each individual picture frame to be fully decoded when being multiplexed and without requiring additional processing after decoding to multiplex the picture frames.