In video streams, monochrome frames are sometimes used to signal boundaries between different parts of the stream, e.g. between shots taken by a camera or camcorder, or between commercial spots or the like.
Monochrome frames are video frames with a homogeneous, uniform single color across the whole frame. They typically appear at the boundary between two commercials, but they also appear during fades, right before the credits of a film, or even during strong flashes. Throughout this specification, frames are also referred to as pictures or images.
The presence of a logo in a frame makes it non-monochrome. However, since a logo is usually superimposed on the frame afterwards, the rest of the frame may still be monochrome. Two kinds of monochrome frames therefore have to be considered: those containing a logo and those without one.
Nowadays, most video data is transmitted in compressed form, e.g. as MPEG-2 (Moving Picture Experts Group). In MPEG-2, motion compensation is performed in the spatial domain, that is, after the required reference frames have been decoded. To motion-compensate any frame, the reference frames on which the frame to be reconstructed is based must first be decoded and buffered.
Then, using the motion vectors of the current frame, the required pixel information is taken from the corresponding decoded reference frames and placed in the current frame. Additionally, for predicted frames with differential error coding, the transmitted prediction error is decoded and added to the motion-compensated prediction.
In the compressed domain, however, this motion compensation process cannot be applied, for one fundamental reason: in the spatial domain all the pixels of the reference frames are available (since they have been previously decoded), whereas in the compressed domain only the DCT coefficients of each macro-block on the fixed macro-block grid can be used. In most cases, the reference region pointed to by a motion vector does not coincide with a single macro-block but overlaps several macro-blocks.
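To illustrate why a displaced reference region usually straddles several macro-blocks, the following minimal sketch lists the grid macro-blocks that a 16×16 reference region touches. It assumes full-pel motion vectors (MPEG-2 actually allows half-pel precision) and ignores frame borders; the function name `overlapped_macroblocks` is hypothetical.

```python
def overlapped_macroblocks(mb_col, mb_row, mv_x, mv_y, mb_size=16):
    """Return the (row, col) indices of reference-frame macro-blocks that the
    motion-compensated reference region overlaps.

    Assumptions (illustrative only): full-pel motion vectors, no frame-border
    clipping, square macro-blocks of mb_size x mb_size pels.
    """
    x0 = mb_col * mb_size + mv_x  # left edge of the reference region in pels
    y0 = mb_row * mb_size + mv_y  # top edge of the reference region in pels
    # Floor division also handles negative coordinates correctly.
    cols = range(x0 // mb_size, (x0 + mb_size - 1) // mb_size + 1)
    rows = range(y0 // mb_size, (y0 + mb_size - 1) // mb_size + 1)
    return [(r, c) for r in rows for c in cols]


print(overlapped_macroblocks(2, 1, 0, 0))   # [(1, 2)]: aligned, one macro-block
print(overlapped_macroblocks(2, 1, 5, -3))  # [(0, 2), (0, 3), (1, 2), (1, 3)]
```

Only a motion vector that is a multiple of 16 in both components keeps the region aligned with a single macro-block; any other displacement touches two or four macro-blocks, whose DCT coefficients would all be needed.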
There are several MPEG standards for digital video: MPEG-1, MPEG-2 and MPEG-4. MPEG-2 is intended for high-data-rate video applications ranging from video conferencing to High Definition TV.
Like any compression algorithm, MPEG-2 tries to reduce the redundancy in the video data.
In general, uncompressed video data consists of a sequence of consecutive frames captured at different instants in time. In MPEG-2, each frame is hierarchically divided into slices, macro-blocks (MBs), blocks and pels (pixels). Pels are the smallest image elements; they represent individual sample values of luminance and chrominance (which can be converted to the red, green and blue intensities used in RGB). A block is a set of 8×8 pels; a macro-block covers 16×16 pels, i.e. four 8×8 luminance blocks (plus the associated chrominance blocks); and a slice is a horizontal run of n macro-blocks, where n ranges from 1 to the number of macro-blocks in one row of the frame.
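As a small worked example of the frame hierarchy, the sketch below counts macro-blocks and luminance blocks for a given frame size. It assumes the frame is padded up to a whole number of macro-blocks and counts only the four luminance blocks per macro-block; the function name `mpeg2_partition` is hypothetical.

```python
import math


def mpeg2_partition(width, height, mb_size=16, block_size=8):
    """Count macro-blocks and 8x8 luminance blocks in a frame.

    Assumes the coded frame is padded to a multiple of mb_size in each
    direction; chrominance blocks are deliberately not counted.
    """
    mb_cols = math.ceil(width / mb_size)
    mb_rows = math.ceil(height / mb_size)
    luma_blocks_per_mb = (mb_size // block_size) ** 2  # 4 luminance blocks
    return {
        "mb_cols": mb_cols,  # also the maximum slice length n
        "mb_rows": mb_rows,
        "macroblocks": mb_cols * mb_rows,
        "luma_blocks": mb_cols * mb_rows * luma_blocks_per_mb,
    }


print(mpeg2_partition(720, 576))
# {'mb_cols': 45, 'mb_rows': 36, 'macroblocks': 1620, 'luma_blocks': 6480}
```

For a 1920×1080 frame the height is padded to 68 macro-block rows (1088 lines), giving 120 × 68 = 8160 macro-blocks.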
Like the JPEG image compression algorithm, MPEG-2 employs a block-based two-dimensional Discrete Cosine Transform (DCT): a block of 8×8 pels is transformed into an 8×8 block of DCT coefficients.
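The 8×8 transform can be written as a matrix product with the orthonormal DCT-II basis, which for N = 8 has the same scaling as the MPEG-2/JPEG definition (factor 1/4 and C(0) = 1/√2). The sketch below uses NumPy; the helper names `dct_matrix`, `dct2` and `idct2` are illustrative, and the transform is computed in floating point rather than with the integer precision an actual codec would use.

```python
import numpy as np

N = 8


def dct_matrix(n=N):
    """Orthonormal DCT-II basis: row u holds the u-th cosine basis function."""
    x = np.arange(n)
    u = x.reshape(-1, 1)
    t = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * u * np.pi / (2 * n))
    t[0, :] /= np.sqrt(2.0)  # C(0) = 1/sqrt(2) normalisation of the DC row
    return t


T = dct_matrix()


def dct2(block):
    """2-D DCT of an 8x8 pel block: 1-D DCT of the columns, then of the rows."""
    return T @ np.asarray(block, dtype=np.float64) @ T.T


def idct2(coeffs):
    """Inverse 2-D DCT (T is orthonormal, so its inverse is its transpose)."""
    return T.T @ coeffs @ T


flat = np.full((8, 8), 100.0)  # a perfectly uniform (monochrome) block
c = dct2(flat)
# c[0, 0] is 800 (= 8 x mean); every other coefficient is 0 up to rounding.
```

The uniform block shows the property used throughout this specification: all of its energy ends up in the single DC coefficient.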
In pel blocks with uniform luminance and color, like a piece of sky, a few DCT coefficients concentrate all the energy, while the rest are zero or almost zero. Very frequently, therefore, only a few of the 64 DCT coefficients of each block have to be transmitted, which reduces the amount of information tremendously. For a monochrome block, only the top-left coefficient (the DC coefficient) is non-zero, while for a highly textured or noisy block the high-frequency coefficients towards the bottom right also contain non-zero values. After quantization, the coefficients are scanned in zigzag order starting from the top-left (DC) coefficient, and the non-zero values, together with the runs of zeros preceding them, are encoded using Variable Length Coding (VLC).
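The zigzag scan and the grouping of quantized coefficients into (run, level) pairs can be sketched as follows. This is a simplification: it shows only the classic zigzag order (MPEG-2 also defines an alternate scan for interlaced material), treats the DC coefficient like any other coefficient (MPEG-2 codes the intra DC separately), and stops before the actual VLC tables. The names `zigzag_order` and `run_level_pairs` are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag order, from DC outwards."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes the anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def run_level_pairs(coeffs):
    """(run, level) pairs: run counts the zeros preceding each non-zero value.

    Trailing zeros are not emitted; a real coder signals them with an
    end-of-block code.
    """
    pairs, run = [], 0
    for r, c in zigzag_order(len(coeffs)):
        v = coeffs[r][c]
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs


block = [[0] * 8 for _ in range(8)]
block[0][0] = 50  # monochrome block: only the DC coefficient survives
print(run_level_pairs(block))  # [(0, 50)]
```

A monochrome block thus collapses to a single pair, whereas a textured block produces many pairs spread along the scan.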
Temporal redundancy exists because adjacent frames are similar. In MPEG-2 there are three main types of frames: I-frames, P-frames and B-frames. In I-frames, all macro-blocks are intra-coded, that is, the quantized DCT coefficients of all macro-blocks are transmitted. In P-frames, macro-blocks can be intra-coded, forward predicted or skipped, depending on how much the macro-block changes with respect to the previous reference frame. Similarly, B-frame macro-blocks can be intra-coded, skipped, forward predicted, backward predicted or bi-directionally predicted.
Each forward predicted macro-block is derived from a macro-block-sized region of the previous reference frame (I- or P-frame) pointed to by a motion vector (MV), plus a prediction error. That is, instead of transmitting the DCT coefficients of the macro-block, a motion vector pointing to the previous position of the macro-block is provided together with the error of this prediction. In this way the decoded information of previous reference frames is reused to derive the current macro-block. In the same fashion, backward predicted macro-blocks consist of a motion vector pointing to the position of the macro-block in the next reference frame.
Bi-directionally predicted macro-blocks contain two motion vectors, one pointing into the previous reference frame and one pointing into the next reference frame.
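The spatial-domain reconstruction of predicted macro-blocks can be sketched as below. The sketch assumes full-pel motion vectors, a reference region lying entirely inside the frame (no bounds checks; negative indices would silently wrap in NumPy), and an already decoded residual; half-pel interpolation and field prediction are omitted. Backward prediction is the same fetch applied to the next reference frame. All function names are hypothetical.

```python
import numpy as np

MB = 16  # macro-block size in pels


def predict(ref, x, y, mv):
    """Fetch the 16x16 region of `ref` displaced by the full-pel mv = (dx, dy)
    from the macro-block whose top-left corner is (x, y)."""
    dx, dy = mv
    return ref[y + dy:y + dy + MB, x + dx:x + dx + MB].astype(np.int32)


def reconstruct_forward(ref, x, y, mv, residual):
    """Forward (or backward) prediction plus the decoded prediction error."""
    return np.clip(predict(ref, x, y, mv) + residual, 0, 255).astype(np.uint8)


def reconstruct_bidirectional(prev_ref, next_ref, x, y, mv_fwd, mv_bwd, residual):
    """Average the forward and backward predictions (rounded), then add the error."""
    p = predict(prev_ref, x, y, mv_fwd)
    n = predict(next_ref, x, y, mv_bwd)
    pred = (p + n + 1) // 2  # rounded average of two non-negative predictions
    return np.clip(pred + residual, 0, 255).astype(np.uint8)


prev_ref = np.full((48, 48), 100, np.uint8)
next_ref = np.full((48, 48), 51, np.uint8)
zero = np.zeros((MB, MB), np.int32)
mb = reconstruct_bidirectional(prev_ref, next_ref, 16, 16, (0, 0), (1, 1), zero)
# every pel of mb is (100 + 51 + 1) // 2 = 76
```

Working in `int32` before clipping avoids the wrap-around that adding a signed residual directly to `uint8` pels would cause.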
The motion vectors are calculated during compression by comparing each macro-block with some or all candidate 16×16 regions of the previous and/or next reference frame. There are several ways in which these motion vectors can be obtained.
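One common way to perform this comparison is exhaustive ("full search") block matching with the sum of absolute differences (SAD) as the matching cost. The sketch below is only an example of that technique, not a method prescribed by MPEG-2, which leaves motion estimation to the encoder; the search range of ±7 pels and the name `full_search` are assumptions, and fast search strategies would evaluate only a subset of the candidates.

```python
import numpy as np


def full_search(cur, ref, x, y, search=7, mb=16):
    """Exhaustive block matching for the macro-block at (x, y) of `cur`.

    Returns ((dx, dy), sad) for the candidate in `ref` with minimum SAD.
    Candidates that would extend beyond the reference frame are skipped.
    """
    block = cur[y:y + mb, x:x + mb].astype(np.int32)
    h, w = ref.shape
    best_mv, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + mb > h or rx + mb > w:
                continue  # candidate falls outside the reference frame
            cand = ref[ry:ry + mb, rx:rx + mb].astype(np.int32)
            sad = int(np.abs(block - cand).sum())
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dx, dy), sad
    return best_mv, best_sad


rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))  # cur[y, x] = ref[y - 2, x + 3]
print(full_search(cur, ref, 24, 24))  # ((3, -2), 0)
```

The cost grows with the square of the search range, which is why practical encoders rarely search every candidate.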
The most popular approach is inter-frame hybrid coding. With this method, the motion vectors are obtained by the Motion Estimator in the spatial domain, that is, from the uncompressed video information. The motion vectors are then differentially encoded: each transmitted motion vector represents the difference with respect to the previously transmitted motion vector. Finally, the Motion Compensated Predictor computes the difference between the motion-vector-based reconstruction and the original frame. For this purpose, the encoded DCT coefficients of the reference frames have to be inverse quantized and inverse transformed. The differential error is VLC coded and sent together with the motion vectors and a flag indicating whether such error information is present. MPEG-2 can handle both progressive and interlaced video.
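The differential encoding of motion vectors can be illustrated with a minimal round trip. The sketch assumes that the predictor starts at (0, 0), as it does when it is reset (for example at the start of a slice), and ignores the additional MPEG-2 reset rules and the VLC stage; the names `encode_mvs` and `decode_mvs` are hypothetical.

```python
def encode_mvs(mvs):
    """Replace each motion vector by its difference to the previous one."""
    pred = (0, 0)  # assumed predictor reset, e.g. at the start of a slice
    diffs = []
    for mx, my in mvs:
        diffs.append((mx - pred[0], my - pred[1]))
        pred = (mx, my)
    return diffs


def decode_mvs(diffs):
    """Undo encode_mvs by accumulating the transmitted differences."""
    pred = (0, 0)
    mvs = []
    for dx, dy in diffs:
        pred = (pred[0] + dx, pred[1] + dy)
        mvs.append(pred)
    return mvs


mvs = [(4, -2), (5, -2), (5, -1), (-3, 0)]
print(encode_mvs(mvs))  # [(4, -2), (1, 0), (0, 1), (-8, 1)]
```

Because neighbouring macro-blocks tend to move together, most differences are small and therefore receive short variable-length codes.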
Pictures or frames are organized in Groups of Pictures (GOPs). A GOP is a combination of one I-frame and zero or more P- and B-frames, and is usually (but not necessarily) repeated periodically throughout the video sequence. A GOP contains exactly one I-frame, which is located at the beginning of the GOP.
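A GOP pattern is often written as a string of frame types in display order, e.g. "IBBPBBP". The sketch below checks the single-I-frame rule stated above and derives a transmission order in which every reference frame is sent before the B-frames that depend on it. It is a simplified illustration: both function names are hypothetical, and B-frames left at the end of the pattern would in practice wait for the next GOP's I-frame.

```python
def validate_gop(pattern):
    """True if the pattern uses only I/P/B and has exactly one I-frame, placed first."""
    return (
        set(pattern) <= set("IPB")
        and pattern.count("I") == 1
        and pattern.startswith("I")
    )


def coding_order(pattern):
    """Display-order indices rearranged into transmission order.

    A B-frame needs both surrounding reference frames, so each I/P-frame is
    emitted before the B-frames that precede it in display order.
    """
    order, pending_b = [], []
    for i, frame_type in enumerate(pattern):
        if frame_type == "B":
            pending_b.append(i)
        else:
            order.append(i)
            order.extend(pending_b)
            pending_b = []
    return order + pending_b  # trailing B-frames: simplification, see above


print(validate_gop("IBBPBBPBB"))  # True
print(coding_order("IBBPBBP"))    # [0, 3, 1, 2, 6, 4, 5]
```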
US 2007-0256091 A1 discloses a method for extracting monochrome frames in the spatial domain (i.e. after decompression) by comparing average pixel values. However, the video stream has to be fully decompressed, which incurs a high computational cost, especially for high-quality video streams.
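For comparison, a spatial-domain monochrome test of the general kind referred to above can be sketched as follows. This is not the method of US 2007-0256091 A1, whose details are not given here; it is a hypothetical illustration that declares a frame monochrome when nearly all luminance samples lie close to the frame mean, with the tolerance `tol` and the fraction `min_fraction` as assumed parameters (the small allowed fraction of outliers tolerates a superimposed logo). Note that it needs every decoded pel of the frame, which is exactly the cost discussed above.

```python
import numpy as np


def is_monochrome(luma, tol=8, min_fraction=0.98):
    """Hypothetical spatial-domain test on a fully decoded luminance plane.

    The frame counts as monochrome when at least `min_fraction` of its pels
    lie within `tol` of the mean luminance; the remaining pels may belong to
    a small logo.
    """
    luma = np.asarray(luma, dtype=np.float64)
    mean = luma.mean()
    close = np.abs(luma - mean) <= tol
    return bool(close.mean() >= min_fraction)


frame = np.zeros((576, 720))
frame[10:30, 10:50] = 235   # small bright logo on a black frame
print(is_monochrome(frame))  # True
```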
US 2007-0206931 discloses a method for extracting monochrome frames in the compressed domain by using statistics on the number of intra-coded macro-blocks. The average number of intra-coded macro-blocks is used as an indicator of the presence of a monochrome frame. However, this method can only be applied to P- and B-frames, since all macro-blocks of I-frames are intra-coded.
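The limitation of an intra-count statistic can be made concrete with a small sketch. It does not reproduce the method of US 2007-0206931; it is a hypothetical heuristic that computes the fraction of intra-coded macro-blocks in a frame and compares it against an assumed threshold. The macro-block type labels, the `threshold` value and the function names are all assumptions. For an I-frame the statistic carries no information, because every macro-block is intra-coded.

```python
def intra_ratio(mb_types):
    """Fraction of macro-blocks labelled 'intra' in one frame."""
    if not mb_types:
        raise ValueError("frame has no macro-blocks")
    return sum(1 for t in mb_types if t == "intra") / len(mb_types)


def flag_candidate(frame_type, mb_types, threshold=0.8):
    """Hypothetical heuristic: flag a P- or B-frame whose macro-blocks are
    mostly intra-coded as a possible abrupt change (e.g. a monochrome frame).

    Returns None for I-frames, where the statistic is meaningless.
    """
    if frame_type == "I":
        return None  # every macro-block of an I-frame is intra-coded
    return intra_ratio(mb_types) >= threshold


print(flag_candidate("P", ["intra"] * 90 + ["skipped"] * 10))  # True
print(flag_candidate("I", ["intra"] * 100))                    # None
```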
Thus, there is a need for an improved method and device for approximating a current of first blocks of pixels of a compressed frame, an improved method and device for detecting a monochrome frame in a compressed video stream, and a corresponding computer program product.
Further details of the invention will become apparent from a consideration of the drawings and ensuing description.