1. Field of the Invention
The present invention relates to a method and apparatus for video encoding, predecoding, and reconstructing the original video sequence for video streaming services, a bitstream structure, and an image filtering method.
2. Description of the Related Art
With the development of information communication technology including the Internet, a variety of communication services have been newly proposed. One such communication service is Video On Demand (VOD). Video on demand refers to a service in which video content such as movies or news is provided to an end user over a telephone line, cable, or the Internet upon the user's request. Users are allowed to view a movie without having to leave their residence. Also, users are allowed to access various types of knowledge via moving image lectures without having to go to school or private educational institutes.
Various requirements must be satisfied to implement such a VOD service, including wideband communications and motion picture compression to transmit and receive a large amount of data. Specifically, moving image compression enables VOD by effectively reducing bandwidths required for data transmission. For example, a 24-bit true color image having a resolution of 640×480 needs a capacity of 640×480×24 bits, i.e., data of about 7.37 Mbits, per frame. When this image is transmitted at a speed of 30 frames per second, a bandwidth of 221 Mbits/sec is required to provide a VOD service. When a 90-minute movie based on such an image is stored, a storage space of about 1200 Gbits is required. Accordingly, since uncompressed moving images require a tremendous bandwidth and a large capacity of storage media for transmission, a compression coding method is a requisite for providing the VOD service under current network environments.
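The figures in the preceding paragraph can be checked with a few lines of arithmetic. The following is a minimal sketch; the variable names are illustrative and are not part of the original description.

```python
# Uncompressed video arithmetic for the 640x480, 24-bit, 30 frames/sec example.
WIDTH, HEIGHT = 640, 480
BITS_PER_PIXEL = 24
FRAMES_PER_SECOND = 30
MOVIE_MINUTES = 90

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL
bits_per_second = bits_per_frame * FRAMES_PER_SECOND
movie_bits = bits_per_second * MOVIE_MINUTES * 60

print(f"per frame: {bits_per_frame / 1e6:.2f} Mbits")     # 7.37
print(f"bandwidth: {bits_per_second / 1e6:.0f} Mbits/sec")  # 221
print(f"90-minute movie: {movie_bits / 1e9:.0f} Gbits")     # 1194
```

The exact storage figure is about 1194 Gbits, which the text rounds to about 1200 Gbits.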
A basic principle of data compression is removing data redundancy. Motion picture compression can be effectively performed when the same color or object is repeated in an image, or when there is little change between adjacent frames in a moving image.
Known video coding algorithms for motion picture compression include Moving Picture Experts Group (MPEG)-1, MPEG-2, H.263, and H.264 (or AVC). In such video coding methods, temporal redundancy is removed by motion compensation based on motion estimation, and spatial redundancy is removed by Discrete Cosine Transformation (DCT). These methods have high compression rates, but they do not have satisfactory scalability since they use a recursive approach in their main algorithm. In recent years, research into data coding methods having scalability, such as wavelet video coding and Motion Compensated Temporal Filtering (MCTF), has been actively carried out. Scalability indicates the ability to partially decode a single compressed bitstream at different quality levels, resolutions, or frame rates.
FIG. 1 illustrates the configuration of a video streaming service provider 100 using a video coding scheme supporting low scalability. For convenience of explanation, a video streaming service for a single video sequence will be described.
Referring to FIG. 1, the video streaming service provider 100 receives a video sequence and performs video coding on the video sequence using a coding algorithm such as MPEG-1, MPEG-2, H.263, or H.264. A bitstream obtained by coding the video sequence with these coding algorithms is not scalable or supports little scalability. Thus, to provide video streaming services at various spatial resolutions and frame rates, a bitstream needs to be generated for each resolution and frame rate. To accomplish this, the video streaming service provider 100 includes a plurality of converters 110-2 through 110-n, each converting a video sequence into another video sequence with a lower spatial resolution and (or) a lower frame rate, a plurality of encoders 120-1 through 120-n encoding the video sequence or the video sequences subjected to conversion with a video coding algorithm into bitstreams, and a selector 130 selecting one of the bitstreams with different spatial resolutions and frame rates for transmission to a video decoder 140.
More specifically, a second converter 110-2 converts the received video sequence into a video sequence with a lower spatial resolution and (or) a lower frame rate by performing downsampling or frame rate reduction. MPEG-based downsampling results in smooth images. The resulting video sequence is then sent to a second video encoder 120-2. Similarly, a third converter 110-3 converts the video sequence and sends the resulting sequence to a third video encoder 120-3, and an n-th converter 110-n transmits the video sequence to an n-th video encoder 120-n after conversion.
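The conversion performed by a converter such as 110-2 can be sketched as follows. This is only an illustration under stated assumptions: a frame is modeled as a list of rows of grayscale values, downsampling uses a simple 2×2 averaging filter, and frame rate reduction drops every other frame. The description does not specify the filter, and the function names are hypothetical.

```python
def downsample_2x(frame):
    """Halve width and height by averaging each 2x2 block (assumed filter)."""
    height, width = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1]
             + frame[y + 1][x] + frame[y + 1][x + 1]) / 4
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


def halve_frame_rate(frames):
    """Keep every other frame, e.g. 60 Hz becomes 30 Hz."""
    return frames[::2]


def convert(frames):
    """Hypothetical converter: lower the frame rate, then the resolution."""
    return [downsample_2x(frame) for frame in halve_frame_rate(frames)]


# A 4x4 frame becomes a 2x2 frame.
frame = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
print(downsample_2x(frame))  # [[2.0, 6.0], [1.0, 1.0]]
```

Averaging neighboring pixels acts as a low-pass filter, which is one reason a downsampled image tends to look smooth.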
A first video encoder 120-1 performs video coding on the video sequence at the highest spatial resolution and highest frame rate. For example, the first video encoder 120-1 may receive the video sequence with 704×576 resolution and 60 Hz frame rate and encode the video sequence into a bitstream with 704×576 resolution and 60 Hz frame rate. The bitstream obtained by coding while maintaining the same resolution and frame rate as the original video sequence can be provided to a user when a sufficient network bandwidth is available to support it. For example, if 6 Mbps network bandwidth is stably available, the bitstream generated by the first video encoder 120-1 can be provided to the user. The bitstream provided to the user is decoded by the video decoder 140 to reconstruct the original video sequence with 704×576 resolution and 60 Hz frame rate.
The second video encoder 120-2 encodes a video sequence with a lower spatial resolution and (or) a lower frame rate than that encoded by the first video encoder 120-1 into a bitstream. Similarly, the third video encoder 120-3 performs video encoding at a different spatial resolution and (or) frame rate than the first and second video encoders 120-1 and 120-2 and generates a bitstream. In this way, the first through the n-th video encoders 120-1 through 120-n generate bitstreams with different spatial resolutions and (or) frame rates from the same video sequence.
The selector 130 provides a bitstream having a spatial resolution and a frame rate requested by the user (video decoder 140) to the video decoder 140. When a sufficient network bandwidth is available, the user can make a request for a video with a high spatial resolution and a high frame rate, and the video streaming service provider 100 delivers to the user a bitstream with the selected high spatial resolution and high frame rate. If the network bandwidth is not stable, a video sequence reconstructed by the video decoder 140 from a bitstream coded at high resolution and high frame rate can be easily disrupted during playback. In this case, the user can request a bitstream coded at lower resolution and (or) lower frame rate from the video streaming service provider 100.
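The role of the selector 130 can be illustrated with a small lookup over pre-encoded bitstreams. This is a sketch under assumptions: only the 704×576, 60 Hz, 6 Mbps entry comes from the description; the other catalog entries, the `Bitstream` type, and the bandwidth-based fallback helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bitstream:
    width: int
    height: int
    frame_rate: int      # Hz
    bitrate_mbps: float  # bandwidth needed for stable playback


CATALOG = [
    Bitstream(704, 576, 60, 6.0),  # from the description
    Bitstream(352, 288, 30, 1.5),  # hypothetical entry
    Bitstream(176, 144, 15, 0.4),  # hypothetical entry
]


def select(catalog, width, height, frame_rate):
    """Return the bitstream matching the user's request, or None."""
    for stream in catalog:
        if (stream.width, stream.height, stream.frame_rate) == (width, height, frame_rate):
            return stream
    return None


def best_for_bandwidth(catalog, available_mbps):
    """Assumed fallback: highest-bitrate stream that fits the bandwidth."""
    fitting = [s for s in catalog if s.bitrate_mbps <= available_mbps]
    return max(fitting, key=lambda s: s.bitrate_mbps) if fitting else None


print(select(CATALOG, 704, 576, 60))
print(best_for_bandwidth(CATALOG, 2.0))
```

Each catalog entry corresponds to a separately encoded and stored bitstream, so every additional resolution or frame rate adds both encoding work and storage.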
The video decoder 140 receives a bitstream corresponding to each video sequence from the video streaming service provider 100 for decoding. For example, in order to reconstruct a video sequence, an MPEG-2 coded bitstream can be decoded using an MPEG-2 decoding algorithm while an H.264 coded bitstream can be decoded using an H.264 decoding scheme.
A video streaming service provider using a non-scalable or low-scalability video coding algorithm, as in FIG. 1, must perform a plurality of video coding processes on the same video sequence with various spatial resolutions and frame rates according to the network environment or the user's request. As a result, a plurality of bitstreams are generated for the same video sequence. Generating a bitstream at each resolution and frame rate requires a great deal of computational capacity. Furthermore, services delivering video streams to users at various spatial resolutions and frame rates, which are commonly known as simulcasting services, require high capacity storage media for storing the generated bitstreams.
FIG. 2 schematically illustrates the configuration of a video streaming service provider 200 using a wavelet-based scalable video coding scheme. For convenience of explanation, video coding for a single video sequence will be described.
Referring to FIG. 2, the video streaming service provider 200 includes a scalable video encoder 210 encoding a video sequence and a predecoder 220. The scalable video encoder 210 uses a video coding algorithm having scalability to generate a scalable bitstream. In currently known scalable video coding algorithms, spatial scalability can be attained by wavelet transform, temporal scalability can be attained by Motion Compensated Temporal Filtering (MCTF), unconstrained MCTF (UMCTF) or Successive Temporal Approximation and Referencing (STAR), and Signal to Noise Ratio (SNR) scalability can be attained by embedded quantization.
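How a wavelet transform yields spatial scalability can be sketched with a one-level, one-dimensional Haar transform. The Haar wavelet is chosen here only for simplicity; the description does not state which wavelet filter is used, and the function names are hypothetical.

```python
def haar_forward(signal):
    """Split an even-length signal into a half-length low band and high band."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) / 2 for a, b in pairs]   # coarse, half-resolution version
    high = [(a - b) / 2 for a, b in pairs]  # detail needed for full resolution
    return low, high


def haar_inverse(low, high):
    """Rebuild the full-resolution signal from both bands."""
    signal = []
    for l, h in zip(low, high):
        signal += [l + h, l - h]
    return signal


row = [4, 2, 6, 8]
low, high = haar_forward(row)
print(low)                      # [3.0, 7.0]
print(high)                     # [1.0, -1.0]
print(haar_inverse(low, high))  # [4.0, 2.0, 6.0, 8.0]
```

A decoder that receives only the low band can still display a half-resolution signal, while a decoder that receives both bands recovers the full resolution.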
The bitstream obtained by encoding the video sequence through the scalable video encoder 210 is predecoded by the predecoder 220. Predecoding is a process of truncating some bits of a scalable bitstream. The bitstream may be predecoded into a bitstream with a lower spatial resolution, a lower frame rate, or a lower image quality than the original bitstream. When the video decoder 230 at the user side requests a video sequence with a specific resolution and frame rate from the video streaming service provider 200, the predecoder 220 in the video streaming service provider 200 truncates some bits of the bitstream and transmits the resulting bitstream to the video decoder 230. The video decoder 230 decodes the bitstream and reconstructs a video sequence with the requested resolution and frame rate.
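Predecoding can be illustrated with a toy model in which a scalable bitstream is a list of units, each tagged with the spatial and temporal level it contributes to. This layout is purely hypothetical and is not the bitstream structure of any particular codec; it only shows that truncation requires no re-encoding.

```python
from typing import NamedTuple


class Unit(NamedTuple):
    spatial_level: int   # 0 = lowest resolution
    temporal_level: int  # 0 = lowest frame rate
    payload: bytes


def predecode(units, max_spatial, max_temporal):
    """Keep only the units needed for the requested resolution and frame rate."""
    return [
        unit for unit in units
        if unit.spatial_level <= max_spatial
        and unit.temporal_level <= max_temporal
    ]


scalable_bitstream = [
    Unit(0, 0, b"a"),
    Unit(0, 1, b"b"),
    Unit(1, 0, b"c"),
    Unit(1, 1, b"d"),
]

# Lowest resolution at the full frame rate: the spatial level 1 units are dropped.
reduced = predecode(scalable_bitstream, max_spatial=0, max_temporal=1)
print([unit.payload for unit in reduced])  # [b'a', b'b']
```

In this model a single stored bitstream serves every request, in contrast to the one-bitstream-per-configuration approach of FIG. 1.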
Using a scalable video coding algorithm for a video streaming service in this way allows simulcasting of a single bitstream obtained from a single video sequence at various resolutions and frame rates. However, currently known scalable video coding algorithms do not offer high quality bitstreams at all resolutions. For example, the highest resolution video can be reconstructed with high quality, but a low resolution video cannot be reconstructed with satisfactory quality. More bits can be allocated for video coding of the low resolution video to improve its quality. However, this will degrade the coding efficiency.
As described above, the video streaming service shown in FIG. 1 can provide a bitstream optimized at every resolution, but may waste computational capacity and storage space. On the other hand, the video streaming service shown in FIG. 2 is able to provide bitstreams having various resolutions and frame rates using a single bitstream, but may offer poor image quality at some resolutions or degrade coding efficiency to improve image quality. Therefore, there is an urgent need for a video coding scheme for a video streaming service that delivers satisfactory image quality and high video coding efficiency by achieving a good trade-off between coding efficiency and reconstructed image quality.